The problem I’m having
Hi all! Looking for some advice on dbt best practices. I have a database populated by a manual data-entry process, and I would like to use dbt to split it into high-quality data (rows that pass tests) and data for review (rows that fail tests).
The context of why I’m trying to do this
When `dbt test` fails, it’s likely not because anything is wrong with my data – it’s just that manual curation is imperfect / incomplete. I’d like to be able to push through the rows that do pass to downstream models while holding back those that don’t.
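To make the goal concrete, here is a rough sketch of the split I'm after. It is not dbt code, just plain SQL run through Python's built-in `sqlite3`, and the table name, column names, and validity rules (`raw_entries`, `name`, `amount`, "name is present and amount is non-negative") are all made up for illustration.

```python
import sqlite3

# In-memory database standing in for the manually curated source table.
# Table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE raw_entries (id INTEGER PRIMARY KEY, name TEXT, amount REAL);
    INSERT INTO raw_entries VALUES
        (1, 'alpha', 10.0),
        (2, NULL,     5.0),   -- fails: missing name
        (3, 'gamma', -1.0),   -- fails: negative amount
        (4, 'delta',  7.5);
""")

# Assumed row-level rule, playing the role of the tests.
ROW_IS_VALID = "name IS NOT NULL AND amount >= 0"

conn.executescript(f"""
    -- Rows that pass every check flow downstream.
    CREATE VIEW entries_clean AS
        SELECT * FROM raw_entries WHERE {ROW_IS_VALID};

    -- Everything else is held back for review. Selecting by id avoids
    -- silently dropping rows where the rule evaluates to NULL.
    CREATE VIEW entries_for_review AS
        SELECT * FROM raw_entries
        WHERE id NOT IN (SELECT id FROM entries_clean);
""")

clean_ids = [r[0] for r in conn.execute("SELECT id FROM entries_clean ORDER BY id")]
review_ids = [r[0] for r in conn.execute("SELECT id FROM entries_for_review ORDER BY id")]
print(clean_ids, review_ids)
```

The point is that a bad row only holds back itself: rows 1 and 4 go through even though rows 2 and 3 fail. What I can't see is how to get this behaviour out of dbt tests without restating each test's logic by hand like this.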
What I’ve already tried
I’ve tried using `store-failures`, but this doesn’t necessarily give me a clear path to finding which rows are the source of the failures (without writing lots of manual code to reverse engineer the tests). I’ve also tried using an incremental model and selecting into the for-review model based on what is not in the downstream model, but as soon as one row is failing, this approach holds back everything.