Select rows based on test pass/fail status

scottgigante · June 27, 2023, 12:50am

The problem I’m having

Hi all! Looking for some advice on dbt best practices. I have a db which is the result of a manual data input process for which I would like to use dbt to split into high-quality data (passing tests) and data for review (failing tests).

The context of why I’m trying to do this

When dbt test fails, it’s likely not because anything is wrong with my data – it’s just that manual curation is imperfect / incomplete. I’d like to be able to push through the rows that do pass to downstream models while holding back those that don’t.

What I’ve already tried

I’ve tried using store-failures but this doesn’t necessarily give me a clear path to finding which rows are the source of the failures (without writing lots of manual code to reverse engineer the tests). I’ve also tried using an incremental model and selecting into the for-review model based on what is not in the downstream model, but as soon as one row is failing, this approach holds back everything.

nljl18 · June 27, 2023, 1:47am

Hello! We had a similar case with my current client. The way we ended up working around this was to build our own ‘Data Quality’ model that sat between the intermediate model and our final mart. The DQ model was then just a collection of CTEs, where each one was its own test and the select included the unique key, which we then unioned together at the end.

In our final mart, we then basically did select * from int_model where id not in (select distinct id from dq_model).

This has the downside of not being able to use any of the dbt tests, but we created our own macros for common tests and based most of it on dbt tests!

We also selected out a test name/description as part of the DQ model, which meant we could give good summaries to our client around where the biggest quality issues were.

A manual approach but it’s working for us!

Topic		Replies	Views
Identify and capture records of failed cases in dbt tests Help testing	0	427	September 19, 2024
Data quality + transformation in DBT Help	1	408	April 17, 2024
How to handle test failures Archive	5	18330	February 24, 2022
using dbt to manage data quality issues Help testing , best-practice , dbt-core	0	653	January 18, 2024
How to get started with data testing In-Depth Discussions testing	4	8071	July 16, 2019

Select rows based on test pass/fail status

The problem I’m having

The context of why I’m trying to do this

What I’ve already tried

Related topics