DBT Test : How to avoid missing values in source data ?

anaec · February 5, 2024, 10:46am

The problem I’m having

I am trying to implement a test that can check if there is missing values in the source table.

The context of why I’m trying to do this

I am currently implementing new test to my dbt models that going to feed my dashboards. My users need kpis in order to get an overview of the company performance.
Last week, I got a missing value issue. All the kpis from the dashboards did not break but the kpis were wrong (because there was values in the model). Hopefully, I manage to see the error, but I want to avoid this missing value error without having to check the data everyday.

What I’ve already tried

I tried to build custom singular test that checks if all the ids from my stg table are in my source. Unfortunately, dbt run the test after running the stg model.
Here is the test I was trying to do :

select id from {{ ref('stg_table') }}
where id not in (select id from {{ source('source_table') }})

Long story short, I want to check if all my id from the stg table are in my source table before running the stg model. So, if my ids are not in the source table, I know that I have missing values in the source table.

How can I check that there is no missing values in my sources tables ?

Thank you for your time !

brunoszdl · February 5, 2024, 1:22pm

HI @anaec, I see the following options

1 - You can simply run dbt test before dbt run, so you test before running

2 - You can create a snapshot for the source table with the invalidate_hard_deletes config enabled (invalidate_hard_deletes | dbt Developer Hub). Then your stg model can reference the snapshot

Does any of these options make sense for your problem?

a_slack_user · February 6, 2024, 5:55am

I think you should consider the audit_helper package. It has a compare_queries test that can show you if you are missing records either on query A or query B. You can set a ‘error’ on the test so it will not continue when the test fails. You can even store the missing values in a table if needed.

https://hub.getdbt.com/dbt-labs/audit_helper/latest/

example:

_{Note: @Maikel Korff originally posted this reply in Slack. It might not have transferred perfectly.}

anaec · February 9, 2024, 2:15pm

Thank you so much for your different options !
I used the simple trick of running the dbt test before dbt run for my specific generic test (using tags). It was easy to make an it works well !
And I will probably try to implement audit helper later in my dbt project because it seems to be a great option.

Have a great day !

brunoszdl · February 9, 2024, 6:23pm

Awesome, @anaec ! Could you mark the answer as solution? Thanks!!

shivaramp · February 11, 2024, 10:01pm

I think if you check all those id’s before data load trigger using run_query() and create if condition to check all id present. If so then start data load trigger else raise error

system · February 18, 2024, 10:02pm

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
How check source table presence and column types and presence checks with one setting? Help	0	1567	March 11, 2024
Test table existence Archive	1	5781	January 20, 2021
singular test - question about ref and source . Help testing , dbt-core	0	44	February 9, 2025
abort dbt job if conditions do not exist Help	10	3595	July 19, 2023
Testing data on existing database before creating new transformed table Help jinja , testing , best-practice , dbt-core	2	253	June 20, 2024

DBT Test : How to avoid missing values in source data ?

The problem I’m having

The context of why I’m trying to do this

What I’ve already tried

Related topics