Since the release of dbt-core
v0.20.0, dbt allows you store test results by providing the --store-failures
flag at your dbt test
command. Providing this flag, dbt will automatically store corrupt validations in a table per test. Everytime dbt test --store-failures
is run, it overwrites the previous results.
I wanted to share our approach to register the number of failing tests daily. It uses a macro and dbt snapshot which allows us to monitor non-critical tests such as data quality over time.
The macro
First, we created a macro to retrieve all the test results. We named this macro generate_tests_stats()
and stored it under macros/utils/generate_tests_stats.sql
in our dbt project root.
{%- macro generate_tests_stats(schema) -%}
{# Check all tables in the provided tests schema #}
{%- call statement('tests_tables_query', fetch_result=True) %}
select
t.name as table_name,
t.modify_date as modified_date
from sys.tables t
where schema_name(t.schema_id) = '{{ schema }}'
order by table_name
{%- endcall -%}
{# Query and save result in table_list var #}
{%- set table_list = load_result('tests_tables_query')['data'] -%}
{# Check if there are any results #}
{%- if table_list -%}
{# Define count(*) query and execute for all tables in the schema #}
{%- for table_name in table_list -%}
{%- call statement('tests_table_count', fetch_result=True) %}
select
count(*)
from {{ schema }}.{{ table_name[0] }}
{%- endcall -%}
{%- set count_result = load_result('tests_table_count')['data'] -%}
SELECT
'{{ table_name[0] }}' as test_name,
CAST('{{ table_name[1] }}' AS DATETIME2(7)) as snapshot_date,
{{ count_result[0][0]|round|int }} as failure_count
{% if not loop.last %}
UNION ALL
{% endif %}
{%- endfor -%}
{%- else -%}
{{ return([]) }}
{%- endif -%}
{% endmacro %}
This compiles SQL for Synapse. If you use another db, you might need to make some changes.
The compiled SQL of this macro should be looking like this:
SELECT
your_test_1 as test_name,
CAST('2022-07-07 12:34:23' AS DATETIME2(7)) as snapshot_date,
0 as failure_count
UNION ALL
your_test_2 as test_name,
CAST('2022-07-07 12:37:23' AS DATETIME2(7)) as snapshot_date,
0 as failure_count
UNION ALL
your_test_3 as test_name,
CAST('2022-07-07 12:38:23' AS DATETIME2(7)) as snapshot_date,
0 as failure_count
Snapshot
We then created a snapshot snapshots/dbt/tests_snapshot.sql
{% snapshot tests_snapshot %}
{{
config(
tags=['tests'],
strategy='timestamp',
unique_key='test_name',
updated_at='snapshot_date'
)
}}
-- Set schema, since tests use custom schema on dev
{% if target.name in ["dev", "ci"] %}
{%- set test_schema = target.schema ~ '_tests' -%}
{% else %}
{%- set test_schema = 'tests' -%}
{% endif %}
{{ generate_tests_stats(test_schema) }}
{% endsnapshot %}
We defined our --store-failures
tests in a schema called tests
, if you use a different schema you need to change this in the snapshot.
And that’s it! I think it’s a pretty simple but powerful setup to easily keep track of tests results daily.