How to monitor your dbt tests daily

Since the release of dbt-core v0.20.0, dbt allows you store test results by providing the --store-failures flag at your dbt test command. Providing this flag, dbt will automatically store corrupt validations in a table per test. Everytime dbt test --store-failures is run, it overwrites the previous results.

I wanted to share our approach to register the number of failing tests daily. It uses a macro and dbt snapshot which allows us to monitor non-critical tests such as data quality over time.

The macro

First, we created a macro to retrieve all the test results. We named this macro generate_tests_stats() and stored it under macros/utils/generate_tests_stats.sql in our dbt project root.

{%- macro generate_tests_stats(schema) -%} 

    {# Check all tables in the provided tests schema #}
    {%- call statement('tests_tables_query', fetch_result=True) %}

        select
            t.name as table_name,
            t.modify_date as modified_date
        from sys.tables t
        where schema_name(t.schema_id) = '{{ schema }}'
        order by table_name

    {%- endcall -%}

    {# Query and save result in table_list var #}
    {%- set table_list = load_result('tests_tables_query')['data'] -%}

    {# Check if there are any results #}
    {%- if table_list -%}

        {# Define count(*) query and execute for all tables in the schema #}
        {%- for table_name in table_list -%}

            {%- call statement('tests_table_count', fetch_result=True) %}

                select
                    count(*)
                from {{ schema }}.{{ table_name[0] }}

            {%- endcall -%}

            {%- set count_result = load_result('tests_table_count')['data'] -%}

            SELECT 
                '{{ table_name[0] }}' as test_name,
                CAST('{{ table_name[1] }}' AS DATETIME2(7)) as snapshot_date,
                {{ count_result[0][0]|round|int }} as failure_count 


            {% if not loop.last %}

                UNION ALL


            {% endif %}

        {%- endfor -%}


    {%- else -%}

        {{ return([]) }}

    {%- endif -%}


{% endmacro %}

:warning: This compiles SQL for Synapse. If you use another db, you might need to make some changes.

The compiled SQL of this macro should be looking like this:

SELECT 
  your_test_1 as test_name,
  CAST('2022-07-07 12:34:23' AS DATETIME2(7)) as snapshot_date,
  0 as failure_count
  
UNION ALL

  your_test_2 as test_name,
  CAST('2022-07-07 12:37:23' AS DATETIME2(7)) as snapshot_date,
  0 as failure_count
  
UNION ALL

  your_test_3 as test_name,
  CAST('2022-07-07 12:38:23' AS DATETIME2(7)) as snapshot_date,
  0 as failure_count

Snapshot

We then created a snapshot snapshots/dbt/tests_snapshot.sql

{% snapshot tests_snapshot %}

{{
    config(
        tags=['tests'],
        strategy='timestamp',
        unique_key='test_name',
        updated_at='snapshot_date'
    )
}}

-- Set schema, since tests use custom schema on dev
{% if  target.name in ["dev", "ci"] %}
    {%- set test_schema = target.schema ~ '_tests' -%}
{% else %}
     {%- set test_schema = 'tests' -%}
{% endif %}

{{ generate_tests_stats(test_schema) }}

{% endsnapshot %}

:warning: We defined our --store-failures tests in a schema called tests, if you use a different schema you need to change this in the snapshot.

And that’s it! I think it’s a pretty simple but powerful setup to easily keep track of tests results daily.