State of testing in dbt

@joshtaylor thanks for sharing! This is quite clever, at which point in the development lifecycle do you run this test? At development-time, or at deploy-time?

I like the name test_where_exists, I wonder if semantic macros in a dbt package could be added to make improve the baseline of development-time testing. An idea I have for this is exact_match like so:

-- BigQuery Standard SQL:

{% macro exact_match(test_dataset_query, query_under_test) %}
(
    ({{ test_dataset_query }})
    except distinct
    ({{ query_under_test }})
)
union all 
(
    ({{ query_under_test }})
    except distinct
    ({{ test_dataset_query }})
)
{% endmacro %}

Then two more macros that can be used similarly:

  • is_contained_within(test_dataset_query, query_under_test)
  • does_not_contain(test_dataset_query, query_under_test)

Applying this in a data test (borrowing from jaffle_shop):

{% set test_dataset_query %}
select 'mperez0@chronoengine.com' as email, 33 as customer_lifetime_value union all
select 'smccoy1@reddit.com' as email, 23 as customer_lifetime_value union all
select 'kpayne2@cargocollective.com' as email, 65 as customer_lifetime_value
{% endset %}

{% set query_under_test %}
select
    email,
    customer_lifetime_value
from {{ ref('dim_customers') }} as dim_customers
left join {{ ref('raw_customers') }} as raw_customers
    on raw_customers.id = dim_customers.customer_id
{% endset %}

{{ exact_match(test_dataset_query, query_under_test) }}

Now imagine being able to define which refs are mocked and how in the above data test file, I think that would be a great developer experience for a self-contained “unit test” in a single file. @jerco curious to get your thoughts on the feasibility of that idea.

1 Like