@joshtaylor thanks for sharing! This is quite clever, at which point in the development lifecycle do you run this test? At development-time, or at deploy-time?
I like the name test_where_exists
, I wonder if semantic macros in a dbt package could be added to make improve the baseline of development-time testing. An idea I have for this is exact_match
like so:
-- BigQuery Standard SQL:
{% macro exact_match(test_dataset_query, query_under_test) %}
(
({{ test_dataset_query }})
except distinct
({{ query_under_test }})
)
union all
(
({{ query_under_test }})
except distinct
({{ test_dataset_query }})
)
{% endmacro %}
Then two more macros that can be used similarly:
is_contained_within(test_dataset_query, query_under_test)
does_not_contain(test_dataset_query, query_under_test)
Applying this in a data test (borrowing from jaffle_shop
):
{% set test_dataset_query %}
select 'mperez0@chronoengine.com' as email, 33 as customer_lifetime_value union all
select 'smccoy1@reddit.com' as email, 23 as customer_lifetime_value union all
select 'kpayne2@cargocollective.com' as email, 65 as customer_lifetime_value
{% endset %}
{% set query_under_test %}
select
email,
customer_lifetime_value
from {{ ref('dim_customers') }} as dim_customers
left join {{ ref('raw_customers') }} as raw_customers
on raw_customers.id = dim_customers.customer_id
{% endset %}
{{ exact_match(test_dataset_query, query_under_test) }}
Now imagine being able to define which ref
s are mocked and how in the above data test file, I think that would be a great developer experience for a self-contained “unit test” in a single file. @jerco curious to get your thoughts on the feasibility of that idea.