Hey @jtke! Such a good question! I have done a version of this where I use fake data while developing models: usually my fake data consists of all my known edge cases so that I can check that my SQL works. Unfortunately, I usually only do that during development, and switch out my references to the fake data before merging the code.
However, I have thought about extending this to be more long-lived in my project! Here’s what I would do:
1. Add seed files for your expected input and output to your dbt project
Name them sensibly, like input_subscription_events
and expected_subscription_periods
. Normally I like to design these seed files to have lots of tricky cases so I can be sure that my SQL is working as intended. Don’t forget to run dbt seed
.
If you want to run things with “real”, but static, data, you could take the time to recreate the tables in a separate schema – check out this post on some approaches there!
2. In your models, ref
the static data
But wrap it up in some conditional logic! (Yup, you can ref
seeds if you decide to go down that route).
-- subscription_periods.sql
with subscription_events as (
{% if target.name == 'ci' %}
select * from {{ ref('input_subscription_events') }}
{% else %}
select * from {{ ref('stg_subscription_events') }}
{% endif %}
)
...
3. Add an “equality” test to check that the expected input matches the expected output
We use this test to check that two relations match exactly. You can see us use this testing pattern all throughout dbt_utils
in its own integration tests (e.g. here)
# schema.yml
version: 2
models:
- name: subscription_periods
tests:
- dbt_utils.equality:
compare_model: ref('expected_subscription_periods')
4. Run your test with --target ci
The magic incanation here is:
dbt test --target ci
It should all pass if your transformations work!
But then when you run it in any other environment, that test will fail, so…
5. Parameterize the equality
test to accept an environment
argument.
You’ll have to add your own version of the equality
test to your project, and add an extra argument. You want to get to something like this:
# schema.yml
version: 2
models:
- name: subscription_periods
tests:
- jaffle_shop.equality: # replace with your project name
compare_model: ref('expected_subscription_periods')
env: ci
I haven’t done this last step for this particular macro, but I wrote about the structure of making a test environment aware in another Discourse article.
Now when you execute dbt test --target ci
, you’ll be comparing your static data. In any other environment, you’ll be running the real-deal, and tests shouldn’t fail in either.
Let me know how you go!