Testing incremental models

I love how dbt allows me to test schema and data :heart:, but there’s one thing I haven’t figured out - it’s testing of incremental models.

My typical CI test run looks like this

dbt deps --profiles-dir $DBT_PROFILE_DIR
dbt seed --profiles-dir $DBT_PROFILE_DIR
dbt run  --profiles-dir $DBT_PROFILE_DIR
dbt test --profiles-dir $DBT_PROFILE_DIR

That works well for the --full-refresh case, which is basically what CI does, since it always run in a clean environment and DB. But the --full-refresh case is rare in production and what runs every day is processing incremental changes, which is left untested.

Not sure if it’s relevant, but I use static data in my seeds to cover all edge cases. So the increment isn’t always “today”, but rather some static date in the past.

What would be a good approach to correctly test full + incremental transformations?

Cheers!

One thing that I’ve found makes this testing easier is to tag each row to identify whether it was built by your full-refresh or by a subsequent incremental. You can do that with the is_incremental() function.

To make use of that, you can just add a column to each incremental table: “build_source”, like so:

{% if is_incremental() %}
‘incremental’
{% else %}
‘full’
{% endif %} as “build_source”,

When your model is built with --full-refresh (full build). and then without it (incrementally), the rows will either have ‘incremental’ or ‘full’ as the value of column “build_source”.

Once the column is populated, you can make use of that in your asserts and other test queries to see what the incremental did, like:

select * from inventory
where build_source = 'incremental'
order by inventory.item_id, updated_at

… and of course you can write more sophisticated queries to test different conditions explicitly.