Background
Imagine that you have expected outputs for a historic metric (total revenue) as described below:
Year | Total Revenue |
---|---|
2019 | 1 million |
2020 | 1.5 million |
2021 | 2 million |
How can you test that these historical metrics do not change in dbt?
This is an alternative solution to the snapshot-based one proposed here.
Step 1
Let’s say that you have a fct_orders
table, which has one row for each order:
order_id | order_date | amount |
---|---|---|
1 | 2019-01-05 | 10 |
2 | 2019-02-06 | 50 |
3 | 2020-02-07 | 8 |
… | … | … |
First, you should create a model that sums the amount for each year excluding the current one (for simplicity, we’re assuming you have no costs). Let’s call this fct_revenue_summary
:
select
year(order_date) as year,
sum(amount) as total_revenue
from {{ ref('fct_orders') }}
where year <> year(current_timestamp())
group by 1
Step 2
Next, create an incremental model on top of fct_revenue_summary
which captures the historical view of revenue outputs. Let’s call this fct_revenue_summary_history
:
{{
config(
materialized='incremental',
unique_key=['year', 'total_revenue']
)
}}
select
year,
total_revenue
from {{ ref('fct_revenue_summary') }}
{% if is_incremental() %}
where true
{% endif %}
Step 3
Finally, create a test on fct_revenue_summary_history
to check that each year has a single source of truth for total_revenue
:
version: 2
models:
- name: fct_revenue_summary_history
columns:
- name: year
tests:
- unique
When you run a dbt build
, you will get an error if you ever output a new total_revenue
value for a historic year that differs from the original. For example, if your original fct_revenue_summary_history
looks like this:
year | total_revenue |
---|---|
2019 | 1 million |
2020 | 1.5 million |
2021 | 2 million |
But then, you introduce a breaking change such that dbt now calculates the total_revenue
for 2019 as 0.8 million, fct_revenue_summary_history
will now look like this:
year | total_revenue |
---|---|
2019 | 1 million |
2019 | 0.8 million |
2020 | 1.5 million |
2021 | 2 million |
And the uniqueness test on the year column will fail.
This allows you to detect regressions in historic metrics!