Background
Imagine that you have expected outputs for a historic metric (total revenue) as described below:
| Year | Total Revenue |
|---|---|
| 2019 | 1 million |
| 2020 | 1.5 million |
| 2021 | 2 million |
How can you test that these historical metrics do not change in dbt?
This is an alternative solution to the snapshot-based one proposed here.
Step 1
Let’s say that you have a fct_orders table, which has one row for each order:
| order_id | order_date | amount |
|---|---|---|
| 1 | 2019-01-05 | 10 |
| 2 | 2019-02-06 | 50 |
| 3 | 2020-02-07 | 8 |
| … | … | … |
First, you should create a model that sums the amount for each year excluding the current one (for simplicity, we’re assuming you have no costs). Let’s call this fct_revenue_summary:
select
year(order_date) as year,
sum(amount) as total_revenue
from {{ ref('fct_orders') }}
where year <> year(current_timestamp())
group by 1
Step 2
Next, create an incremental model on top of fct_revenue_summary which captures the historical view of revenue outputs. Let’s call this fct_revenue_summary_history:
{{
config(
materialized='incremental',
unique_key=['year', 'total_revenue']
)
}}
select
year,
total_revenue
from {{ ref('fct_revenue_summary') }}
{% if is_incremental() %}
where true
{% endif %}
Step 3
Finally, create a test on fct_revenue_summary_history to check that each year has a single source of truth for total_revenue:
version: 2
models:
- name: fct_revenue_summary_history
columns:
- name: year
tests:
- unique
When you run a dbt build, you will get an error if you ever output a new total_revenue value for a historic year that differs from the original. For example, if your original fct_revenue_summary_history looks like this:
| year | total_revenue |
|---|---|
| 2019 | 1 million |
| 2020 | 1.5 million |
| 2021 | 2 million |
But then, you introduce a breaking change such that dbt now calculates the total_revenue for 2019 as 0.8 million, fct_revenue_summary_history will now look like this:
| year | total_revenue |
|---|---|
| 2019 | 1 million |
| 2019 | 0.8 million |
| 2020 | 1.5 million |
| 2021 | 2 million |
And the uniqueness test on the year column will fail.
This allows you to detect regressions in historic metrics!