How to use incremental models to detect regressions in historic metrics

Background

Imagine that you have expected outputs for a historic metric (total revenue) as described below:

Year Total Revenue
2019 1 million
2020 1.5 million
2021 2 million

How can you test that these historical metrics do not change in dbt?

This is an alternative solution to the snapshot-based one proposed here.

Step 1

Let’s say that you have a fct_orders table, which has one row for each order:

order_id order_date amount
1 2019-01-05 10
2 2019-02-06 50
3 2020-02-07 8

First, you should create a model that sums the amount for each year excluding the current one (for simplicity, we’re assuming you have no costs). Let’s call this fct_revenue_summary:

select
        year(order_date) as year,
        sum(amount) as total_revenue
from {{ ref('fct_orders') }}
where year <> year(current_timestamp())
group by 1

Step 2

Next, create an incremental model on top of fct_revenue_summary which captures the historical view of revenue outputs. Let’s call this fct_revenue_summary_history:

{{
    config(
        materialized='incremental',
        unique_key=['year', 'total_revenue']
    )
}}

select
        year,
        total_revenue
from {{ ref('fct_revenue_summary') }}

{% if is_incremental() %}
where true
{% endif %}

Step 3

Finally, create a test on fct_revenue_summary_history to check that each year has a single source of truth for total_revenue:

version: 2

models:
  - name: fct_revenue_summary_history
    columns:
      - name: year
        tests:
          - unique

When you run a dbt build, you will get an error if you ever output a new total_revenue value for a historic year that differs from the original. For example, if your original fct_revenue_summary_history looks like this:

year total_revenue
2019 1 million
2020 1.5 million
2021 2 million

But then, you introduce a breaking change such that dbt now calculates the total_revenue for 2019 as 0.8 million, fct_revenue_summary_history will now look like this:

year total_revenue
2019 1 million
2019 0.8 million
2020 1.5 million
2021 2 million

And the uniqueness test on the year column will fail.

This allows you to detect regressions in historic metrics!

2 Likes

This is really interesting!
Thanks for sharing