How to use incremental models to detect regressions in historic metrics

grace.goheen · January 27, 2023, 5:37pm

Background

Imagine that you have expected outputs for a historic metric (total revenue) as described below:

Year	Total Revenue
2019	1 million
2020	1.5 million
2021	2 million

How can you test in dbt that these historical metrics do not change?

This is an alternative to the snapshot-based solution proposed in the topic "Build snapshot-based tests to detect regressions in historic data".

Step 1

Let’s say that you have a fct_orders table, which has one row for each order:

order_id	order_date	amount
1	2019-01-05	10
2	2019-02-06	50
3	2020-02-07	8
…	…	…

First, you should create a model that sums the amount for each year excluding the current one (for simplicity, we’re assuming you have no costs). Let’s call this fct_revenue_summary:

select
        year(order_date) as year,
        sum(amount) as total_revenue
from {{ ref('fct_orders') }}
where year(order_date) <> year(current_timestamp())
group by 1

Step 2

Next, create an incremental model on top of fct_revenue_summary which captures the historical view of revenue outputs. Let’s call this fct_revenue_summary_history:

{{
    config(
        materialized='incremental',
        unique_key=['year', 'total_revenue']
    )
}}

select
        year,
        total_revenue
from {{ ref('fct_revenue_summary') }}

{% if is_incremental() %}
where true
{% endif %}
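
The where true branch above adds no filter. The model relies on the composite unique_key instead: an unchanged (year, total_revenue) pair matches an existing row, while a changed pair matches nothing and is inserted as a new row. As a rough sketch of the same idea with the comparison written out, and not the approach used in this post, an append-only variant could anti-join against the existing table. It assumes that an incremental model without a unique_key simply appends the selected rows, and that total_revenue is an exact numeric type so that equality comparisons are reliable.

```sql
{{
    config(
        materialized='incremental'
    )
}}

-- Assumption: no unique_key is set, so each incremental run appends
-- whatever this query returns.
select
        s.year,
        s.total_revenue
from {{ ref('fct_revenue_summary') }} as s

{% if is_incremental() %}
-- Keep only (year, total_revenue) pairs that are not already recorded.
left join {{ this }} as h
        on s.year = h.year
        and s.total_revenue = h.total_revenue
where h.year is null
{% endif %}
```

Either way, the history table only gains a row when a year appears for the first time or when its total_revenue differs from every value already stored for that year.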

Step 3

Finally, create a test on fct_revenue_summary_history to check that each year has a single source of truth for total_revenue:

version: 2

models:
  - name: fct_revenue_summary_history
    columns:
      - name: year
        tests:
          - unique

When you run dbt build, the test will fail if you ever output a total_revenue value for a historic year that differs from the original. For example, say your original fct_revenue_summary_history looks like this:

year	total_revenue
2019	1 million
2020	1.5 million
2021	2 million

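Suppose you then introduce a breaking change such that dbt now calculates the total_revenue for 2019 as 0.8 million. Because (2019, 0.8 million) does not match any existing unique_key combination, the next incremental run inserts it as a new row, and fct_revenue_summary_history will look like this: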

year	total_revenue
2019	1 million
2019	0.8 million
2020	1.5 million
2021	2 million

And the uniqueness test on the year column will fail.
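
When the test fails, it helps to see which years have conflicting values. A minimal sketch of such a query, using only the model and columns defined above (the file name is hypothetical):

```sql
-- analyses/revenue_history_conflicts.sql (hypothetical file name)
-- Lists every year that has more than one recorded total_revenue,
-- along with the range of values seen for it.
select
        year,
        count(*) as recorded_values,
        min(total_revenue) as lowest_total_revenue,
        max(total_revenue) as highest_total_revenue
from {{ ref('fct_revenue_summary_history') }}
group by year
having count(*) > 1
order by year
```

For the example above, this would return one row for 2019 with two recorded values. Saved under the project's tests directory, a query like this could also serve as a singular test, since dbt treats any returned rows as failures.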

This allows you to detect regressions in historic metrics!

marcorossi · February 16, 2023, 1:46pm

This is really interesting!
Thanks for sharing
