Joining snapshot tables (time range based joins)

gnilrets · October 22, 2021, 11:39pm

dbt snapshots and time-range joins

dbt Snapshots are the foundation of our data modeling layer. They are used to preserve the history of how data has changed over time. Capturing this history is invaluable for debugging and critical for certain business functions where we either need to track how an attribute changed that wasn’t captured in the source system, or when we need a particular report to not change as subsequent data updates happen (e.g., billing). However, snapshot data can be more difficult to work with, particularly when it comes to joining two different tables that have been subject to snapshotting.

To demonstrate the issue, we’ll consider a somewhat uninteresting case of having to join an “engagement” snapshot with a “partner” snapshot, where the names of the records were changed at different times, and we want to capture those name changes.

Given an engagements snapshot that looks like (note that for practical purposes, the NULL has been replaced by actual date far in the future 2999-12-31 23:59:59 – aka, the Y3K bug):

engagement_sfid	partner_sfid	engagement_name	dbt_valid_from	dbt_valid_to
eng1	part1	Engagement Old	2021-10-08 00:01:01	2021-10-10 00:00:01
eng1	part1	Engagement New	2021-10-10 00:00:01	2999-12-31 23:59:59

and a partners snapshot that looks like:

partner_id	partner_name	dbt_valid_from	dbt_valid_to
part1	Partner Old	2021-10-08 00:00:01	2021-10-12 00:00:01
part1	Partner New	2021-10-12 00:00:01	2999-12-31 23:59:59

How do we join the two tables together to capture the combined record effective dates? Simply joining engagements.partner_sfid to partners.partner_sfid would give us the cartesian product of all the records, including a record where “Engagement Old” is paired with “Partner New”, which never actually happened because “Engagement Old” was not effective when “Partner Old” was changed to “Partner New”. What we want is a joined table that looks like:

engagement_sfid	partner_sfid	engagement_name	partner_name	valid_from	valid_to
eng1	part1	Engagement Old	Partner Old	2021-10-08 00:01:01	2021-10-10 00:00:01
eng1	part1	Engagement New	Partner Old	2021-10-10 00:00:01	2021-10-12 00:00:01
eng1	part1	Engagement New	Partner New	2021-10-12 00:00:01	2999-12-31 23:59:59

Fortunately, most of this problem has already been solved, and I’ve implemented it in a dbt macro. To use it, you need to do some light data preparation of the tables involved as CTEs and then use the macro to specify the CTEs involved and how they join together:

WITH engagements AS (
  SELECT
    engagement_sfid,
    engagement_name,
    partner_sfid,
    engagement_dotdotdot,
    dbt_scd_id,
    dbt_valid_from,
    dbt_valid_to
  FROM
    {{ ref('engagements') }}
),

partners AS (
  SELECT
    partner_sfid,
    partner_name,
    partner_dotdotdot,
    dbt_scd_id,
    dbt_valid_from,
    dbt_valid_to
  FROM
    {{ ref('partners') }}
),

{{
  trange_join(
    left_model='engagements',
    left_fields=['engagement_sfid', 'engagement_name', 'partner_sfid', 'engagement_dotdotdot'],
    left_primary_key='engagement_sfid',
    right_models={
      'partners': {
        'fields': ['partner_sfid', 'partner_name', 'partner_dotdotdot'],
        'left_on': 'partner_sfid',
        'right_on': 'partner_sfid',
      }
    }
  )
}},

SELECT * FROM trange_final

ron.chipman · October 23, 2021, 4:43am

Interesting approach @gnilrets . Generally, in dwh modeling, the SCDs are at the end of the process, and by rule, you do not join these together directly – but through the interesting facts that you want to measure. So, you might have a simple fact of ‘number_of_engagements’ in which both partners and engagements are connected to in a table like “partner_sfid, engagement_sfid, nb_engagements”. in this manner, the right partner and the right engagement record is linked to the fact (in this case 1 – “factless fact table”), and as a result, together.

In dbt, they are encouraging us to put snapshots very early on, and I’m trying to leverage these years of modeling experience to wrap my head around this approach.

cc: @claire (who has posted tons of great modeling articles).

peterh · October 23, 2021, 10:13pm

Thanks for posting this. We use the same pattern on our team though we haven’t yet made it a macro - definitely will take a look at yours. We call the pattern a change date spine.

Topic		Replies	Views
Tackling the complexity of joining snapshots In-Depth Discussions devblog	2	1802	February 20, 2024
How do I turn 2 years of source data into a snapshot table Help snapshots , snowflake , dbt-cloud	0	67	February 20, 2025
Building snapshots on an append only table. Help snapshots	0	187	August 20, 2024
Periodic Snapshot Fact Tables Help	4	154	March 7, 2025
Initialise snapshot with data from a dimension table Help snowflake	2	1035	September 1, 2023

Joining snapshot tables (time range based joins)

dbt snapshots and time-range joins

Related topics