Joining snapshot tables (time range based joins)

dbt snapshots and time-range joins

dbt Snapshots are the foundation of our data modeling layer. They are used to preserve the history of how data has changed over time. Capturing this history is invaluable for debugging and critical for certain business functions where we either need to track how an attribute changed that wasn’t captured in the source system, or when we need a particular report to not change as subsequent data updates happen (e.g., billing). However, snapshot data can be more difficult to work with, particularly when it comes to joining two different tables that have been subject to snapshotting.

To demonstrate the issue, we’ll consider a somewhat uninteresting case of having to join an “engagement” snapshot with a “partner” snapshot, where the names of the records were changed at different times, and we want to capture those name changes.

Given an engagements snapshot that looks like (note that for practical purposes, the NULL has been replaced by actual date far in the future 2999-12-31 23:59:59 – aka, the Y3K bug):

engagement_sfid partner_sfid engagement_name dbt_valid_from dbt_valid_to
eng1 part1 Engagement Old 2021-10-08 00:01:01 2021-10-10 00:00:01
eng1 part1 Engagement New 2021-10-10 00:00:01 2999-12-31 23:59:59

and a partners snapshot that looks like:

partner_id partner_name dbt_valid_from dbt_valid_to
part1 Partner Old 2021-10-08 00:00:01 2021-10-12 00:00:01
part1 Partner New 2021-10-12 00:00:01 2999-12-31 23:59:59

How do we join the two tables together to capture the combined record effective dates? Simply joining engagements.partner_sfid to partners.partner_sfid would give us the cartesian product of all the records, including a record where “Engagement Old” is paired with “Partner New”, which never actually happened because “Engagement Old” was not effective when “Partner Old” was changed to “Partner New”. What we want is a joined table that looks like:

engagement_sfid partner_sfid engagement_name partner_name valid_from valid_to
eng1 part1 Engagement Old Partner Old 2021-10-08 00:01:01 2021-10-10 00:00:01
eng1 part1 Engagement New Partner Old 2021-10-10 00:00:01 2021-10-12 00:00:01
eng1 part1 Engagement New Partner New 2021-10-12 00:00:01 2999-12-31 23:59:59

Fortunately, most of this problem has already been solved, and I’ve implemented it in a dbt macro. To use it, you need to do some light data preparation of the tables involved as CTEs and then use the macro to specify the CTEs involved and how they join together:

WITH engagements AS (
    {{ ref('engagements') }}

partners AS (
    {{ ref('partners') }}

    left_fields=['engagement_sfid', 'engagement_name', 'partner_sfid', 'engagement_dotdotdot'],
      'partners': {
        'fields': ['partner_sfid', 'partner_name', 'partner_dotdotdot'],
        'left_on': 'partner_sfid',
        'right_on': 'partner_sfid',

SELECT * FROM trange_final

Interesting approach @gnilrets . Generally, in dwh modeling, the SCDs are at the end of the process, and by rule, you do not join these together directly – but through the interesting facts that you want to measure. So, you might have a simple fact of ‘number_of_engagements’ in which both partners and engagements are connected to in a table like “partner_sfid, engagement_sfid, nb_engagements”. in this manner, the right partner and the right engagement record is linked to the fact (in this case 1 – “factless fact table”), and as a result, together.

In dbt, they are encouraging us to put snapshots very early on, and I’m trying to leverage these years of modeling experience to wrap my head around this approach.

cc: @claire (who has posted tons of great modeling articles).

1 Like

Thanks for posting this. We use the same pattern on our team though we haven’t yet made it a macro - definitely will take a look at yours. We call the pattern a change date spine.