Using dynamic schemas for snapshots

Unlike models, snapshots build into the same target_schema, no matter who is running them. This helps ensure that whenever you ref a snapshot, you’re using the production version of the snapshot, so that your dev transformations reflect reailty.

However, this can cause problems if you want to test the snapshot command in dev or even during a CI run, since the database user you run this command with in dev/CI may not have the correct privileges to mutate the snapshot table.

If this is causing headaches for you, there is a workaround! Here’s how I recently did it:

1. Use the generate_schema_name macro to make the target_schema environment-aware

The generate_schema_name macro is how dbt constructs an environment-aware schema name for a model (docs), and we can leverage it here to do the same thing.

Normally, I’d configure this directly from the dbt_project.yml file for all my snapshots to make them behave consistently. But for reasons that I’d prefer to not get into (read: I don’t fully understand) we have to do this directly from the snapshot itself

snapshots/orders_snapshot.sql:

{% snapshot orders_snapshot %}

{{
    config(
      target_database='analytics',
      target_schema=generate_schema_name('snapshots'),
      unique_key='id',

      strategy='timestamp',
      updated_at='order_date',
    )
}}

select * from {{ source('jaffle_shop', 'orders') }}

{% endsnapshot %}

If you’re using custom databases, it might also be a good idea to do the same for your target_database configurations (i.e. target_database=generate_database_name('analytics'))

Now when you $ dbt run, the snapshot will be built in a different schema depending on who is running it.
For my setup:

-- in prod:
analytics.snapshots.orders_snapshot

-- in dev:
analytics.dbt_claire_snapshots.orders_snapshot

2. Don’t ref your snapshots

The docs say you should select from snapshots using ref:

select * from {{ ref('orders_snapshot') }}

But, if you’ve followed step one, you really shouldn’t use ref. This is because we want to make sure we are always selecting from the production version of the snapshot, not your dev or ci version. If we use ref, that won’t be the case.

So, what’s an analytics engineer to do?

Option 1: Use sources

Some folks have worked around this by choosing to define their snapshots as sources, and instead using the {{ source(...) }} function to select from their snapshot. That’s a reasonable approach, but the one downside is that you’ll end up with two nodes representing the same thing in the docs —one for the snapshot, and one for the source that represents the snapshot. Not too bad of a trade-off in the schema of things!

Option 2: Create a macro to ref for you

What if, instead of {{ ref('orders_snapshot') }}, we could do something like:

select * from {{ snapshot('orders_snapshot') }}

And have it always compile to the production version of the snapshot, while maintaining the edge in the DAG between the snapshot and the model?

We can! (Quelle surprise!)

Here’s a macro that I recently used:

macros/snapshot.sql

{% macro snapshot(snapshot_name) %}

{% set snapshot_relation=ref(snapshot_name) %}

{# Make sure this is the actual name of your target in prod #}
{% if target.name != 'prod' %}

{# Change this part based on how you name things in prod #}
{% set prod_snapshot_relation = adapter.get_relation(
      database=snapshot_relation.database,
      schema='snapshots',
      identifier=snapshot_relation.identifier
) %}

{% endif %}

{#
Use an or operator to handle:
1. the case where the prod version has not yet been created
2. the case where we are in prod
#}

{{ return(prod_snapshot_relation or snapshot_relation) }}

{% endmacro %}

Et voilà! In both prod and dev, my model will select from:

select * from analytics.analytics.orders_snapshot

And my documentation looks right!

Here’s a Loom video where I run through this (it’s a little long, but I go into much greater depth):

2 Likes