Periodic Snapshot Fact Tables

The problem I’m having

I’m trying to implement a table that snapshots metrics daily for a particular customer id. The idea is to run it once a day and store the totals for the day. I found a medium article that proposed something like this, but I can’t get it to work (dbt run failed).

The context of why I’m trying to do this

This would be a fact table that I would join with dim tables (scd type 2). Having this at the daily grain allows me to plot measures over time, and having the date also allows me to use scd type 2 to get the values of attributes that where valid at the time of the snapshot.

What I’ve already tried

I tried the method suggested in this medium article article but it is failing for me.

Some example code or error messages

{{ config(
   materialized='snapshot',
   unique_key= ['order_id', 'snapshot_timestamp']
) }}

select 
    GETDATE() snapshot_timestamp,
    coalesce(orders.order_id,0) order_id,
    orders.customer_id,
    sum (case when payments.status = 'success' then amount end) as amount
from  {{ ref('stg_orders') }} orders
left join {{ ref('stg_stripe_payments') }} payments on orders.order_id = payments.order_id
group by 1,2,3

The code above results in

Compilation Error in model fct_orders_snapshot (models/fct_orders_snapshot.sql)
  argument of type 'NoneType' is not iterable
  
  > in macro materialization_snapshot_default (macros/materializations/snapshots/snapshot.sql)
  > called by model fct_orders_snapshot (models/fct_orders_snapshot.sql)

I’ve also tried passing the unique key as = ‘order_id, snapshot_timestamp’ and ‘order_id’, ‘snapshot_timestamp’.

Note
I’m using the jaffle_shop data to test if this is doable in dbt, I know snapshotting this model makes little/no sense.

Don’t get confused by some of the overloaded terminology in this domain. dbt snapshots are a tool for turning SCD1 data into SCD2 data.

I don’t know if this article is out of date or what, but snapshot is also <Materializations | dbt Developer Hub a materialisation option>, it’s <Add snapshots to your DAG | dbt Developer Hub whole separate thing> and now as of 1.9 is available as <Add snapshots to your DAG | dbt Developer Hub configuration> with no SQL code. Snapshots are not ordinary dbt models at all.

So, for your use case, to the extent that that article is correct at all, it’s a red herring anyway. If you want to store metrics every day for a particular customer ID, you should probably use an incremental model. Most of the code that you have in your example will work in an incremental context.

You don’t get the “point in time” aspect that the article is talking about, but you should probably have a better strategy for that anyway since in principle, dbt models should be able to be full-refreshed at any time and that would destroy any history in the table.

Note: @Mike Stanley originally posted this reply in Slack. It might not have transferred perfectly.

Thanks for the reply <@U05HW5JBVK3>. I’ll test the incremental model instead.

Note: @diogomaiog originally posted this reply in Slack. It might not have transferred perfectly.

Hey, were you able to resolve it? I’m facing the same issue.
I created a snapshot as below:
snapshots:
- name: division_snapshot
relation: ref(‘Divisions’)
config:
schema: NessieCatalog
database: DDATest_Curated
unique_key: division_id
strategy: timestamp
updated_at: airbyte_run_jobtime

Hi @akshat , I believe this worked in my tests:

{{ config(
   materialized='incremental',
   unique_key= ['customer_id', 'snapshot_timestamp']
) }}
select
   GETDATE() snapshot_timestamp,
   orders.customer_id,
   sum (case when payments.status = 'success' then amount end) as amount,
   count(distinct orders.order_id) total_orders
from  {{ ref('stg_orders') }} orders
left join {{ ref('stg_stripe_payments') }} payments on orders.order_id = payments.order_id
group by 1,2

Hope it helps