I’m trying to implement a table that snapshots metrics daily for a particular customer id. The idea is to run it once a day and store that day’s totals. I found a Medium article that proposed something like this, but I can’t get it to work (dbt run fails).
The context of why I’m trying to do this
This would be a fact table that I would join to dim tables (SCD type 2). Having it at the daily grain lets me plot measures over time, and having the date also lets me use SCD type 2 to get the attribute values that were valid at the time of the snapshot.
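To illustrate the kind of point-in-time join I have in mind (dim_customers, customer_segment, valid_from and valid_to are placeholder names, not actual jaffle_shop models):

select
    snap.snapshot_timestamp,
    snap.customer_id,
    dim.customer_segment,  -- attribute value as it was on the snapshot date
    snap.amount
from {{ ref('fct_orders_snapshot') }} snap
join {{ ref('dim_customers') }} dim
    on snap.customer_id = dim.customer_id
    and snap.snapshot_timestamp >= dim.valid_from
    and snap.snapshot_timestamp < coalesce(dim.valid_to, '9999-12-31')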
What I’ve already tried
I tried the method suggested in this Medium article, but it is failing for me.
Some example code or error messages
{{ config(
    materialized='snapshot',
    unique_key=['order_id', 'snapshot_timestamp']
) }}

select
    GETDATE() as snapshot_timestamp,
    coalesce(orders.order_id, 0) as order_id,
    orders.customer_id,
    sum(case when payments.status = 'success' then amount end) as amount
from {{ ref('stg_orders') }} orders
left join {{ ref('stg_stripe_payments') }} payments on orders.order_id = payments.order_id
group by 1, 2, 3
The code above results in
Compilation Error in model fct_orders_snapshot (models/fct_orders_snapshot.sql)
argument of type 'NoneType' is not iterable
> in macro materialization_snapshot_default (macros/materializations/snapshots/snapshot.sql)
> called by model fct_orders_snapshot (models/fct_orders_snapshot.sql)
I’ve also tried passing the unique key as a single string ('order_id, snapshot_timestamp') and as 'order_id', 'snapshot_timestamp'.
Note
I’m using the jaffle_shop data to test whether this is doable in dbt; I know snapshotting this model makes little to no sense.
So, for your use case, to the extent that that article is correct at all, it’s a red herring anyway. If you want to store metrics every day for a particular customer ID, you should probably use an incremental model instead; most of the code in your example will work in an incremental context.
You don’t get the “point in time” aspect the article is talking about, but you should probably have a better strategy for that anyway: in principle, dbt models should be able to be full-refreshed at any time, and a full refresh would destroy any history stored in the table.
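A minimal sketch of that incremental shape, reusing the SQL from your example (the snapshot_date cast and the is_incremental() guard are additions of mine, so treat them as one possible approach rather than the only one):

{{ config(
    materialized='incremental',
    unique_key=['customer_id', 'snapshot_date']
) }}

select
    cast(GETDATE() as date) as snapshot_date,  -- one row per customer per day
    orders.customer_id,
    sum(case when payments.status = 'success' then amount end) as amount
from {{ ref('stg_orders') }} orders
left join {{ ref('stg_stripe_payments') }} payments
    on orders.order_id = payments.order_id
{% if is_incremental() %}
-- optional: skip the insert when today's snapshot already exists
where cast(GETDATE() as date) not in (select snapshot_date from {{ this }})
{% endif %}
group by 1, 2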
Hey, were you able to resolve it? I’m facing the same issue.
I created a snapshot as below:

snapshots:
  - name: division_snapshot
    relation: ref('Divisions')
    config:
      schema: NessieCatalog
      database: DDATest_Curated
      unique_key: division_id
      strategy: timestamp
      updated_at: airbyte_run_jobtime
{{ config(
    materialized='incremental',
    unique_key=['customer_id', 'snapshot_timestamp']
) }}

select
    GETDATE() as snapshot_timestamp,
    orders.customer_id,
    sum(case when payments.status = 'success' then amount end) as amount,
    count(distinct orders.order_id) as total_orders
from {{ ref('stg_orders') }} orders
left join {{ ref('stg_stripe_payments') }} payments on orders.order_id = payments.order_id
group by 1, 2
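One extra safeguard worth mentioning (my addition, not something covered above, though it follows from the full-refresh caveat earlier in the thread): since the accumulated history only exists in this table, a dbt run --full-refresh would rebuild it from scratch and wipe everything. dbt's full_refresh config can be set to false so the model ignores that flag:

{{ config(
    materialized='incremental',
    unique_key=['customer_id', 'snapshot_timestamp'],
    full_refresh=false
) }}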