The problem I’m having
I’m working on an incremental dbt model that unions multiple event tables (stream_ended
, stream_started
, gift_received
) and assigns a unique event_id
. Initially, I used CURRENT_TIMESTAMP
as insert_date
, following the best practices from Snowflake’s Mixpanel ingestion documentation. However, I realized that insert_date
gets updated every time the model runs, which isn’t ideal—I need it to stay fixed for existing records and only assign CURRENT_TIMESTAMP
to new ones.
The context of why I’m trying to do this
I need a stable insert_date
to track when each event was first ingested into my model, without it changing on every incremental run. This is important for accurate historical tracking and downstream analytics.
What I’ve already tried
- Using
CURRENT_TIMESTAMP
directly, but it updates on every run. - Considering
COALESCE(existing.insert_date, CURRENT_TIMESTAMP)
, but I’m unsure if this is the best approach. - Thinking about implementing the merge strategy in Snowflake, but I’m not sure if that would be the right solution.
Some example code
Here’s a simplified version of my model:
SELECT
md5(streamer_id || viewer_id || transaction_timestamp) AS event_id,
'gift_received' AS event_name,
streamer_id,
transaction_timestamp AS timestamp,
CURRENT_TIMESTAMP AS insert_date, -- This updates on every run, which I don’t want
OBJECT_CONSTRUCT(
'streamer_id', streamer_id,
'viewer_id', user_id,
'gift_id', gift_id,
'gift_points', gift_points,
'timestamp', transaction_timestamp
) AS properties
FROM {{ ref('transactions') }}
My questions:
- How can I persist
insert_date
for existing records while only assigningCURRENT_TIMESTAMP
to new ones? - Would using
COALESCE(existing.insert_date, CURRENT_TIMESTAMP)
be the best approach? - Would the merge strategy in Snowflake help with this, and if so, how should I implement it in dbt?
Any insights would be greatly appreciated! Thanks in advance!