Today we discovered an issue with the behavior of dbt snapshot
that I believe is not unique to my use case, but could affect everyone who uses dbt. Before posting an issue on Github, I wanted to discuss it here to see if folks agree that this could be improved.
Scenario:
invalidate hard deletes = True
A record is snapshotted.
Then the record is no longer in source, so it gets “ended” (dbt_valid_to
is populated).
Then later the record appears in the source again.
I would expect it to be snapshotted again, but it does not.
More detail:
When dbt creates the temp table that is used as the basis for the merge
statement, flagging each record with “insertion” or “update”… my new record (the one that went away and came back) IS flagged with “insertion”. Yay! So I would guess in the next step, the record would be inserted into the snapshot table.
To understand why it is not re-inserted, we need to look at the merge statement:
merge into {{ target }} as DBT_INTERNAL_DEST
using {{ source }} as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id
when matched
and DBT_INTERNAL_DEST.dbt_valid_to is null
and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
then update
set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to
when not matched
and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
then insert ({{ insert_cols_csv }})
values ({{ insert_cols_csv }})
Because we are merging on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id
regardless of dbt_valid_to
, this record is considered matched
. We know that matched
records don’t get inserted.
This leaves me with a temp table that says “go ahead and insert this” and nothing gets inserted!
I considered modifying the macro, but it causes a lot of unintended complexity when you get really into it.
Question for the community: Do you see this as a bug? If not, why? Looking forward to hearing your thoughts!