Snapshot behavior

Today we discovered an issue with the behavior of dbt snapshot that I believe is not unique to my use case, but could affect everyone who uses dbt. Before posting an issue on Github, I wanted to discuss it here to see if folks agree that this could be improved.

Scenario:
invalidate hard deletes = True
A record is snapshotted.
Then the record is no longer in source, so it gets “ended” (dbt_valid_to is populated).
Then later the record appears in the source again.
I would expect it to be snapshotted again, but it does not.

More detail:
When dbt creates the temp table that is used as the basis for the merge statement, flagging each record with “insertion” or “update”… my new record (the one that went away and came back) IS flagged with “insertion”. Yay! So I would guess in the next step, the record would be inserted into the snapshot table.

To understand why it is not re-inserted, we need to look at the merge statement:

    merge into {{ target }} as DBT_INTERNAL_DEST
    using {{ source }} as DBT_INTERNAL_SOURCE
    on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id

    when matched
     and DBT_INTERNAL_DEST.dbt_valid_to is null
     and DBT_INTERNAL_SOURCE.dbt_change_type in ('update', 'delete')
        then update
        set dbt_valid_to = DBT_INTERNAL_SOURCE.dbt_valid_to

    when not matched
     and DBT_INTERNAL_SOURCE.dbt_change_type = 'insert'
        then insert ({{ insert_cols_csv }})
        values ({{ insert_cols_csv }})

Because we are merging on DBT_INTERNAL_SOURCE.dbt_scd_id = DBT_INTERNAL_DEST.dbt_scd_id regardless of dbt_valid_to, this record is considered matched. We know that matched records don’t get inserted.

This leaves me with a temp table that says “go ahead and insert this” and nothing gets inserted!

I considered modifying the macro, but it causes a lot of unintended complexity when you get really into it.

Question for the community: Do you see this as a bug? If not, why? Looking forward to hearing your thoughts!

I think it works, but only if your “update_at” field has evolved.
Indeed, the field used for the merge (dbt_scd_id) corresponds to the combination of the PK and the “update_at” field. If the latter has evolved, the result is “not matched”.

md5(coalesce(cast(pk_key as varchar ), ‘’)
|| ‘|’ || coalesce(cast(DATE_MAJ as varchar ), ‘’)
) as dbt_scd_id