dbt snapshots: How to backfill from a table without updated_at/created_at timestamp?

Note: This issue has also been posted on StackOverflow, and can be found here.

We have several years worth of daily SELECT * FROM source.table captures for a given table. We want to create a dbt snapshot that tracks row level changes over the set of daily captures. The table does not have an updated_at field.

The perceived issue is that:

  1. Given that the table does not have an created_at/updated_at field, we cannot use the dbt snapshot “timestamp” strategy;
  2. Given that we have to backfill several years of daily snapshot on a first run, we think that we cannot use the “check” strategy, as our understanding is that the check strategy uses the time of execution to fill the dbt_* fields.

We could, in theory add the extraction timestamp as the updated_at field. However since our understanding of the “timestamp” strategy is that dbt only tracks the timestamp field, we assume that every record will have a new version for every capture we add to the dbt snapshot table. Is that a correct assessment?

Our best option at this time is to update manually the values in dbt_* fields to the extraction timestamps of our captures in storage in between every dbt snapshot run, and use the “check” strategy to determine changes.

Is there any dbt- native solution possible for this issue?

Thanks a lot for your answers!

1 Like