Note: This issue has also been posted on StackOverflow, and can be found here.
We have several years worth of daily SELECT * FROM source.table
captures for a given table. We want to create a dbt snapshot that tracks row level changes over the set of daily captures. The table does not have an updated_at field.
The perceived issue is that:
- Given that the table does not have an created_at/updated_at field, we cannot use the dbt snapshot “timestamp” strategy;
- Given that we have to backfill several years of daily snapshot on a first run, we think that we cannot use the “check” strategy, as our understanding is that the check strategy uses the time of execution to fill the dbt_* fields.
We could, in theory add the extraction timestamp as the updated_at field. However since our understanding of the “timestamp” strategy is that dbt only tracks the timestamp field, we assume that every record will have a new version for every capture we add to the dbt snapshot table. Is that a correct assessment?
Our best option at this time is to update manually the values in dbt_* fields to the extraction timestamps of our captures in storage in between every dbt snapshot run, and use the “check” strategy to determine changes.
Is there any dbt- native solution possible for this issue?
Thanks a lot for your answers!