Unable to recreate SCD2 table (updated_At, valid_from, valid_to with todays date)
Hello, we are migrating our ETL jobs to dbt, and we are on the point where we need/want to migrate our scd2 tables. Wondering what can be the best option and taking in consideration that we have a few extraction files (full extraction so timestamp strategy won’t be applicable).
Also I would like to have a plan to when, (for some reason) I need to recreate SCD2 tables I might be able to do it.
We have python script that will execute dbt snapshot on each different date that we have on our source table (weekly FULL SNAPSHOTS). “simulating” the insertion. However we are unable to adapt the dbt_updated_at, dbt_valid_from and dbt_valid_to to the respective dates, meaning that they are all getting timestamps from the current date.
Hoping to have a few insights from your side !
Appreciate !
1 Like
Hi, @oliveirajo
Hmm what snapshot strategy are you using?
1 Like
We are using “check strategy”.
We are setting up a dbt snapshot to migrate to dbt.
We have a table with weekly extractions from a source system, and we need to build now a scd2 table.
Since the upstream have a column named “extraction_date”, we though about using the “check” strategy to control it and use a variable with the dbt snapshot where we could easily pass the different dates to recreate a scd2 table.
However we noticed the cols generated by dbt : dbt_updated_at, dbt_valid_from e dbt_valid_to are taking into consideration today’s date (execution date timestamps).
Can we force dbt to change the dates that he takes into consideration ?
I think that this might be very useful even when, for example, the scheduled execution, during a weekend failed, and on Monday we need to re-launch the pipeline from Saturday…
Maybe there is another approach that I’m missing here.
Is it possible to set the dbt_valid_from and the dbt_valid_to in a way that reflects the date we snapshot.
Thanks for your help!
I have the same question. Sometimes you have a data source that has a changelog (like an effective date for each row change), but it’s not a full SCD source with end dates and so on. You’d want to feed this into a snapshot process and have it use the incoming data’s data-change date as the dbt_valid_from date, and let dbt manage the rest…
1 Like