I have read that snapshotting source data is best practise, but we are planning on snapshotting our dimension tables once we have modelled the data.
Should we do both? Is the source snapshot just used as a fail-safe to roll back to any point in time if anything goes seriously wrong? Or is there any additional advantages of doing this?
@ShaunLF, I actually think one is fine, except if you are performing serious upstream changes on your source tables to arrive at your dimension…
If Yes. Then it’s logically efficient to stick to your dimensions when they are SCD’s.
Yes it can serve for fail over, but I assume your source tables might have landed from a process that can be regenerated.
Its a two way thing depending on what works for your use case.
I won’t want to create snapshots and pay for extra storage on sources if I can regenerate them from a data lake layer on cases of failures. That’s why dimension snapshoting will be better to track changes over time.
Thanks Victor - appreciate your response!
We are currently ingesting from a SQL Database that overwrites changes so we wouldn’t be able to roll back to a given point in time.
The use case for snapshotting your source data is described here: Snapshots | dbt Developer Hub
Also check out the best practices detailed further down on the same page: Snapshots | dbt Developer Hub
The reason to avoid snapshotting your final dimension tables is that if you find a bug and need to change your modelling code, you can find yourself with inaccurate results and no way of recalculating the correct data. By contrast, if you snapshot the source data and build on top of that, you can always recalculate everything from first principles if/when you need to.
I certainly snapshot some final dimensional tables when the additional effort to make the table fully independent of time (or I can’t be bothered ), but it’s pretty much always in addition to snapshotting the source table instead of a replacement.
Thanks for your input Joel - much appreciated! Pretty much what I was thinking and we have now reworked our project to snapshot source data
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.