I have read that snapshotting source data is best practise, but we are planning on snapshotting our dimension tables once we have modelled the data.
Should we do both? Is the source snapshot just used as a fail-safe to roll back to any point in time if anything goes seriously wrong? Or is there any additional advantages of doing this?
@ShaunLF, I actually think one is fine, except if you are performing serious upstream changes on your source tables to arrive at your dimension…
If Yes. Then it’s logically efficient to stick to your dimensions when they are SCD’s.
Yes it can serve for fail over, but I assume your source tables might have landed from a process that can be regenerated.
Its a two way thing depending on what works for your use case.
I won’t want to create snapshots and pay for extra storage on sources if I can regenerate them from a data lake layer on cases of failures. That’s why dimension snapshoting will be better to track changes over time.
The reason to avoid snapshotting your final dimension tables is that if you find a bug and need to change your modelling code, you can find yourself with inaccurate results and no way of recalculating the correct data. By contrast, if you snapshot the source data and build on top of that, you can always recalculate everything from first principles if/when you need to.
I certainly snapshot some final dimensional tables when the additional effort to make the table fully independent of time (or I can’t be bothered ), but it’s pretty much always in addition to snapshotting the source table instead of a replacement.