Implementing SCD2 Best Practices

Hi All,

Looking for Best Practices in DBT - Snowflake to implement SCD2. Trying to understand if Snapshot is the final layer of the data in warehouse or do we have any other approach where we store the SCD2 data and expose it downstream applications.


This is going to widely vary based on what the business users want to see. The number of scenarios are limitless. But I think the best answer is always start with dbt Snapshot and then model from there.

Here are some generalizations…
Snapshotting Dimensions:
Start with dbt Snapshot. Then work with the end users on their requirements. Do they want to see the dimension data as of the time of the transaction, or do they always want to see today’s version of the dimension? Then I build a model that solves their requirement. Usually the business users do not query the dimension table directly. We join it to a fact table for them and present the BI tool with one wide table and all of the columns. (The BI Users will not need to know how to use the valid_from / valid_to dates).

Snapshotting Facts:
Again start with dbt Snapshot. We usually have a model after the snapshot to only display the active record so the Business folks don’t need to worry about filtering in the BI tool. Occasionally users will require a “snapshot trend view”, example, count active rows by day displayed on a bar chart in the BI tool. In this case we need to create a view to populate the entire data set (active rows) for each calendar date.