Snapshots should almost always be run against source tables. Your models should then select from these snapshots, treating them like regular data sources.
So what’s the exception that makes us say “almost always”?
Let’s say you have a dbt model that summarizes key financial metrics at the grain of one record per month. Since these metrics are reported to the board or investors, it’s important that they do not change after the reporting period – if suddenly your model is showing different numbers for last month compared to what it said a few days ago, that’s an indicator that something has gone wrong.
We think snapshots can be a great fit here – by snapshotting your metrics model once a month, you can audit the values over time, giving you confidence that your key metrics are absolutely rock-solid. To be very clear though: this is a pattern that likely doesn’t apply to most companies and should be used only when necessary. In general, metrics tables like these can be hard to work with – they can’t be sliced and diced in a reporting tool, and some aggregates (like distinct counts) can’t be readily combined. Balance these constraints when determining if a metrics table is right for your organization.
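If you do go down this road, a snapshot of a metrics model might look something like this. This is a sketch, not a prescribed implementation – the model name `fct_monthly_metrics` and the unique key `metric_month` are placeholders for your own names, and the `check` strategy with `check_cols='all'` simply records a new row whenever any value for a given month changes:

```sql
{% snapshot snapshot_metrics %}

{{
    config(
      target_schema='snapshots',
      unique_key='metric_month',
      strategy='check',
      check_cols='all',
    )
}}

select * from {{ ref('fct_monthly_metrics') }}

{% endsnapshot %}
```

Because the snapshot runs against the transformed model rather than a source, any drift in last month's numbers will show up as a changed record in the snapshot table, which is exactly the audit trail you're after.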
If you do implement this pattern, I'd use subdirectories within your snapshots folder to clearly group together which snapshots record changes to raw data versus transformed data:
```
.
├── dbt_project.yml
├── models
|   └── ...
└── snapshots
    ├── model_snapshots
    │   └── snapshot_metrics.sql
    └── source_snapshots
        └── snapshot_my_saas_app__orders.sql
```
Using directories also enables you to execute these snapshots as separate jobs more easily:
```
$ dbt snapshot --select model_snapshots.*
```
To that end, I would definitely deploy this command as a separate job from your “source” snapshots, as you’ll probably want to control the frequency of these jobs independently – once a month feels right for the snapshot of the metrics model, whereas hourly feels right for the snapshots of your source tables.
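Concretely, the two jobs might each run one of these commands on its own schedule. This is a sketch – the selectors match the directory names in the layout shown earlier, so adjust them to your own project structure:

```
# monthly job: snapshot the transformed metrics model
$ dbt snapshot --select model_snapshots.*

# hourly job: snapshot the raw source data
$ dbt snapshot --select source_snapshots.*
```

Keeping the schedules separate also means a failure in the high-frequency source job can't block the monthly metrics audit, and vice versa.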