Preserve query results at a point in time

bonnevie · January 6, 2023, 3:12pm

Mostly, we want models to refresh every time we run, but I am in a situation where I would like to make a model that is only run once, to preserve the query result for a particular point in time. Could also hypothetically be an expensive query that we want to run extremely rarely. Is there any way to “protect” the resulting materialized table from writes by dbt? I assume making it explicitly read-only will cause problems when dbt is run.
I guess I can run the other models selectively, but I’d worry about human error corrrupting the table .

jeffsk · January 8, 2023, 1:44pm

Make it incremental.

Then add

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  where 1=0

Add this configuration:

models:
  [<resource-path>](resource-path):
    +full_refresh: false

Or at the model level:

{{ config(
    full_refresh = false
) }}

This will make it so nothing ever happens on the table.

joellabes · January 9, 2023, 10:17pm

An incremental model which doesn’t add any extra rows after its initial invocation is a good option!

For this use case, snapshots could also be an option: Add snapshots to your DAG | dbt Developer Hub

bonnevie · January 10, 2023, 10:23am

Thanks for the advice @jeffsk and @joellabes, a bit hacky, but sounds like it will do the trick!
I tried turning it into a macro, but it gets a bit wonky if you have to take into account that the WHERE is optional; maybe there’s a better way:

{% macro ensure_only_run_once(
        append = false
    ) %}

{% if is_incremental() %}
{% if append %}
    AND 1 = 0
{% else %}
    WHERE
        1 = 0
    {% endif %}
{% endif %}
{% endmacro %}

bonnevie · January 10, 2023, 11:55am

@joellabes tried looking a bit at snapshots, and I see how they might be a good solution. Could you give an example of doing a “time travel”/fixed-point-in-time query with a snapshot? Is it as simple as only selecting rows in the correct valid range?
And are you at risk of a full refresh with snapshots, or are they always continuously updated?

joellabes · January 11, 2023, 5:58am

Do you mean how do you query a snapshot table to get a point-in-time result? If so then,

yes!

You would do something like

select * 
from {{ ref('the_snapshot') }} 
where getdate() between dbt_valid_from and dbt_valid_to

, keeping in mind that the current version of a row will have dbt_valid_to of null. It’s common to coalesce that to a value in the distant future for ease of querying, normally something like 2099-12-31.

Check out Building models on top of snapshots for our best practices when working with snapshots.

Snapshots are updated each time you run the dbt snapshot command, and will never be dropped by dbt, even with a full-refresh command. It’s a good idea to put your snapshots into a different schema to prevent someone from accidentally dropping them, as their data is by definition non-recoverable (without backups), in contrast to most dbt models where as long as you have the source tables you can get back to the starting point.

bonnevie · January 11, 2023, 6:39am

Great info, that really helped me get a better understanding of snapshots - and I think they might actually be a great solution for our problem. And great link too, had been looking for something like that in the docs. Thanks

system · January 18, 2023, 6:39am

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.

Topic		Replies	Views
Building models on top of snapshots Show and Tell snapshots	5	18718	October 23, 2021
Periodic Snapshot Fact Tables Help	4	157	March 7, 2025
snapshots and incremental models implementation in dbt with Bigquery Help snapshots , incremental , bigquery	2	2250	February 13, 2023
best way to have a fixed table Help snapshots , incremental , best-practice	1	1098	August 15, 2023
Separate models for initial load and for incremental updates Help incremental	2	1700	July 19, 2023

Preserve query results at a point in time

Related topics