Preserve query results at a point in time

Mostly, we want models to refresh every time we run, but I am in a situation where I would like to make a model that is only run once, to preserve the query result for a particular point in time. Could also hypothetically be an expensive query that we want to run extremely rarely. Is there any way to “protect” the resulting materialized table from writes by dbt? I assume making it explicitly read-only will cause problems when dbt is run.
I guess I can run the other models selectively, but I’d worry about human error corrrupting the table .

Make it incremental.

Then add

{% if is_incremental() %}

  -- this filter will only be applied on an incremental run
  where 1=0

Add this configuration:

models:
  [<resource-path>](resource-path):
    +full_refresh: false

Or at the model level:

{{ config(
    full_refresh = false
) }}

This will make it so nothing ever happens on the table.

1 Like

An incremental model which doesn’t add any extra rows after its initial invocation is a good option!

For this use case, snapshots could also be an option: Add snapshots to your DAG | dbt Developer Hub

Thanks for the advice @jeffsk and @joellabes, a bit hacky, but sounds like it will do the trick!
I tried turning it into a macro, but it gets a bit wonky if you have to take into account that the WHERE is optional; maybe there’s a better way:

{% macro ensure_only_run_once(
        append = false
    ) %}

{% if is_incremental() %}
{% if append %}
    AND 1 = 0
{% else %}
    WHERE
        1 = 0
    {% endif %}
{% endif %}
{% endmacro %}

@joellabes tried looking a bit at snapshots, and I see how they might be a good solution. Could you give an example of doing a “time travel”/fixed-point-in-time query with a snapshot? Is it as simple as only selecting rows in the correct valid range?
And are you at risk of a full refresh with snapshots, or are they always continuously updated?

Do you mean how do you query a snapshot table to get a point-in-time result? If so then,

yes!

You would do something like

select * 
from {{ ref('the_snapshot') }} 
where getdate() between dbt_valid_from and dbt_valid_to

, keeping in mind that the current version of a row will have dbt_valid_to of null. It’s common to coalesce that to a value in the distant future for ease of querying, normally something like 2099-12-31.

Check out Building models on top of snapshots for our best practices when working with snapshots.

Snapshots are updated each time you run the dbt snapshot command, and will never be dropped by dbt, even with a full-refresh command. It’s a good idea to put your snapshots into a different schema to prevent someone from accidentally dropping them, as their data is by definition non-recoverable (without backups), in contrast to most dbt models where as long as you have the source tables you can get back to the starting point.

1 Like

Great info, that really helped me get a better understanding of snapshots - and I think they might actually be a great solution for our problem. And great link too, had been looking for something like that in the docs. Thanks :smiley:

1 Like

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.