A question that pops up once in a while on Slack is how to prevent full refreshes of incremental models, often because rebuilding the table from scratch would take many hours, or even multiple days.
Here’s two ways to approach this:
- Completely disable the ability to fully refresh a model
- Add a protection layer to prevent accidental refreshes
Approach 1
Let’s say you absolutely, positively want to prevent ANY full refresh for a model. You could add the following code at the top of your model:
{% if execute %}
{% if flags.FULL_REFRESH %}
{{ exceptions.raise_compiler_error("Full refresh is not allowed for this model. Exclude it from the run via the argument \"--exclude model_name\".") }}
{% endif %}
{% endif %}
This can be read as: When executing dbt, if the --full-refresh argument is used, raise a compiler error and prevent the model from being executed.
Doing a dbt run --full-refresh
will fail. If you have multiple incremental models and you are ok with the other ones being refresh, you would need to use model selection syntax to exclude the models that are disallowed to be fully refreshed.
Approach 2
A more flexible approach is to allow full refreshes but only if a specific variable with a specific value is present. This hopefully ensures that the action is explicit, and not the result of an accidental/careless dbt run --full-refresh
.
We can slightly modify the code presented earlier to add an extra condition:
{% if execute %}
{% if flags.FULL_REFRESH and var('allow_full_refresh', False) != True %}
{{ exceptions.raise_compiler_error("Full refresh is not allowed for this model unless the argument \"--vars 'allow_full_refresh: True'\" is included in the dbt run command.") }}
{% endif %}
{% endif %}
Almost the same code, but we’re adding an and
condition to check the variable allow_full_refresh
, which has a default value of False if the variable doesn’t exist when dbt is executed.
To fully refresh this model, one would simply run the command dbt run --full-refresh --vars 'allow_full_refresh: True'
.
This is only an example. Instead of checking a variable, you could test on which environment the full refresh is executed through target.name
to allow on prod and disable on dev, for example, or the role used on Snowflake via target.role
.
Relevant documentation:
Don’t hesite to ask on Slack in #modeling if you want to take a new approach for this and you’re unsure whether its possible and how to do it. We’ll be happy to help!