The problem I’m having
Using Snowflake with dbt Core, I am trying to build an incremental model that deletes a fixed number of days (e.g. 30) and then re-appends those 30 days to the table. I know you can do something like this with a pre-hook, but the problem we're running into is that a pre-hook runs separately from the actual model, so if the model fails you're just out those days until it succeeds, and there is also a window where the table is simply missing those records because the pre-hook and model don't run together as a transaction.
Is there a way in Snowflake to have the pre-hook and model materialize as part of a single transaction? I haven’t been able to figure out how at this point.
The context of why I’m trying to do this
I'm trying to do a delete-then-reappend because the dimensions on the table can change, and there are cases where records actually need to be removed from the table due to business logic. Incremental strategies like merge don't work well when deletes are needed, AFAIK.
What I’ve already tried
In addition to a standard pre-hook that deletes records before incrementally appending, I've tried putting several statements inside a block like the one below, but that didn't seem ideal, and I wouldn't want to have to worry about transactions hanging in Snowflake if something goes wrong.
{% set query %}
begin;
[sql];
commit;
{% endset %}
{% do run_query(query) %}
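For reference, the standard pre-hook version looks roughly like this (just a sketch; the source model, date column, and 30-day cutoff are placeholders rather than our actual logic):

{{
    config(
        materialized='incremental',
        pre_hook="delete from {{ this }} where event_date >= dateadd(day, -30, current_date)"
    )
}}

select *
from {{ ref('stg_events') }}  -- placeholder source model
{% if is_incremental() %}
  -- re-append only the window the pre-hook just deleted
  where event_date >= dateadd(day, -30, current_date)
{% endif %}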
- You could use a post-hook, but then you have the problem that the post-hook might fail.
- You could use blue-green deployment to keep failed builds out of sight of users until they're resolved.
- You could write a custom materialisation and/or custom incremental strategy that does the steps you want (there's a rough sketch of that below).
Note: @Mike Stanley
originally posted this reply in Slack. It might not have transferred perfectly.
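If you go the custom incremental strategy route, a minimal sketch could look something like the following. This assumes dbt's custom incremental strategy support, where a strategy named delete_reappend resolves to a macro called get_incremental_delete_reappend_sql; the strategy name, date column, and 30-day window are hypothetical and untested, and whether the two statements end up in one transaction on Snowflake is exactly the open question here:

-- macros/incremental_delete_reappend.sql
{% macro get_incremental_delete_reappend_sql(arg_dict) %}
    {% set target = arg_dict["target_relation"] %}
    {% set source = arg_dict["temp_relation"] %}
    {% set dest_cols = arg_dict["dest_columns"] | map(attribute="name") | join(", ") %}

    -- delete the trailing window, then re-append it from the model's staged results
    delete from {{ target }}
    where event_date >= dateadd(day, -30, current_date);

    insert into {{ target }} ({{ dest_cols }})
    select {{ dest_cols }} from {{ source }}
{% endmacro %}

The model itself would then set incremental_strategy='delete_reappend' in its config.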
See the pre-hook & post-hook page in the dbt docs:
BY DEFAULT, PRE/POST-HOOKS EXECUTE IN THE SAME TRANSACTION AS THE MODEL (but only if the database supports it, and it seems the Snowflake adapter DOESN'T!):
Reading the above link, by default pre-hooks and post-hooks run in the same transaction as the model IF we are using a dbt adapter that supports transactions (sadly, or strangely, it seems the Snowflake adapter doesn't).
EXECUTE PRE/POST-HOOKS OUTSIDE MODEL TRANSACTION:
The before_begin and after_commit macros used on pre-hook and post-hook are for the opposite case, when we want something NOT to execute in the same transaction as the model:
“(…)You want to insert a record into an audit table at the start of a run and do not want that statement rolled back if the model creation fails.(…)”
ABILITY TO EXECUTE PRE/POST-HOOKS & MODEL IN THE SAME TRANSACTION (either everything executes, or nothing does if any part fails):
SUPPORTED DATABASES:
- Postgres and Redshift.
NOT SUPPORTED:
- Snowflake, BigQuery, and Spark or Databricks.
Meaning in your case, using Snowflake, you would just add the DELETE as a pre-hook and that's it: either both work or both roll back (DELETE and model):
{{
    config(
        pre_hook = "DELETE 30 DAYS"
    )
}}
I wonder if we can use both a plain pre_hook AND a pre_hook wrapped in before_begin(sql), like:
{{
    config(
        pre_hook = [
            "DELETE 30 DAYS",
            before_begin("INSERT LOG('starting model…', current_timestamp())")
        ]
    )
}}
- The DELETE gets executed in the same transaction as the model (both roll back if the model or the delete fails).
- But the INSERT always executes, no matter whether the DELETE + model commit successfully or fail and roll back.
So to clarify, in my case I want everything rolled back if the model fails, and I do not want the deletions to show up in a separate transaction. The model runs every hour and takes a few minutes, so I don't want a window of a few minutes every hour where the deletions are visible in the table, and even worse, if the model fails the deletions are just stuck there. It seemed like the pre-hook for the Snowflake adapter runs separately from the model; I'm not sure if before_begin helps in this case?
You can use the sql_header config (https://docs.getdbt.com/reference/resource-configs/sql_header) for SQL that should be run in the same transaction as the model code, before the model.
Note: @Mike Stanley
originally posted this reply in Slack. It might not have transferred perfectly.
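For anyone who wants to experiment with that, the documented way to set it is the set_sql_header call block (or the sql_header config key). Whether Snowflake will actually accept a DML statement there for an incremental model is untested here, so treat this purely as a sketch; the table and column names are placeholders:

{% call set_sql_header(config) %}
    -- hypothetical: clear the trailing 30-day window before the model's own statement runs
    delete from analytics.my_schema.my_table
    where event_date >= dateadd(day, -30, current_date);
{% endcall %}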
Hi @a_slack_user, good to know about sql_header, but looking at the documentation:
“(…) create table as and create view as (…)”
…meaning sql_header probably only applies to non-incremental models, right?
Simplifying: it seems only Postgres and Redshift are able to execute multiple DDL/DML statements in the same transaction, meaning what you want is not possible with the Snowflake adapter, as it doesn't support transactions yet per the dbt docs: pre-hook & post-hook | dbt Developer Hub.
FYI, here are the 3 scenarios (if I'm not wrong):
- Execute in the same transaction as the model: pre_hook/post_hook, only on Postgres/Redshift.
- Execute outside the model transaction:
  - the default for the Snowflake, BigQuery, and Spark or Databricks adapters;
  - for Postgres/Redshift, use pre_hook=before_begin("SQL-statement") and post_hook=after_commit("SQL-statement").
- About sql_header: the docs say it executes in the same “QUERY” (like sending both SQL statements together), BUT that doesn't mean they execute in the same transaction, right?
@a_slack_user can you confirm the above 3 points? Especially your feeling about sql_header (same query vs. same transaction).
Thanks
Then one other follow-up to this: why wouldn't a delete+insert strategy, with a non-unique date field as the unique_key in the config, work in this case? The thought being that it would potentially execute a delete from [table] where [date field] in [subquery] before re-inserting. Is that something that would work? FWIW, this table is very large.
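For what it's worth, a sketch of that config (the source model and date column are placeholders; the delete+insert strategy should translate into a delete of the matching unique_key values followed by an insert of the new rows, though on a table this large it would need profiling):

{{
    config(
        materialized='incremental',
        incremental_strategy='delete+insert',
        unique_key='event_date'
    )
}}

select *
from {{ ref('stg_events') }}  -- placeholder source model
{% if is_incremental() %}
  -- only rebuild the trailing 30-day window; those dates get deleted and re-inserted
  where event_date >= dateadd(day, -30, current_date)
{% endif %}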