How do I perform "delete" operations in Snowflake using the "delete+insert" incremental strategy and materialization?

The problem I’m having

I’ve been trying to write a model that will handle all of the insert, update, and delete operations on data in Snowflake. I’ve set the incremental strategy to “delete+insert” and added a date-based condition inside “is_incremental()” so that records in the source whose check-column value is greater than the current max value in the target are treated as the “incremental/delta” dataset.

After multiple runs, I can see that the insert and update operations are happening as expected; however, the deletion of records is not.

Can anyone tell me what it is I am missing in the configuration or code?

The context of why I’m trying to do this

The source dataset will introduce new records, updated records as well as deleted records in the incremental pulls. We need to be able to reflect all three operations in the target as well.

What I’ve already tried

I’ve already tried both the “merge” and “delete+insert” incremental strategies, along with the is_incremental() macro and a date-field comparison to identify the delta records.

Some example code or error messages

{{
    config(
        materialized='incremental',
        unique_key='research_id',
        incremental_strategy='delete+insert'
    )
}}

with
...,
TARGET AS (
    SELECT ...
)

SELECT distinct *, {{ get_audit_col('SOURCE_SYSTEM_NAME', 'SOURCE_SYSTEM_KEY') }}
FROM TARGET

 {% if is_incremental() -%}

    where LAST_UPDATED_DT >= (select max(LAST_UPDATED_DT) from {{ this }})

{%- endif %}


The delete+insert strategy reflects how updates are applied (instead of updating the row in place, it deletes against the PK and inserts a new row); dbt incremental models will never delete a row without an insertion to replace it.

What is your use case that you have to do hard deletes? Instead, you might need to have some way of marking those rows as obsolete, and then delete them in a post-hook.
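A minimal sketch of that soft-delete pattern, assuming the source carries a boolean flag column (the `is_deleted` column and model name here are illustrative, not from the thread):

```sql
-- models/research.sql (sketch; is_deleted is an assumed soft-delete flag
-- propagated from the source, not something dbt provides)
{{
    config(
        materialized='incremental',
        unique_key='research_id',
        incremental_strategy='delete+insert',
        post_hook="delete from {{ this }} where is_deleted = true"
    )
}}
```

The incremental query still only returns new/changed rows; the post-hook then hard-deletes any rows that arrived marked as obsolete.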

The reason it works this way is that incremental models assume that they are append-only - the query returns only modified rows, so how would it know whether a missing row is meant to be deleted or is just unchanged?

Thank you for the clarification @joellabes
I see the error in my understanding of the strategy tag. This also clears up a few of the other doubts I had about the generated code.

I checked out the post_hook feature and got it working.
I’m wondering whether we could assign a query result to a variable of sorts in a pre_hook and then use it in the post_hook query. Is that possible?


Can you elaborate on what you’re trying to do with that?

pre-hooks and post-hooks have different contexts, so you can’t share a variable value between them without storing it somewhere stateful (e.g. the database) in the meantime, but that’s hacky and there’s probably a better approach.
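For example, the compare-and-delete can often collapse into a single post-hook statement, so nothing needs to be carried over from a pre-hook. A sketch, assuming a staging model named `stg_source` and a `research_id` key (both names are placeholders):

```sql
-- Sketch: one post-hook that removes target rows whose PK has disappeared
-- from the source. stg_source and research_id are assumed names; the model
-- body would normally already ref() the source, so the DAG dependency exists.
post_hook="""
    delete from {{ this }}
    where research_id not in (
        select research_id from {{ ref('stg_source') }}
    )
"""
```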

I would use a “minus” query to compare the source dataset against the target and find the PKs that are missing from the source. I would execute this as the pre_hook query, and if I could store the result set in a variable of sorts, a deletion query would then be executed as the post_hook. The deletion would look like “delete from table where PK in (…)”.
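One way to realise that idea without cross-hook state is a single macro, invoked as the post-hook, that runs the minus query with dbt’s `run_query` and issues the delete in the same context. This is only a sketch; `delete_missing_pks`, `source_relation`, and `research_id` are assumed names:

```sql
-- macros/delete_missing_pks.sql (hypothetical macro, called as a post_hook)
{% macro delete_missing_pks(source_relation) %}
  {% if execute %}
    {# PKs present in the target but no longer present in the source #}
    {% set gone = run_query(
        "select research_id from " ~ this ~ " minus " ~
        "select research_id from " ~ source_relation) %}
    {% if gone.rows | length > 0 %}
      delete from {{ this }}
      where research_id in (
        {%- for row in gone.rows -%}
          '{{ row[0] }}'{% if not loop.last %}, {% endif %}
        {%- endfor -%}
      )
    {% else %}
      select 1  {# no-op so the hook still has a statement to run #}
    {% endif %}
  {% endif %}
{% endmacro %}
```

It would then be configured as `post_hook="{{ delete_missing_pks(ref('stg_source')) }}"` (again, names assumed). Note that inlining PK values into an `in (…)` list only scales to a modest number of deletions.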

Any updates to this thread? I am experiencing the same issue…

I was able to solve this problem using a Custom Incremental Strategy for my materialization, with the following methodology…

  1. Create a macro that mimics the ‘delete+insert’ incremental strategy and call it something like ‘custom_delete_insert’
  2. In that macro, use the ‘temp_relation’ to generate another table with the distinct fields you want to use to delete records in your ‘target_relation’ and put it somewhere else in your database. Call it something like ‘table_from_temp_relation_for_delete_target_relation’
  3. Use a post_hook that references the previously built ‘table_from_temp_relation_for_delete_target_relation’ to trim the result table from your incremental run. Remember to also add a DROP TABLE command in the post_hook so that you don’t have to store the table after your model builds.

Here’s sort of what my files looked like…

CONFIG in model:

{{
config(
    materialized='incremental',
    unique_key=['key1','key2','key3'],
    incremental_strategy='custom_delete_insert',
    post_hook= [
        """
        DELETE
        FROM {{ this }} as t1
        WHERE NOT EXISTS (
            SELECT 1
            -- helper table built by the custom strategy macro; must be an
            -- unquoted identifier, not a string literal
            FROM table_from_temp_relation_for_delete_target_relation as t2
            WHERE true
            AND t1.key1 = t2.key1
            )
        ;
        DROP TABLE table_from_temp_relation_for_delete_target_relation
        ;
        """
    ]
)
}}

MACRO to replace ‘delete+insert’:

-- macros/some_name.sql
-- dbt resolves incremental_strategy='custom_delete_insert' to a macro named
-- get_incremental_custom_delete_insert_sql(arg_dict), so the name must match.
{% macro get_incremental_custom_delete_insert_sql(arg_dict) %}
  {{ create_subtable_statement(arg_dict["temp_relation"]) }}
  {{ delete_statement(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"]) }}
  {{ insert_statement(arg_dict["target_relation"], arg_dict["temp_relation"]) }}
{% endmacro %}

{% macro create_subtable_statement(temp_relation) %}
{% set incremental_key = ['key1'] %}
  CREATE OR REPLACE TABLE table_from_temp_relation_for_delete_target_relation as (
    SELECT DISTINCT
    {% for key in incremental_key %}
      {{ key }} {% if not loop.last %},{% endif %}
    {% endfor %}
    FROM {{ temp_relation }}
  )
  ;
{% endmacro %}

{% macro delete_statement(target_relation, temp_relation, unique_keys) %}
  DELETE FROM {{ target_relation }}
  USING {{ temp_relation }}
  WHERE TRUE
      {% for key in unique_keys %}
      and {{ target_relation }}.{{ key }} = {{ temp_relation }}.{{ key }}
      {% endfor %}
  ;
{% endmacro %}

{% macro insert_statement(target_relation, temp_relation) %}
insert into {{ target_relation }}
select * from {{ temp_relation }}
;
{% endmacro %}