How do I perform "delete" operations in Snowflake using the "delete+insert" incremental strategy and materialization?

The problem I’m having

I’ve been trying to write a model that will handle all of the insert, update, and delete operations on data in Snowflake. I’ve set the incremental strategy to “delete+insert” and added a date-based condition inside “is_incremental()” so that records in the source whose check-column value is greater than the current max value in the target are treated as the “incremental/delta” dataset.

After multiple runs, I can see that the insert and update operations are happening as expected; however, the deletion of records is not.

Can anyone tell me what it is I am missing in the configuration or code?

The context of why I’m trying to do this

The source dataset will introduce new records, updated records as well as deleted records in the incremental pulls. We need to be able to reflect all three operations in the target as well.

What I’ve already tried

I’ve already tried both the “merge” and “delete+insert” incremental strategies, along with the is_incremental() macro and a date-field comparison to identify the delta records.

Some example code or error messages

{{
    config(
        materialized='incremental',
        unique_key='research_id',
        incremental_strategy='delete+insert'
    )
}}

with
...,
TARGET AS (
    SELECT ...
)

SELECT distinct *, {{ get_audit_col('SOURCE_SYSTEM_NAME', 'SOURCE_SYSTEM_KEY') }}
FROM TARGET

 {% if is_incremental() -%}

    where LAST_UPDATED_DT >= (select max(LAST_UPDATED_DT) from {{ this }})

{%- endif %}


The delete+insert strategy reflects how updates are applied (instead of updating the row in place, it deletes against the PK and inserts a new row); dbt incremental models will never delete a row without an insertion to replace it.

What is your use case that you have to do hard deletes? Instead, you might need to have some way of marking those rows as obsolete, and then delete them in a post-hook.
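A minimal sketch of that soft-delete pattern, assuming the source carries a boolean flag column (the `is_deleted` column and model name here are illustrative, not from the thread):

```sql
-- models/research.sql (sketch; is_deleted is an assumed soft-delete flag
-- propagated from the source, not something dbt provides)
{{
    config(
        materialized='incremental',
        unique_key='research_id',
        incremental_strategy='delete+insert',
        post_hook="delete from {{ this }} where is_deleted = true"
    )
}}
```

The incremental query still only returns new/changed rows; the post-hook then hard-deletes any rows that arrived marked as obsolete.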

The reason it works this way is that incremental models assume that they are append-only - the query returns only modified rows, so how would it know whether a missing row is meant to be deleted or is just unchanged?

Thank you for the clarification @joellabes
I see the error in my understanding of the strategy tag. This also clears up a few of the other doubts I had about the generated code.

I checked out the post_hook feature and got it working.
I’m wondering whether we could assign a query result to a variable of sorts in a pre_hook and then use it in the post_hook query. Is that possible?


Can you elaborate on what you’re trying to do with that?

pre-hooks and post-hooks have different contexts, so you can’t share a variable value between them without storing it somewhere stateful (e.g. the database) in the meantime, but that’s hacky and there’s probably a better approach.
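For example, the compare-and-delete can often collapse into a single post-hook statement, so nothing needs to be carried over from a pre-hook. A sketch, assuming a staging model named `stg_source` and a `research_id` key (both names are placeholders):

```sql
-- Sketch: one post-hook that removes target rows whose PK has disappeared
-- from the source. stg_source and research_id are assumed names; the model
-- body would normally already ref() the source, so the DAG dependency exists.
post_hook="""
    delete from {{ this }}
    where research_id not in (
        select research_id from {{ ref('stg_source') }}
    )
"""
```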

I would use a “minus” query to compare the source dataset against the target and find the PKs that are missing from the source. I would execute this as the pre_hook query, and if I could store the result set in a variable of sorts, a deletion query would then be executed as the post_hook. The deletion would look like “delete from table where PK in (…)”.
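One way to realise that idea without cross-hook state is a single macro, invoked as the post-hook, that runs the minus query with dbt’s `run_query` and issues the delete in the same context. This is only a sketch; `delete_missing_pks`, `source_relation`, and `research_id` are assumed names:

```sql
-- macros/delete_missing_pks.sql (hypothetical macro, called as a post_hook)
{% macro delete_missing_pks(source_relation) %}
  {% if execute %}
    {# PKs present in the target but no longer present in the source #}
    {% set gone = run_query(
        "select research_id from " ~ this ~ " minus " ~
        "select research_id from " ~ source_relation) %}
    {% if gone.rows | length > 0 %}
      delete from {{ this }}
      where research_id in (
        {%- for row in gone.rows -%}
          '{{ row[0] }}'{% if not loop.last %}, {% endif %}
        {%- endfor -%}
      )
    {% else %}
      select 1  {# no-op so the hook still has a statement to run #}
    {% endif %}
  {% endif %}
{% endmacro %}
```

It would then be configured as `post_hook="{{ delete_missing_pks(ref('stg_source')) }}"` (again, names assumed). Note that inlining PK values into an `in (…)` list only scales to a modest number of deletions.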

Any updates to this thread? I am experiencing the same issue…

I was able to solve this problem using a Custom Incremental Strategy for my materialization, with the following methodology…

  1. Create a macro that mimics the ‘delete+insert’ incremental strategy and call it something like ‘custom_delete_insert’
  2. In that macro, use the ‘temp_relation’ to generate another table with the distinct fields you want to use to delete records in your ‘target_relation’ and put it somewhere else in your database. Call it something like ‘table_from_temp_relation_for_delete_target_relation’
  3. Use a post_hook that references the previously built ‘table_from_temp_relation_for_delete_target_relation’ to trim the result table from your incremental run. Remember to also add a DROP TABLE command in the post_hook so that you don’t have to store the table after your model builds.

Here’s sort of what my files looked like…

CONFIG in model:

{{
config(
    materialized='incremental',
    unique_key=['key1','key2','key3'],
    incremental_strategy='custom_delete_insert',
    post_hook= [
        """
        DELETE
        FROM {{ this }} as t1
        WHERE NOT EXISTS (
            SELECT 1
            -- helper table built by the custom strategy macro; must be an
            -- unquoted identifier, not a string literal
            FROM table_from_temp_relation_for_delete_target_relation as t2
            WHERE true
            AND t1.key1 = t2.key1
            )
        ;
        DROP TABLE table_from_temp_relation_for_delete_target_relation
        ;
        """
    ]
)
}}

MACRO to replace ‘delete+insert’:

-- macros/some_name.sql
-- dbt resolves incremental_strategy='custom_delete_insert' to a macro named
-- get_incremental_custom_delete_insert_sql(arg_dict), so the name must match.
{% macro get_incremental_custom_delete_insert_sql(arg_dict) %}
  {{ create_subtable_statement(arg_dict["temp_relation"]) }}
  {{ delete_statement(arg_dict["target_relation"], arg_dict["temp_relation"], arg_dict["unique_key"]) }}
  {{ insert_statement(arg_dict["target_relation"], arg_dict["temp_relation"]) }}
{% endmacro %}

{% macro create_subtable_statement(temp_relation) %}
{% set incremental_key = ['key1'] %}
  CREATE OR REPLACE TABLE table_from_temp_relation_for_delete_target_relation as (
    SELECT DISTINCT
    {% for key in incremental_key %}
      {{ key }} {% if not loop.last %},{% endif %}
    {% endfor %}
    FROM {{ temp_relation }}
  )
  ;
{% endmacro %}

{% macro delete_statement(target_relation, temp_relation, unique_keys) %}
  DELETE FROM {{ target_relation }}
  USING {{ temp_relation }}
  WHERE TRUE
      {% for key in unique_keys %}
      and {{ target_relation }}.{{ key }} = {{ temp_relation }}.{{ key }}
      {% endfor %}
  ;
{% endmacro %}

{% macro insert_statement(target_relation, temp_relation) %}
insert into {{ target_relation }}
select * from {{ temp_relation }}
;
{% endmacro %}