How to perform "delete" operations in Snowflake using "delete+insert" in incremental strategy and materialization?

The problem I’m having

I’ve been trying to write up a model that will handle all the insert, update and delete of data in Snowflake. I’ve used the incremental strategy as “delete+insert” and added a date based condition in the “is_incremental()” so that the records in the source with value for the check column greater than the current max value in target will be considered as the “incremental/delta” dataset.

After multiple runs, I can see that the Insert and Update operations are happening as expected, however, the deletion of the records are not happening.

Can anyone tell me what it is I am missing in the configuration or code?

The context of why I’m trying to do this

The source dataset will introduce new records, updated records as well as deleted records in the incremental pulls. We need to be able to reflect all three operations in the target as well.

What I’ve already tried

I’ve already tried using the “merge” and “delete+insert” incremental strategies along with the is_incremental() macro with a date field based comparison to identify the delta records.

Some example code or error messages

{{
    config(
        materialized='incremental',
        unique_key='research_id',
        incremental_strategy='delete+insert'
    )
}}

with 
...,
TARGET
AS
(
    SELECT  ...)
 
SELECT distinct *,{{get_audit_col('SOURCE_SYSTEM_NAME','SOURCE_SYSTEM_KEY')}}
FROM TARGET

 {% if is_incremental() -%}

    where LAST_UPDATED_DT >= (select max(LAST_UPDATED_DT) from {{ this }})

{%- endif %}

The delete+insert strategy reflect how updates are applied (instead of updating the row inline, it will delete against the PK and do a new insertion); dbt incremental models will never delete a row without an insertion to replace it.

What is your use case that you have to do hard deletes? Instead, you might need to have some way of marking those rows as obsolete, and then delete them in a post-hook.

The reason it works this way is that incremental models assume that they are append-only - the query returns only modified rows, so how would it know whether a missing row is meant to be deleted or is just unchanged?

Thank you for the clarification @joellabes
I see the error in my understanding of the strategy tag. This now also clears few of the other doubts I had about the generated code .

I checked out the post_hook feature successfully.
I am wondering if we could assign some query result in a variable of sorts in a pre_hook and use that in the post_hook query. Is that possible?

1 Like

Can you elaborate on what you’re trying to do with that?

pre hooks and post hooks have different contexts so you can’t share a variable value between them without storing it somewhere stateful (eg the database) in the meantime, but that’s hacky and there’s probably a better approach.

I would use a “minus” query to compare and find out the missing PKs in the source dataset as opposed to the target. I would execute this as the pre_hook quey, and if I could store the resultset in a variable of sorts, then a deletion query would be executed as the post_hook. The deletion would be based like “delete from table where PK in ”.