Hi,
This is a great question and here’s the way we have gone about handling hard deletes:
- Make a table with a view of your data (e.g. schema.current_table), but add a pre-hook that creates a table based on what is currently in prod (e.g. schema.previous_table):
CREATE OR REPLACE TABLE schema.previous_table AS SELECT * FROM schema.current_table
The table created via the pre-hook will help you identify deletes.
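If it helps, the pre-hook can live in the model's config block. This is just a minimal sketch, assuming the model is named current_table; the hook SQL is the same statement as above, and the upstream relation is a hypothetical stand-in:

-- models/current_table.sql (model and upstream names are assumptions)
-- the pre-hook runs before the model is rebuilt, so it captures what is currently in prod
{{ config(
    materialized='table',
    pre_hook='CREATE OR REPLACE TABLE schema.previous_table AS SELECT * FROM schema.current_table'
) }}

SELECT * FROM raw.my_source_table

One caveat: on the very first run schema.current_table won't exist yet, so you may need to create it once by hand or guard the hook.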
- Create a table that has the deleted records by using the MINUS operator (disclosure: I use Snowflake). The new table will be called something like schema.table_deleted_records:
SELECT
column_a
, column_b
FROM schema.previous_table
MINUS
SELECT
column_a
, column_b
FROM schema.current_table
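As an aside, MINUS in Snowflake is just a synonym for the ANSI EXCEPT operator, so on a warehouse without MINUS the same diff can be written as:

SELECT column_a, column_b FROM schema.previous_table
EXCEPT
SELECT column_a, column_b FROM schema.current_table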
- Then, in another table, union the data and add an is_deleted column (FALSE for current data, TRUE for the deleted records):
SELECT
prev.column_a
, prev.column_b
, TRUE AS is_deleted
FROM schema.previous_table AS prev
INNER JOIN schema.table_deleted_records AS deleted
ON << unique key >>
UNION
SELECT
column_a
, column_b
, FALSE AS is_deleted
FROM schema.current_table
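Putting that together as its own model (which is what the snapshot in the next step selects from) looks roughly like this. The model name current_with_deletes and the key column id are hypothetical stand-ins, and whatever your real unique key is would also need to be carried through the MINUS query above:

-- models/current_with_deletes.sql (model name and id key are hypothetical)
{{ config(materialized='table') }}

SELECT
    prev.id
    , prev.column_a
    , prev.column_b
    , TRUE AS is_deleted
FROM schema.previous_table AS prev
INNER JOIN schema.table_deleted_records AS deleted
    ON prev.id = deleted.id

UNION

SELECT
    id
    , column_a
    , column_b
    , FALSE AS is_deleted
FROM schema.current_table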
- Then run your snapshots on the table created in the previous step using the check strategy (we know it is best to use a timestamp strategy when you can, so we do that elsewhere for tables that don't have hard deletes).
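For reference, the snapshot ends up looking something like the sketch below; the snapshot name, target schema, unique key, and check_cols are all assumptions to swap for your own:

-- snapshots/current_with_deletes_snapshot.sql (all names are assumptions)
{% snapshot current_with_deletes_snapshot %}

{{
    config(
        target_schema='snapshots',
        unique_key='id',
        strategy='check',
        check_cols=['column_a', 'column_b', 'is_deleted']
    )
}}

SELECT * FROM schema.current_with_deletes

{% endsnapshot %}

Including is_deleted in check_cols is what makes the snapshot record a new version of a row when it gets hard-deleted upstream.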
This is just what has worked for us and is by no means the only way to go about it. For example, you could join against the historical snapshot table as you mentioned. I think the is_deleted column is better than overwriting the dbt_valid_to column.