Snapshots are built with duplicate rows

I’m building snapshot models for reverse ETL scripts so that they pick up only recently changed rows. I don’t have a timestamp column available for this, so I’m using the “check” strategy with check_cols='all'.

The problem is that two of my snapshots keep breaking with the “UPDATE/MERGE must match at most one source row for each target row” error. When starting from scratch, they might work for 2-3 days and then break.

There are no duplicate rows in the source data, as I’m using QUALIFY to make sure only one row per unique key appears. This is the snapshot code:

{{
    config(
      materialized='snapshot',
      target_schema='dbt_analytics',
      unique_key='account_id',
      strategy='check',
      check_cols='all'
    )
}}

select * from {{ ref("stg_accounts_standard_metrics_before_snapshot") }}
qualify row_number() over (partition by account_id order by account_id) = 1

(The referenced model also uses QUALIFY to make sure it outputs unique account_id values.)
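
A query along these lines should come back empty if the source really is unique per key (the schema name is just illustrative, matching the target_schema above):

-- generic duplicate check against the snapshot's source model;
-- any rows returned mean the QUALIFY upstream is not holding
-- (schema/table shown for illustration)
select account_id, count(*) as n_rows
from dbt_analytics.stg_accounts_standard_metrics_before_snapshot
group by account_id
having count(*) > 1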

I’ve noticed that for the snapshots which don’t work, dbt leaves behind a table with a __dbt_tmp suffix, which contains the history of changes with the following dbt values:

Not sure if these are correct.
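
Since the __dbt_tmp table sticks around, something like this might surface the keys the MERGE chokes on. This is only a sketch: the table name is made up, and it assumes the leftover table keeps dbt’s usual snapshot staging columns (dbt_change_type, dbt_valid_from):

-- my_snapshot__dbt_tmp is a placeholder name; list staged 'update' rows
-- whose account_id appears more than once, i.e. the rows the MERGE
-- cannot match one-to-one against the target
select *
from dbt_analytics.my_snapshot__dbt_tmp
where dbt_change_type = 'update'
qualify count(*) over (partition by account_id) > 1
order by account_id, dbt_valid_from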

I’m super confused because I have four snapshot tables in total, all created the same way (QUALIFY clauses to ensure uniqueness), but only two are breaking.
The pipeline runs once a day, so there is no possibility of race conditions or of dbt running twice at the same time.

I’ve been fighting this for weeks now and I’m out of ideas. Does anyone have any tips?

@patryk, were you able to find a solution to this? I’m running into the same problem now.

Nope. I created a workaround by building my own snapshot logic: I generate a surrogate key from all columns, and if the key changes, I update those rows in my table.
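
Roughly this shape, as a sketch (the column names are placeholders and the dbt_utils package is an assumption, not necessarily what’s in the actual project):

-- append a hash of every tracked column to each row;
-- col_a/col_b/col_c stand in for the real column list
select
    *,
    {{ dbt_utils.generate_surrogate_key(['col_a', 'col_b', 'col_c']) }} as row_hash
from {{ ref('stg_accounts_standard_metrics_before_snapshot') }}

Downstream, the target table is merged on account_id, and a row is rewritten whenever its stored row_hash no longer matches.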

It seems I spoke too soon and just figured my problem out :sweat_smile:. My issue was due to a race condition between my prod and staging environments.

Thanks for the reply!