Snapshots are built with duplicate rows

I’m building snapshot models for reverse ETL scripts so that they pick up only recently changed rows. I don’t have a timestamp column available for this, so I’m using the “check” strategy with check_cols='all'.

The problem is that two of my snapshots keep breaking with the “UPDATE/MERGE must match at most one source row for each target row” error. When starting from scratch, they might work for 2-3 days and then break.

There are no duplicate rows in the source data, as I’m using QUALIFY to make sure only one row per unique key appears. This is the snapshot code:

{{
    config(
      materialized='snapshot',
      target_schema='dbt_analytics',
      unique_key='account_id',
      strategy='check',
      check_cols='all'
    )
}}

select * from {{ ref("stg_accounts_standard_metrics_before_snapshot") }}
qualify row_number() over (partition by account_id order by account_id) = 1

(The referenced model also uses QUALIFY to make sure it outputs unique account_id values.)
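
A query along these lines should come back empty if the source really is unique per key (the schema name is just illustrative, matching the target_schema above):

-- generic duplicate check against the snapshot's source model;
-- any rows returned mean the QUALIFY upstream is not holding
-- (schema/table shown for illustration)
select account_id, count(*) as n_rows
from dbt_analytics.stg_accounts_standard_metrics_before_snapshot
group by account_id
having count(*) > 1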

I’ve noticed that for the snapshots which don’t work, dbt leaves behind a table with a __dbt_tmp suffix, which contains the history of changes with the following dbt values:

Not sure if these are correct.
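
Since the __dbt_tmp table sticks around, something like this might surface the keys the MERGE chokes on. This is only a sketch: the table name is made up, and it assumes the leftover table keeps dbt’s usual snapshot staging columns (dbt_change_type, dbt_valid_from):

-- my_snapshot__dbt_tmp is a placeholder name; list staged 'update' rows
-- whose account_id appears more than once, i.e. the rows the MERGE
-- cannot match one-to-one against the target
select *
from dbt_analytics.my_snapshot__dbt_tmp
where dbt_change_type = 'update'
qualify count(*) over (partition by account_id) > 1
order by account_id, dbt_valid_from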

I’m super confused because I have four snapshot tables in total, all created the same way (QUALIFY clauses to ensure uniqueness), but only two are breaking.
The pipeline runs once a day, so there is no possibility of race conditions or of dbt running twice at the same time.

I’ve been fighting this for weeks now and I’m out of ideas. Does anyone have any tips?

@patryk, were you able to find a solution to this? I’m running into the same problem now.

Nope. I created a workaround by building my own snapshot logic: I generate a surrogate key from all columns, and if the key changes, I update those rows in my table.
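
Roughly this shape, as a sketch (the column names are placeholders and the dbt_utils package is an assumption, not necessarily what’s in the actual project):

-- append a hash of every tracked column to each row;
-- col_a/col_b/col_c stand in for the real column list
select
    *,
    {{ dbt_utils.generate_surrogate_key(['col_a', 'col_b', 'col_c']) }} as row_hash
from {{ ref('stg_accounts_standard_metrics_before_snapshot') }}

Downstream, the target table is merged on account_id, and a row is rewritten whenever its stored row_hash no longer matches.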

It seems I spoke too soon and just figured my problem out :sweat_smile:. My issue was due to a race condition between my prod and staging environments.

Thanks for the reply!