I’m building snapshot models for reverse ETL scripts, so they pick up only recently changed rows. I don’t have any timestamp column available for this, so I’m using the “check all” strategy.
The problem is, two of my snapshots keep breaking with the UPDATE/MERGE must match at most one source row for each target row error. When rebuilt from scratch, they work for 2-3 days and then break again.
There are no duplicate rows in the source data, as I’m using QUALIFY to make sure only one row per unique key appears. This is the snapshot code:
{{
    config(
        materialized='snapshot',
        target_schema='dbt_analytics',
        unique_key='account_id',
        strategy='check',
        check_cols='all'
    )
}}

select * from {{ ref("stg_accounts_standard_metrics_before_snapshot") }}
qualify row_number() over (partition by account_id order by account_id) = 1
(The referenced model also uses QUALIFY to make sure it outputs unique account_id values.)
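One thing worth checking is whether the snapshot table itself has accumulated duplicate “current” rows, i.e. more than one row with dbt_valid_to set to null for the same account_id. The MERGE error usually means the *target* side is ambiguous, even when the source is deduplicated. A diagnostic sketch (the snapshot table name here is a placeholder; the schema is taken from the target_schema in the config above):

-- Find account_ids with more than one open (current) version in the
-- snapshot table; any hits would make the MERGE match multiple rows.
-- 'your_snapshot_name' is a hypothetical placeholder.
select
    account_id,
    count(*) as open_versions
from dbt_analytics.your_snapshot_name
where dbt_valid_to is null
group by account_id
having count(*) > 1
order by open_versions desc;

If this returns rows, the snapshot history has already diverged and new runs will fail regardless of how clean the source query is.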
I’ve noticed that for the snapshots which don’t work, dbt leaves behind a table with a __dbt_tmp suffix, which contains the history of changes with the following dbt values:
Not sure if these are correct.
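Since that __dbt_tmp table survives the failed run, you can check whether it stages more than one row per key for the same change type, which is exactly what triggers the “at most one source row” error. A sketch, assuming the standard snapshot staging columns (the tmp table name is a placeholder for whatever dbt left behind):

-- Count staged rows per key and change type in the leftover tmp table;
-- more than one 'update' row for the same account_id breaks the MERGE.
-- 'your_snapshot_name__dbt_tmp' is a hypothetical placeholder.
select
    account_id,
    dbt_change_type,
    count(*) as staged_rows
from dbt_analytics.your_snapshot_name__dbt_tmp
group by account_id, dbt_change_type
having count(*) > 1;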
I’m super confused because I have 4 snapshot tables in total, all created the same way (QUALIFY clauses to ensure uniqueness), but only 2 of them are breaking.
The pipeline runs once a day, so there is no possibility of race conditions or of dbt running twice at the same time.
I’ve been fighting this for weeks now and I’m out of ideas. Does anyone have any tips?