Duplicate Records on First run with Incremental Model on BigQuery

Hello dbt folks!

In one of the use cases in our organisation, we have incremental tables that hold append-only records of incoming events, and current tables that store the latest snapshot of those incremental records, keyed by a unique key.

This use case looked to us like an exact match for dbt's incremental models.

The documentation states that:

the first time a model is run, the table is built by transforming all rows of source data.

Because we have more than one record with the same unique key in our append-only incremental tables, the first incremental run produces multiple records with the same unique key in the current tables. Hence, every subsequent batch fails with the following error:

UPDATE/MERGE must match at most one source row for each target row
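For context, here is a minimal sketch of the kind of incremental model involved. The table, source, and column names (`event_id`, `updated_at`, `raw.events_append_only`) are illustrative, not from our actual project:

```sql
-- models/current_events.sql (hypothetical example)
{{
    config(
        materialized='incremental',
        unique_key='event_id'
    )
}}

select
    event_id,
    payload,
    updated_at
from {{ source('raw', 'events_append_only') }}

{% if is_incremental() %}
  -- on incremental runs, only pick up rows newer than the target table
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

On the first run there is no target table yet, so all source rows, including duplicates per `event_id`, land in the current table.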

Could anyone please let me know how this issue can be addressed, or am I missing something?

Thanks in advance,
Soner

@guzeloglusoner
I ran into the same problem. Have you ever found a solution for this? If so, would you be able to share?

Thanks,

JB

Hello, the model itself needs to filter out the duplicate records, e.g. by using ROW_NUMBER() over the unique key in SQL. On the first run, dbt has no existing records to compare against, so it cannot deduplicate for you.
There is a detailed explanation at the link below.
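A sketch of that deduplication, applied inside the model's select so that only one row per unique key ever reaches the MERGE (column names `event_id` and `updated_at` are assumptions for illustration):

```sql
-- Keep only the latest row per unique key before dbt merges it.
with ranked as (
    select
        *,
        row_number() over (
            partition by event_id
            order by updated_at desc
        ) as rn
    from {{ source('raw', 'events_append_only') }}
)

select * except (rn)  -- BigQuery syntax to drop the helper column
from ranked
where rn = 1
```

This guarantees the "at most one source row for each target row" requirement on both the first and subsequent runs.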

Hey @guzeloglusoner @jbagatelli

If your intention is to have multiple records with the same unique key, then you should create a new primary key (using dbt_utils.surrogate_key, for example) and use that as the unique_key.
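A sketch of that approach, assuming the combination of `event_id` and `updated_at` uniquely identifies a row (both column names are illustrative):

```sql
-- Derive a composite surrogate key so each physical row is unique,
-- then use it as the model's unique_key in the config block.
{{
    config(
        materialized='incremental',
        unique_key='surrogate_id'
    )
}}

select
    {{ dbt_utils.surrogate_key(['event_id', 'updated_at']) }} as surrogate_id,
    event_id,
    payload,
    updated_at
from {{ source('raw', 'events_append_only') }}
```

With this key, duplicates per `event_id` are preserved and the MERGE still matches at most one source row per target row.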

If your intention is to not have multiple records with that unique key, then you need to implement some deduplication logic as minhajpasha suggested.

Hope that helps!