Problem with DBT Merge

I am having a syntax issue with the following dbt merge code. I am new to dbt. this is not a type 2 dimention. My error is

Duplicate row detected during DML action
I looked up one of the test ids that has a duplicate and in the destination there was only one row for the test id . So some how my code is doing an insert when it should only have done an update. I am not sure if I am using this correctly. where testid in (select testid from {{ this }}) Thanks.

Some example code or error messages

{{
    config(
        materialized='incremental',
        unique_key='testid',
        incremental_strategy='merge'
    )
}}
with

cte_mgltestordermap as (

    select
        s.testid
        ,sysdate() recordlastupdated_utc
        ,s.ordernumber
        ,s.reportmaildate
        ,s.preciseflag
        , case
    when p.originalreportmaildate is null then s.reportmaildate
    else p.originalreportmaildate
  end as upd_originalreportmaildate
        , case
    when p.originalreportmaildate is null then s.ordernumber
    else p.ordernumber
  end as upd_ordernumber
        ,case
    when p.originalreportmaildate is null 
      then coalesce(existingids.orderid, p.orderid)
    else p.orderid
  end as upd_orderid
        ,case
    when p.preciseflag is null then s.preciseflag
    else p.preciseflag
  end as upd_preciseflag
  , case when coalesce(p.reportmaildate, '1970-01-01') <> coalesce(s.reportmaildate, '1970-01-01')
    or coalesce(p.preciseflag, 0) <> coalesce(s.preciseflag, 0) then 'update' else null 
      end upd_status     

    from {{ ref('int_table') }} s
    left outer join {{ this }} p  on p.testid = s.testid
    left outer join
(
  select orderid, ordernumber
  from {{ this }}
  group by orderid, ordernumber
) existingids on s.ordernumber = existingids.ordernumber


),

updates as (

    select
        testid 
        ,recordlastupdated_utc
        ,upd_orderid orderid
        ,upd_ordernumber ordernumber
        ,reportmaildate
        ,upd_originalreportmaildate  originalreportmaildate
        ,upd_preciseflag preciseflag
        

    from cte_mgltestordermap

    {% if is_incremental() %}

        where testid in (select testid from {{ this }})
  

    {% endif %}

),

inserts as (

    select
    testid 
    ,recordlastupdated_utc
    ,md5_binary(ordernumber) orderid
    ,ordernumber
    ,reportmaildate
    ,reportmaildate originalreportmaildate
    ,preciseflag
    
    from cte_mgltestordermap

    where testid not in (select testid from {{ this }})

)


select  
u.testid
,u.recordlastupdated_utc
,u.orderid
,u.ordernumber
,u.reportmaildate
,u.originalreportmaildate
,u.preciseflag


from updates u
union 
select 
i.testid
,i.recordlastupdated_utc::timestamp_NTZ
,i.orderid
,i.ordernumber
,i.reportmaildate
,i.originalreportmaildate
,i.preciseflag


from inserts i

Hi @msavage0507

The is_incremental macro can help you filtering only recent rows, for example

{% if is_incremental() %}
        where my_date > (select max(my_date) from {{ this }} )
{% endif %}

or

{% if is_incremental() %}
        where my_date = current_date()
{% endif %}

You can do it in several ways, I was just giving two examples

So you don’t have to process your whole table. It is not meant to compare keys.

The key comparison is handled by dbt. dbt generates the code that will compare the keys in your source table and in your destination table. You can see this code in your target/run/ folder

If it is detecting duplicated rows I can imagine two situations

  • Your unique key is not unique in destination
  • Your unique key is not unique in source (in your code’s transformation)

You said that the unique_key is really unique in your destination table

So what I can imagine is that in your SQL code, somewhere your unique_key gets duplicated

What I would do is to debug the code, see the output of each CTE and see where your unique_key gets duplicated