Re-open thread on insert only in incremental load

stefan.lindborg · December 9, 2022, 9:37am

With a unique key in the .yml file, the merge statement in an incremental model looks like this:

merge into `bigquery_table_tgt` as DBT_INTERNAL_DEST
    using (
      select * from `bigquery_table_tgt__dbt_tmp`
    ) as DBT_INTERNAL_SOURCE
    on 
                DBT_INTERNAL_SOURCE.key = DBT_INTERNAL_DEST.key

when matched then update set
    `key` = DBT_INTERNAL_SOURCE.`key` --etc 


when not matched then insert
    (columns etc)
values
    (values etc)

A user asked how to achieve insert only in an incremental model and the proposed solution was to remove the unique key in the .yml file.
Without a unique key in the .yml file, the merge statement looks like this:

merge into bigquery_table_tgt as DBT_INTERNAL_DEST
using (
select * from bigquery_table_tgt__dbt_tmp
) as DBT_INTERNAL_SOURCE
on FALSE

	#no join on source/target ->> you will have to check source against target yourself to avoid duplicates

when not matched then insert
    (columns etc)
values
    (values etc)

The solution I really would like to see is this

With a unique key in the .yml file:
#new feature in dbt:
merge_option: insert_only

merge into `bigquery_table_tgt` as DBT_INTERNAL_DEST
    using (
      select * from `bigquery_table_tgt__dbt_tmp`
    ) as DBT_INTERNAL_SOURCE
    on 
                DBT_INTERNAL_SOURCE.key = DBT_INTERNAL_DEST.key

# remove the following statement when merge_option = insert_only
#     when matched then update set `key` = DBT_INTERNAL_SOURCE.`key` --etc 
#but of course keep this:

when not matched then insert
    (columns etc)
values
    (values etc)

Will this kind of feature ever show up in dbt?

tonymo · April 2, 2023, 8:03pm

Thank you for your question! I ran into a similar issue (I want to create a historized table based on Full-Load (or Delta Load) coming regularly to an “inbound/staging” table).

They only solution I found (so far) was to use the “incremental_predicates” Incremental models | dbt Developer Hub

In the documentation, notice this sentence that is the key for this (hacky) solution: “dbt does not check the syntax of the SQL statements.”
When dbt see this incremental_predicates, it adds a AND statement in the ON (merge_condition) and put your additional statement without any further check. This is very useful for our problem because we can then hackily add:
DBT_INTERNAL_SOURCE.key = DBT_INTERNAL_DEST.key) OR (FALSE

dbt then compile it to this:
on (
DBT_INTERNAL_SOURCE.key = DBT_INTERNAL_DEST.key) OR (FALSE
) and (FALSE)

Which is what we were trying to implement.

Long term I obviously would love to remove this hack and have this insert_only statement too.

Topic		Replies	Views
Duplicate Records on First run with Incremental Model on BigQuery Archive	3	7435	November 29, 2021
incremental merge based on ID Help incremental , snowflake	1	318	August 10, 2024
Can we implement insert only with DBT incremental load Help incremental	3	6861	October 18, 2022
incremental model + unique constraint still allows duplicates Help incremental	4	616	December 11, 2024
does a full refresh enforce an incremental model unique key? Help incremental , snowflake	4	235	August 15, 2024

Re-open thread on insert only in incremental load

Related topics