Parse Redshift incremental model with unique_key to Databricks

renan.costa · June 23, 2023, 4:27pm

The problem I’m having

Hey all. I’m working on porting some Redshift models to run on Databricks (Delta format). The current configuration of the models is:

Redshift:

config:
    materialized: incremental
    unique_key: date

For Databricks, I can use:

config: 
    materialized: incremental
    unique_key: date
    incremental_strategy: merge

However, for some of these models, the unique_key column is not unique. That works fine on Redshift because its adapter performs a delete + insert. The Databricks adapter performs a merge operation instead, so I cannot use this configuration for these tables.

I’m wondering if there’s a way to perform the delete as part of the dbt run, before the merge, or whether I should write a custom materialization as described in “Creating new materializations” on the dbt Developer Hub.

I appreciate any thoughts. Thank you.

joellabes · June 26, 2023, 5:09am

Is there any way to make your unique_key column unique? The documentation is pretty firm that your unique keys need to be unique, so if that’s not the case you’ll be in for a bad time as you’re discovering.

Is this intentional, or is it a hangover from a past modelling error? If the latter, is it possible to do a one-off cleanup outside of dbt (or inside, using something like a row_number and filtering to just the first version)?

renan.costa · June 27, 2023, 9:57am

Thank you for your answer @joellabes.

"Is there any way to make your unique_key column unique?"

"Is this intentional, or is it a hangover from a past modelling error?"

I don’t think it’s possible to make it unique. This was intentional.

I will try to explain a common use case. Let’s say we have the incremental model and the current state of the table is:

key	value
1	‘a1’
1	‘a2’
2	‘b1’
2	‘b2’

And the new selected data is:

key	value
2	‘b3’

The expected table’s state should be:

key	value
1	‘a1’
1	‘a2’
2	‘b3’
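
The example above can be replayed as a small Python sketch. It is only a plain-Python model of the two behaviours, not dbt or adapter code: the function names `delete_insert` and `merge_upsert` are made up for illustration, and the merge model assumes that every target row matching an incoming key gets updated.

```python
# Plain-Python model of the two strategies. This is NOT dbt or adapter code;
# delete_insert and merge_upsert are hypothetical names used for illustration.

def delete_insert(target, new_rows, key="key"):
    """Delete every target row whose key appears in new_rows, then append new_rows."""
    incoming_keys = {row[key] for row in new_rows}
    kept = [row for row in target if row[key] not in incoming_keys]
    return kept + list(new_rows)


def merge_upsert(target, new_rows, key="key"):
    """Update matching target rows in place and insert unmatched new rows.

    Assumption: each key appears at most once in new_rows, and every
    matching target row is updated.
    """
    by_key = {row[key]: row for row in new_rows}
    existing_keys = {row[key] for row in target}
    merged = [dict(by_key.get(row[key], row)) for row in target]
    merged += [dict(row) for row in new_rows if row[key] not in existing_keys]
    return merged


current = [
    {"key": 1, "value": "a1"},
    {"key": 1, "value": "a2"},
    {"key": 2, "value": "b1"},
    {"key": 2, "value": "b2"},
]
new = [{"key": 2, "value": "b3"}]

after_delete_insert = delete_insert(current, new)
after_merge = merge_upsert(current, new)

print(after_delete_insert)  # three rows: a1, a2, b3
print(after_merge)          # four rows: a1, a2, b3, b3
```

Under these assumptions, delete + insert gives the expected three rows, while the merge-style update leaves two rows for key 2, both carrying ‘b3’.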

I’m wondering if there’s another strategy to do the same.

I checked the docs and it looks like delete+insert is supposed to do this (?)

Even if we could find an alternative strategy for Redshift, the Databricks adapter doesn’t have an incremental_strategy that deletes all the keys before adding the new values.

I’m now exploring https://docs.getdbt.com/guides/advanced/creating-new-materializations
