Databricks - insert_overwrite with SQLWarehouses

The problem I’m having

Hello,

I have an incremental model implemented on Databricks, where existing data should be replaced based on a column that is not unique.

I know the insert_overwrite strategy with partition_by should work, but I’m using Databricks SQLWarehouses, which don’t support insert_overwrite.
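For reference, this is roughly how that strategy would be configured in dbt (a sketch only; the source name `new_data` is a placeholder, and the columns come from the example below):

```sql
-- Sketch of the insert_overwrite configuration that is not available on SQLWarehouses.
-- 'new_data' is a placeholder for the actual source model.
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by=['date']
    )
}}

select date, value
from {{ ref('new_data') }}
```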

I also tried to use replace_where. This strategy replaces the data based on a boolean condition; however, in our use case, we would need to define the condition based on a subquery, which is not possible:

```sql
INSERT INTO <target_table>
REPLACE WHERE key IN (select key from <new_data>) -- <- not possible to use a subquery
select * from <new_data>;
```
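For completeness, this is roughly what the replace_where strategy looks like in dbt-databricks (a sketch only; names and the date literal are placeholders, and the exact config shape may vary by adapter version). The predicate has to be a static boolean expression, which is why the subquery above is not an option:

```sql
-- Sketch of the replace_where strategy in dbt-databricks.
-- The predicate must be a static boolean expression, not a subquery.
{{
    config(
        materialized='incremental',
        incremental_strategy='replace_where',
        incremental_predicates="date >= '2023-01-02'"
    )
}}

select date, value
from {{ ref('new_data') }}
```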

To expand on the use case, assume a table where the date column is used to update it. The current state of the table is:

| date         | value |
|--------------|-------|
| "2023-01-01" | a     |
| "2023-01-02" | b1    |
| "2023-01-02" | b2    |

and the new data is:

| date         | value |
|--------------|-------|
| "2023-01-02" | b3    |

The final result should be:

| date         | value |
|--------------|-------|
| "2023-01-01" | a     |
| "2023-01-02" | b3    |

Does anyone have any other ideas of what I could use to implement this model?

Thank you

Hi, within Databricks you could check whether your use case can use the MERGE INTO statement.
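Something along these lines (a rough sketch only; table and column names are taken from the example above):

```sql
-- Rough sketch of the MERGE INTO suggestion, using the example's placeholders.
MERGE INTO <target_table> AS t
USING <new_data> AS s
  ON t.date = s.date
WHEN MATCHED THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *;
```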

Hi @rvp13, unfortunately that doesn’t work for this use case because the column used to update the table is not unique.

Even if you use MERGE INTO ON FALSE?

Hi @brunoszdl, MERGE INTO won’t work because the unique_key is not unique in this case.