The problem I’m having
Hello,
I'm implementing an incremental model on Databricks, where existing data should be replaced based on a column that is not unique.
I know the `insert_overwrite` strategy with `partition_by` should work, but I'm using Databricks SQL Warehouses, which don't support `insert_overwrite`.
I also tried the `replace_where` strategy, which replaces data based on a boolean condition. However, in our use case the condition would need to be defined by a subquery, which is not possible:
```sql
INSERT INTO <target_table>
REPLACE WHERE key IN (SELECT key FROM <new_data>) -- <- not possible to use a subquery
SELECT * FROM <new_data>;
```
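For context, a static predicate with literal values does work here; it is specifically the subquery form that is rejected. So a variant like the following runs, but it would require computing and inlining the keys before building the statement:

```sql
-- Works: REPLACE WHERE with a static, literal predicate.
-- The key values would have to be known up front and inlined.
INSERT INTO <target_table>
REPLACE WHERE key IN ('2023-01-02')
SELECT * FROM <new_data>;
```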
To expand on the use case, assume a table that is updated using its `date` column. The current state of the table is:
| date | value |
|--------------|-------|
| "2023-01-01" | a |
| "2023-01-02" | b1 |
| "2023-01-02" | b2 |
and the new data is:
| date | value |
|--------------|-------|
| "2023-01-02" | b3 |
The final result should be:
| date | value |
|--------------|-------|
| "2023-01-01" | a |
| "2023-01-02" | b3 |
Does anyone have other ideas for how I could implement this model?
Thank you