unique_key config for snapshots

darcy · October 21, 2022, 10:05am

Regarding the docs on the unique_key config for snapshots: I think that for a snapshot Hudi table, the primaryKey should be 'dbt_scd_id', which is the column used in the 'merge into ... on' statement generated by the 'dbt snapshot' command. With the timestamp strategy, dbt_scd_id is derived from the source table's primary key and the column referenced by updated_at. Currently, however, the primaryKey of the snapshot Hudi table, which is created via CTAS on the first 'dbt snapshot' run, is the same as the source table's primary key.

In my experiment, I changed two parts of the SQL generated by dbt. First, I set the primaryKey of the snapshot CTAS Hudi table to 'dbt_scd_id'. Second, I replaced 'insert *' in the 'merge into ... insert' statement with 'insert [referenced columns]'. After these changes, I got the SCD2 table, stored in Hudi.
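To illustrate why the choice of primaryKey matters here, below is a minimal Python sketch, not dbt's or Hudi's actual code. It assumes dbt_scd_id is an MD5 hash of the unique key and the updated_at value joined by '|', and it models a keyed table as a dict in which a later row with the same key overwrites the earlier one. The helper names `scd_id` and `upsert` and the sample rows are hypothetical.

```python
import hashlib


def scd_id(unique_key, updated_at):
    """Hypothetical stand-in for dbt_scd_id: md5 of key and timestamp joined by '|'."""
    raw = f"{unique_key}|{updated_at}"
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Two versions of the same source row (id = 1), as the timestamp strategy would see them.
versions = [
    {"id": 1, "status": "pending", "updated_at": "2022-10-01 00:00:00"},
    {"id": 1, "status": "shipped", "updated_at": "2022-10-20 00:00:00"},
]
for row in versions:
    row["dbt_scd_id"] = scd_id(row["id"], row["updated_at"])


def upsert(rows, key):
    """Model a keyed table: a later row with the same key replaces the earlier one."""
    table = {}
    for row in rows:
        table[row[key]] = row
    return list(table.values())


# Keyed by the source primary key, the history collapses to one row.
keyed_by_source_pk = upsert(versions, "id")
# Keyed by dbt_scd_id, both versions survive, which is what SCD2 needs.
keyed_by_scd_id = upsert(versions, "dbt_scd_id")

print(len(keyed_by_source_pk), len(keyed_by_scd_id))
```

Under these assumptions, keying on the source primary key leaves no room for more than one version per record, while keying on dbt_scd_id keeps one row per (key, updated_at) pair.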

joellabes · October 27, 2022, 9:44am

Hi @darcy, is this feedback on the dbt documentation? If so, please open an issue on the developer hub repo: Issues · dbt-labs/docs.getdbt.com · GitHub
