Hi,
Is there a way to use merge incremental strategy without data getting updated if the Unique id is already available in the destination. The merge incremental strategy inserts records if there is no matching id in destination and updates the columns if the ID is already present in the destination. I want to ignore the incoming records if the unique ID is already available in the destination and insert only new records.
dbt will only insert records that are returned by your query. Inside of your is_incremental
block, if you use a not exists
clause to ignore any records whose unique ID is already in the destination, then it will only return the new records which means they’re the only ones that will be inserted.
create your own incremental strategy and configure the strategy to ur model using config function
you create the incremental strategy macro and write your merge logic.
dbt identifies the macro with the name get_incremental_{strategy}_sql
so you have to create macro with the above name.
dbt internally passes the below arguments to the user defined incremental strategy macro
{'target_relation': target_relation, 'temp_relation': tmp_relation, 'unique_key': unique_key, 'dest_columns': dest_columns, 'predicates': incremental_predicates }
you have to return a merge sql from the macro, dbt runs the sql on the configured database
user defined incremental strategy:
{% macro get_incremental_{strategy}_sql(arg_dict) %}
merge logic
{{ return (merge_sql_query) }}
{% endmacro %}
This is true, but is probably overkill given that the default merge strategy will work out of the box if it’s given the correct rows to work with.
Thank you for you reply on the incremental model. I am looking for a way to do an incremental model if the there is a change in data(Changing attributes). eg. If ID 123 is loaded in model already and if the id 123 is flowing in again in the next run, i need to check if there is change in data (other fields like rates ,amount ,name etc.) and update it. if not I need to ignore the upsert for that ID. Is there is way to implement this with existing dbt feature or do we need to form a query to support this?
use an audit_field(timestamp) in your source table which tells us when the record is updated or inserted. Use this audit field in ur incremental logic to filter updated/inserted records.
Example:-
select
col_a,
col_b,
col_c
from raw_app_data.events
{% if is_incremental() %}
-- this filter will only be applied on an incremental run
where event_time > (select max(event_time) from {{ this }})
{% endif %}
here raw_app_data.events is the source model and it has a audit field event_time
This is exactly right - dbt assumes that incremental models have some way of identifying which records have changed. Ideally you can use the modified_date or something on the source table.
Otherwise you would need to check each column whose values you care about inside of your is_incremental()
block
This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.