The problem I’m having
This is not really a problem; we mainly want to understand what is happening. We noticed that dbt recreated the whole table and processed 41TB of data instead of running only for a specific processing date.
What we noticed:
- dbt runs our model and creates it as model_name__dbt_tmp
- dbt recreates the whole table by running the statement below. The table already existed and held a lot of data, so this step processed 41TB:
create or replace table model_name as (
select
col1,
col2,
from model_name
);
- dbt runs partition merges into the same table
Some settings we have:
incremental materialization
require_partition_filter = true
+on_schema_change: "sync_all_columns"
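For reference, here is a minimal sketch of how these settings would look in a model's config block. It is an illustration built only from the three settings listed above, not a copy of our actual file, and everything else is left out.

```sql
-- Illustrative sketch only: contains just the three settings mentioned in this post.
{{
  config(
    materialized = 'incremental',
    require_partition_filter = true,
    on_schema_change = 'sync_all_columns'
  )
}}
```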
Question: Why does dbt run step 2 (the create or replace table over the whole existing table) from the list above?
Thank you!