I am trying to figure out how costs are calculated when using dbt's partitions_to_replace vs _dbt_max_partition vs copy_partitions.
Note: when I say "costs", I look at the results of every query dbt creates and check the MiB processed.
I ran tests (findings may vary, of course) on a 44 GiB table:
- _dbt_max_partition
The month_start_date >= date_sub(_dbt_max_partition, interval N day) logic costs 27.07 GiB of processing overall, which is quite a lot.
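For context, the _dbt_max_partition variant of the incremental filter looks roughly like this (a sketch, not my exact model; N=3 is a placeholder):

```sql
-- Sketch of the _dbt_max_partition approach. _dbt_max_partition is a BigQuery
-- scripting variable that dbt declares for insert_overwrite models when a static
-- `partitions` list is not configured; it holds the max partition value already
-- present in the target table.
{% if is_incremental() %}
where month_start_date >= date_sub(_dbt_max_partition, interval 3 day)
{% endif %}
```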
- partitions_to_replace
month_start_date in ({{ partitions_to_replace | join(',') }}) processed the exact same GiB with the same running time, which is disappointing (I'm on the latest dbt version, 1.6.1; config and is_incremental block below):
{{
    config(
        materialized = 'incremental',
        partition_by = {'field': 'created_date', 'data_type': 'date', 'granularity': 'month'},
        cluster_by = ['first_line_item', 'is_first_order', 'customer_id', 'transaction_id'],
        incremental_strategy = 'insert_overwrite',
        on_schema_change = 'sync_all_columns',
        partitions = partitions_to_replace
    )
}}
{% if is_incremental() %}
where month_start_date in ({{ partitions_to_replace | join(',') }})
{% endif %}
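For completeness, partitions_to_replace is declared at the top of the model before it is referenced in config; a minimal sketch following the dbt docs pattern (my real list is computed similarly, with monthly granularity):

```sql
-- Sketch: the entries must be literal SQL expressions, since dbt interpolates
-- them directly into the merge/overwrite statement and the where clause.
{% set partitions_to_replace = [
    "date_trunc(current_date, month)",
    "date_sub(date_trunc(current_date, month), interval 1 month)"
] %}
```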
- copy_partitions
It ran in twice the time and cost twice as much.
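The copy_partitions variant I tested uses the same model, with the flag enabled inside partition_by (a trimmed sketch, not my full config; copy_partitions requires the insert_overwrite strategy):

```sql
{{
    config(
        materialized = 'incremental',
        incremental_strategy = 'insert_overwrite',
        partition_by = {
            'field': 'created_date',
            'data_type': 'date',
            'granularity': 'month',
            'copy_partitions': true
        }
    )
}}
```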
Questions regarding the findings:
a. Am I looking at costs correctly?
b. Why do partitions_to_replace and _dbt_max_partition take the exact same time?
c. Why does copy_partitions run the query twice and the COPY twice? Is this expected?
d. copy_partitions takes twice the time: is that due to the double run above, or because the copy for each partition is done sequentially?
e. Is my config above for partitions_to_replace correct?
Any help would be appreciated.