hello - trying to figure out why this model has the same or more compute costs when merging 0 rows as opposed to a full-refresh that builds it from scratch. seems pretty basic, not sure what i’m missing.
let’s say a full-refresh processes 5GB
if i run it as it is below after that full-refresh, it will merge 0 rows and process 10GB
if i run it w/ unique_key='id' removed after that full-refresh, it will merge 0 rows and process 5GB
it is true that run time goes down for the incremental runs.
“_last_updated” is a timestamp.
{{
    config(
        materialized='incremental',
        unique_key='id'
    )
}}

select
    *
from {{ ref('my_table_name_here') }}

{% if is_incremental() %}
where _last_updated > (select max(_last_updated) from {{ this }})
{% endif %}
Hey @testing, I assume you’re using BigQuery since you’re talking about data scanned?
I’m not super familiar with how BQ bills, but I wonder whether it’s having to scan the whole table checking the _last_updated value for each row. Does BQ have indexes you can add or anything?
One way to work around this might be to run a separate query to check the value. Something like this:
{% if is_incremental() %}
    {# fetch both high-water marks with standalone queries at run time #}
    {% set max_last_updated_input = dbt_utils.get_single_value("select max(_last_updated) from " ~ ref('the_table')) %}
    {% set max_last_updated_this = dbt_utils.get_single_value("select max(_last_updated) from " ~ this) %}
    {# only add the filter when the two values differ #}
    {% if max_last_updated_input != max_last_updated_this %}
where _last_updated > '{{ max_last_updated_this }}'
    {% endif %}
{% endif %}
In BigQuery, an incremental model with the merge strategy scans the source table with the specified filters and the whole target table, looking for changes by the unique_key. If your target table grows, that means you scan more data each time to check for changes to any row.
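Roughly speaking, a merge-strategy run compiles to a single MERGE statement along these lines (just a sketch: the project/dataset names are placeholders, only two columns are shown, and the real statement selects from the temp relation dbt builds for the new rows and lists every column):

merge into `my_project.my_dataset.my_model` as DBT_INTERNAL_DEST
using (
    -- only the rows that passed the is_incremental() filter (0 rows in your test)
    select id, _last_updated
    from `my_project.my_dataset.my_model__dbt_tmp`
) as DBT_INTERNAL_SOURCE
on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id  -- joining on the unique_key reads the whole target
when matched then update set
    _last_updated = DBT_INTERNAL_SOURCE._last_updated
when not matched then insert (id, _last_updated)
    values (DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE._last_updated)

That lines up with the 10GB you’re seeing: roughly 5GB to build the filtered source plus roughly 5GB to join it against the existing table on id, even when the source side ends up empty.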
If you have a large partitioned target table, you can use an incremental model with the insert_overwrite strategy. In this case, dbt will not scan the full target table but rather drop the relevant partitions and insert the new data in their place.
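A minimal sketch of that setup, assuming the table can be partitioned by day on _last_updated (the partition field, type, and granularity here are assumptions to adapt to your data):

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={
            'field': '_last_updated',
            'data_type': 'timestamp',
            'granularity': 'day'
        }
    )
}}

select
    *
from {{ ref('my_table_name_here') }}

{% if is_incremental() %}
-- with insert_overwrite, dbt-bigquery also declares a _dbt_max_partition scripting
-- variable you could filter on here instead of querying the existing table yourself
where _last_updated > (select max(_last_updated) from {{ this }})
{% endif %}

On each run dbt works out which partitions show up in the new rows and replaces only those partitions in the target, so the bytes scanned scale with the partitions touched rather than with the whole table.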