incremental model compute costs are the same or higher than building the table from scratch

hello - trying to figure out why this model has the same or higher compute costs when merging 0 rows as it does for a full-refresh that builds it from scratch. seems pretty basic, not sure what i’m missing.

let’s say a full-refresh processes 5GB:

- if i run it as-is below after that full-refresh, it merges 0 rows and processes 10GB
- if i run it with unique_key='id' removed after that full-refresh, it merges 0 rows and processes 5GB

to be fair, run time does go down for the incremental runs.
“_last_updated” is a timestamp.

{{
    config(
        materialized='incremental',
        unique_key='id'
    )
}}

select *
from {{ ref('my_table_name_here') }}
{% if is_incremental() %}
where _last_updated > (select max(_last_updated) from {{ this }})
{% endif %}

Hey @testing, I assume you’re using BigQuery since you’re talking about data scanned?

I’m not super familiar with how BQ bills, but I wonder whether it’s having to scan the whole table checking the _last_updated value for each row. Does BQ have indexes you can add or anything?

One way to work around this might be to run a separate query to check the value. Something like this:

{% if is_incremental() %}
{% set max_last_updated_input = dbt_utils.get_single_value("select max(_last_updated) from " ~ ref('the_table')) %}
{% set max_last_updated_this = dbt_utils.get_single_value("select max(_last_updated) from " ~ this) %}

{% if max_last_updated_input != max_last_updated_this %}
where _last_updated > '{{ max_last_updated_this }}'
{% endif %}
{% endif %}

(Written on mobile, not tested and not indented :grimacing:)

In BigQuery, an incremental model with the merge strategy scans the source table (with the specified filters) plus the whole target table, looking for matches on the unique_key. As your target table grows, that means you scan more data on each run just to check whether any row changed.
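To make that concrete, the merge dbt generates looks roughly like this (a sketch with made-up project/dataset names, and assuming the model only has id and _last_updated columns):

merge into `project.dataset.my_model` as DBT_INTERNAL_DEST
using (
    -- source side: pruned by your is_incremental() filter
    select *
    from `project.dataset.my_table_name_here`
    where _last_updated > (select max(_last_updated) from `project.dataset.my_model`)
) as DBT_INTERNAL_SOURCE
-- this join condition is why the whole target table gets scanned:
on DBT_INTERNAL_SOURCE.id = DBT_INTERNAL_DEST.id
when matched then update set
    id = DBT_INTERNAL_SOURCE.id,
    _last_updated = DBT_INTERNAL_SOURCE._last_updated
when not matched then insert (id, _last_updated)
    values (DBT_INTERNAL_SOURCE.id, DBT_INTERNAL_SOURCE._last_updated)

Even when the source side returns 0 rows, BigQuery still reads the target side to evaluate the join, which is where the extra 5GB comes from.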

If you have a large, partitioned target table, you can use an incremental model with the insert_overwrite strategy instead. In that case dbt will not scan the full target table; it replaces just the affected partitions with the new data.
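A minimal sketch of that config, assuming you partition the model on _last_updated by day:

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        partition_by={
            'field': '_last_updated',
            'data_type': 'timestamp',
            'granularity': 'day'
        }
    )
}}

Note there’s no unique_key here: insert_overwrite replaces whole partitions rather than matching individual rows.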
