The problem I’m having
I’m trying to implement an incremental model in dbt for BigQuery. The goal is to filter new records by comparing the creation_time
to the maximum date value from the existing table. However, when I use the max()
function to get the maximum date from the table, the model fails with an error. If I hardcode a date (e.g., 31/12/2024
), the query works fine.
The context of why I’m trying to do this
I’m using this approach to avoid processing the entire dataset each time the incremental model runs. Instead, I want the model to only process records that are newer than the maximum date currently present in the table and within the next 3 days. This would optimize processing time and resource usage.
What I’ve already tried
- I’ve tried using the
max()
function to get the maximum date from the table and filter records in thewhere
clause. - I also tried manually setting the date range (e.g.,
31/12/2024
) and found that it works without any issues.
Some example code or error messages
{{ config(
materialized='incremental',
incremental_strategy='merge',
unique_key='job_id',
partition_by={
'field': 'metric_date',
'data_type': 'date'
},
tags=["tag_bigquery_analytics"],
schema="staging"
) }}
with
info_schema as (
select
job_id
, user_email as user_id
, cast(creation_time as date) as metric_date
, job_type
, query
, cast(start_time as timestamp) as start_timestamp
, cast(end_time as timestamp) as end_timestamp
, referenced_tables[safe_offset(0)].dataset_id as referenced_dataset_id
, referenced_tables[safe_offset(0)].table_id as referenced_table_id
, destination_table.dataset_id as destination_dataset_id
, destination_table.table_id as destination_table_id
, total_slot_ms as total_slot_ms_processed
, total_bytes_processed
, total_bytes_billed
from {{ source('bigquery_info_schema', 'information_schema_jobs') }}
{% if is_incremental() %}
where cast(creation_time as date) > (select max(metric_date) from {{ this }})
and cast(creation_time as date) <= date_add((select max(metric_date) from {{ this }}), interval 3 day)
{% endif %}
)
select *
from info_schema