The problem I’m having
I’m trying to implement an incremental model in dbt for BigQuery. The goal is to filter new records by comparing the creation_time to the maximum date value from the existing table. However, when I use the max() function to get the maximum date from the table, the model fails with an error. If I hardcode a date (e.g., 31/12/2024), the query works fine.
The context of why I’m trying to do this
I’m using this approach to avoid processing the entire dataset each time the incremental model runs. Instead, I want the model to only process records that are newer than the maximum date currently present in the table and within the next 3 days. This would optimize processing time and resource usage.
What I’ve already tried
- I’ve tried using the
max()function to get the maximum date from the table and filter records in thewhereclause. - I also tried manually setting the date range (e.g.,
31/12/2024) and found that it works without any issues.
Some example code or error messages
{{ config(
materialized='incremental',
incremental_strategy='merge',
unique_key='job_id',
partition_by={
'field': 'metric_date',
'data_type': 'date'
},
tags=["tag_bigquery_analytics"],
schema="staging"
) }}
with
info_schema as (
select
job_id
, user_email as user_id
, cast(creation_time as date) as metric_date
, job_type
, query
, cast(start_time as timestamp) as start_timestamp
, cast(end_time as timestamp) as end_timestamp
, referenced_tables[safe_offset(0)].dataset_id as referenced_dataset_id
, referenced_tables[safe_offset(0)].table_id as referenced_table_id
, destination_table.dataset_id as destination_dataset_id
, destination_table.table_id as destination_table_id
, total_slot_ms as total_slot_ms_processed
, total_bytes_processed
, total_bytes_billed
from {{ source('bigquery_info_schema', 'information_schema_jobs') }}
{% if is_incremental() %}
where cast(creation_time as date) > (select max(metric_date) from {{ this }})
and cast(creation_time as date) <= date_add((select max(metric_date) from {{ this }}), interval 3 day)
{% endif %}
)
select *
from info_schema