DBT BigQuery table creation code generation issue

markusz · March 31, 2023, 3:28pm

Hi experts,

I have been using dbt for BigQuery table creation and transformation, but the queries it generates keep failing with errors.

Here is what the model looks like. It is meant to imitate BQ’s WRITE_TRUNCATE on the partition column submit_date.

{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        on_schema_change='sync_all_columns',
        partition_by = {
            'field': 'submit_date',
            'data_type': 'date',
            'granularity': 'day'
        },
        require_partition_filter = true,
        database='db',
        schema='dataset',
        alias='my_table_forge'
    )
}}
-- An ordinary SELECT

dbt then somehow generates two queries: one to create a temp table and the other to create the real table. The first query runs fine with no error, but the second query is really weird:

create or replace table `db`.`dataset`.`my_table_forge`
partition by submit_date

OPTIONS(
require_partition_filter=True
)
as (

select col1, col2, col3...
from `db`.`dataset`.`my_table_forge`);

This hits an error:

Cannot query over table 'db.dataset.my_table_forge' without a filter over column(s) 'submit_date' that can be used for partition elimination

Why would dbt use the table to create itself? Shouldn’t it query from the temp table and dump data into the real table? And even if that somehow makes sense, it is ignoring the partition field submit_date.

Any idea how to fix it?

brunoszdl · March 31, 2023, 3:58pm

This seems to be a bug from dbt 1.4. Which version are you using?

As far as I know, it has already been fixed.

You can try upgrading your dbt-bigquery to dbt-bigquery==1.4.3 or downgrading to dbt-bigquery==1.3.2

markusz · March 31, 2023, 4:11pm

Thanks @brunoszdl for the quick reply. I’ll speak to the team and get back to you.

markusz · March 31, 2023, 4:22pm

@brunoszdl We are using dbt-bigquery 1.3.0. Do you think it could be a different issue?

brunoszdl · March 31, 2023, 5:15pm

Hmmm that’s weird.

It looks pretty much like this issue:

github.com/dbt-labs/dbt-bigquery

[CT-1912] [Regression] `require_partition_filter` usage with `insert_overwrite` fails on second run

opened 09:51AM - 26 Jan 23 UTC

closed 09:43PM - 02 Mar 23 UTC

github-christophe-oudar

bug regression

### Is this a regression in a recent version of dbt-bigquery?

- [X] I believe… this is a regression in dbt-bigquery functionality
- [X] I have searched the existing issues, and I could not find an existing issue for this regression

### Current Behavior

Let's take this simple example:

```
{{
    config(
        materialized = 'incremental',
        incremental_strategy='insert_overwrite',
        partition_by = {
            "field": "hour",
            "data_type": "timestamp",
            "granularity": "hour"
        },
        require_partition_filter = true,
    )
}}
SELECT TIMESTAMP('2023-01-25') as hour, 1 as value
```

If you run it twice it will fail with `"Query error: Cannot query over table '<project>.<dataset>.test_bug' without a filter over column(s) 'hour' that can be used for partition elimination at [39:5]"`

### Expected/Previous Behavior

The query should be successful.

### Steps To Reproduce

Using `dbt-bigquery 1.4.0rc1 (or any 1.4.0)`

Why it fails? Because the generated code is the following

```
...
when not matched by source and timestamp(timestamp_trunc(DBT_INTERNAL_DEST.hour, hour)) in unnest(dbt_partitions_for_replacement) then delete
...
```

instead of

```
...
when not matched by source and timestamp_trunc(DBT_INTERNAL_DEST.hour, hour) in unnest(dbt_partitions_for_replacement) then delete
...
```

The reason is that BigQuery can't figure out `timestamp(timestamp_trunc(DBT_INTERNAL_DEST.hour, hour))` is the same as `timestamp_trunc(DBT_INTERNAL_DEST.hour, hour)` which means it can't forward the origin of the field (`hour`) to the resulting recreated field. Though we could expect the query planner to do job of inline (and prune) the timestamp cast... it's not the case so we'll have to be smarter when we generate the query to not include it.

### Workaround

Use `"copy_partitions": true` setting such as

```
partition_by = {
    "field": "hour",
    "data_type": "timestamp",
    "granularity": "hour",
    "copy_partitions": true
}
```

in

### Relevant log output

_No response_

### Environment

```markdown
- OS:Mac 12.6
- Python: 3.10.9
- dbt-core (working version):1.3.2
- dbt-bigquery (working version): 1.3.0
- dbt-core (regression version): 1.4.0-X
- dbt-bigquery (regression version): 1.4.0-X
```

### Additional Context

_No response_
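
For reference, the workaround from the issue could be applied to the model in the first post roughly as follows. This is only a sketch. It assumes a dbt-bigquery version that accepts `copy_partitions` inside `partition_by` (the issue reports it against 1.4.0, and whether 1.3.0 accepts it is not confirmed in this thread). It is also not confirmed that this resolves the `create or replace table` error from the first post, since the issue describes a failure in the merge statement.

```sql
{# Sketch only: the config from the first post with the issue's workaround added. #}
{# Assumption: the installed dbt-bigquery version supports 'copy_partitions'. #}
{{
    config(
        materialized='incremental',
        incremental_strategy='insert_overwrite',
        on_schema_change='sync_all_columns',
        partition_by = {
            'field': 'submit_date',
            'data_type': 'date',
            'granularity': 'day',
            'copy_partitions': true
        },
        require_partition_filter = true,
        database='db',
        schema='dataset',
        alias='my_table_forge'
    )
}}
-- An ordinary SELECT
```

The only change from the original config is the extra `'copy_partitions': true` key, so it should be easy to toggle while testing different dbt-bigquery versions.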

The regression reported in that issue was corrected here:

Can you just make a quick test and try to run your model using dbt-bigquery==1.4.2 or dbt-bigquery==1.4.3?

If it does not work I am not sure what the problem is, but you can try to use a solution a community member (@johanndw) has come up with.

I will paste his message here:

"Related note: BQ does not perform partition elimination when a subquery is used in the where clause. We use a macro to derive the high-water mark and pass it to the where clause as a literal value. This would probably also have solved your issue, which requires a literal value; a bonus is that BQ can then perform partition elimination as well.

{% macro get_incremental_filter(model, column, filter) %}

{% set max_query %}
    select coalesce(max(timestamp( {{column}} )), timestamp('1900-01-01')) as max_timevalue 
    from {{ model }}
    {% if filter is defined %}
        where {{ filter }}
    {% endif %}
{% endset %}

{% set max_query_results = run_query(max_query) %}

{% if execute %}
    {% set max_timestamp = max_query_results.rows[0].values()[0] %}
    {% if max_timestamp is none %}
        {{ exceptions.raise_compiler_error('max_timestamp returned from "get_incremental_filter" macro is None') }}
    {% endif %}
    {% set max_timestamp = max_timestamp | string %}
{% else %}
    {% set max_timestamp = '1901-01-01' %}
{% endif %}

{{ log('Incremental filter for ' ~ model ~ ' is ' ~ max_timestamp) }} {# log incremental filter #}
{{ return( "'" ~ max_timestamp ~ "'") }}

{% endmacro %}

And then in the models

from 
   bills

    --Prod incremental behaviour
    {% if target.name == 'prod' and is_incremental() %}      
        where bills.row_ingested_at > {{get_incremental_filter(this ,'row_ingested_at')}}
    {% endif %}

"

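To make the point of the quoted message concrete, here is a rough sketch of the two shapes the incremental filter can take once compiled. The target table name `bills_incremental` and the timestamp value are made-up placeholders, and the exact string the macro returns depends on how the query result is rendered, so treat this as an illustration rather than real compiled output.

```sql
-- Shape 1: high-water mark computed by a subquery inside the WHERE clause.
-- Per the quoted message, BigQuery does not do partition elimination here.
select *
from bills
where bills.row_ingested_at > (
    select max(row_ingested_at)
    from `db`.`dataset`.`bills_incremental`  -- hypothetical target table
);

-- Shape 2: roughly what the model compiles to when get_incremental_filter
-- has already run the max() query and returns a quoted literal.
select *
from bills
where bills.row_ingested_at > '2023-03-30 00:00:00';  -- placeholder value
```

The trade-off is one extra query at compile time (the `run_query` call in the macro) in exchange for a filter that BigQuery sees as a constant.
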
brunoszdl · March 31, 2023, 5:18pm

And also kudos to Christophe Oudar (didn’t find his profile here) for creating the issue and the PR

markusz · March 31, 2023, 10:21pm

Thanks a lot! Looks like we will need to upgrade to a more recent version and see what happens. I’m currently writing a script to generate the dbt config block, so I’d prefer not to use the alternative solution, as it looks complicated. Have a great weekend!

brunoszdl · April 1, 2023, 11:48pm

When you test it, let me know if it worked!
