dbt then somehow generates two queries, one is to create a temp table and the other is to create the real table. The first query fires off perfectly fine with no error, but what is really weird is the second query:
create or replace table `db`.`dataset`.`my_table_forge`
partition by submit_date
OPTIONS(
require_partition_filter=True
)
as (
select col1, col2, col3...
from `db`.`dataset`.`my_table_forge`);
This hits an error:
Cannot query over table 'db.dataset.my_table_forge' without a filter over column(s) 'submit_date' that can be used for partition elimination
Why would dbt use the table to create itself? Shouldn’t it query from the temp table and dump data into the real table? And even if it makes sense, somehow, it is ignoring the partition_field submit_date.
Can you just make a quick test and try to run your model using dbt-bigquery==1.4.2 or dbt-bigquery==1.4.3?
If it does not work I am not sure what the problem is, but you can try to use a solution a community member (@johanndw) has come up with.
I will paste his message here:
"Related note, BQ does not perform partition elimination when using a subquery in the where clause. We use a macro to derive the highwater mark and pass that to the where clause as a literal value. This will probably also have solved your issue which requires a literal value, bonus is that BQ can perform partition elimination as well
{% macro get_incremental_filter(model, column, filter) %}
{% set max_query %}
select coalesce(max(timestamp( {{column}} )), timestamp('1900-01-01')) as max_timevalue
from {{ model }}
{% if filter is defined %}
where {{ filter }}
{% endif %}
{% endset %}
{% set max_query_results = run_query(max_query) %}
{% if execute %}
{% set max_timestamp = max_query_results.rows[0].values()[0] %}
{% if max_timestamp is none %}
{{ exceptions.raise_compiler_error('max_timestamp returned from "get_incremental_filter" macro is None') }}
{% endif %}
{% set max_timestamp = max_timestamp | string %}
{% else %}
{% set max_timestamp = '1901-01-01' %}
{% endif %}
{{ log('Incremental filter for ' ~ model ~ ' is ' ~ max_timestamp) }} {# log incremental filter #}
{{ return( "'" ~ max_timestamp ~ "'") }}
{% endmacro %}
And then in the models
from
bills
--Prod incremental behaviour
{% if target.name == 'prod' and is_incremental() %}
where bills.row_ingested_at > {{get_incremental_filter(this ,'row_ingested_at')}}
{% endif %}
Thanks a lot! Looks like we will need to upgrade to a more recent version and see what happens. I’m currently writing a script to generate dbt config block so prefer to not use the alternative solution as it looks complicated. Have a great weekend!