If you are just diving into doing incrementals using Snowflake as your DB of choice, here is a quick learning that I recently had (thank you @jerco for the assist).
If you copy and paste the example code from the incremental models page , then you are likely to get this error message.
The feedback from dbt is that snowflake is struggling with the potential recursive relationship here of a field having the same name both inside and outside of that correlated aggregate function.
the solve → aliases
from :
where date_day >= (select max(date_day) from {{ this }})
to:
where *[main query table alias]*.date_day >= (select max(this.date_day) from {{ this }} as this)
I’m encountering this error exactly but having a bit of trouble implementing the solution. I’m newer to both incremental models and data modeling in general. Can you give an example with code for a full model?
Here’s my code:
{{ config (
materialized = 'incremental'
)
}}
with emails_raw as (
select * from {{ source("SOURCE_SCHEMA", "EMAILS") }}
)
select
email_id,
created_at,
updated_at,
email
from emails_raw
{% if is_incremental() %}
where emails_raw.created_at::date >= (select max(this.created_at::date) from {{ this }} as this)
{% endif %}
I know something is wrong but I’m not sure exactly what it needs
@Savage this looks right! Is this the full version of the code that was giving you an error, or is it a cut-down version? The only thing I can see that’s different is the casting to dates, but I would be surprised to discover that’s the problem.
Had the same issue as @Savage - then I realised this is because I had on_schema_change= ‘append_new_columns’ at the top in the config, and LOAD_DATE is something I added after making the alias change, so it couldnt find the column and I would get an error “invalid identifier ‘THIS.LOAD_DATE’”.
so if anyone has this issue, first do a --full-refresh --select and then try again…
I had the same issue and resolved it by doing this. It doesn’t like the MAX() aggregate, so just needed to use a different way.
where
(
{% if var("optional_start_date", default=None) is none %}
-- If optional_start_date is null, use the max date from the
-- existing model
last_modified > (
select last_modified
from {{ this }}
order by last_modified desc
limit 1
)
{% else %}
-- If optional_start_date is not null, use it as the lower
-- limit
last_modified >= '{{ var(' optional_start_date ') }}'::timestamp
{% endif %}
and {% if var("optonal_end_date", default=None) is none %}
-- If optonal_end_date is null, there is no upper limit
true
{% else %}
-- If optonal_end_date is not null, use it as the upper limit
last_modified < '{{ var(' optonal_end_date ') }}'::timestamp
{% endif %}
)
{% endif %}
Hello,
I am new to dbt applied to Snowflake, I feel like I am encountering a somewhat similar problem but what I do not understand for the moment is the link between the field of the incremental filter condition.
I have an error similar to the one in the topic name of this forum.
Subquery containing correlated aggregate function [MAX(EVENTS.TIMESTAMP)] can only appear in having or select clause
I feel more that my problem comes from ending my SQL query on a transformation of the field in question by encapsulating it in a DATE function for daily needs.
I will continue reading the documentation.
If it’s something obvious and you have the answer to, I’m all ears.
WITH session_events_aggregation AS (
SELECT
token,
timestamp,
[COUNT, SUM stuff …]
FROM {{ ref(‘events’) }}
WHERE type IN (‘extension:open’, ‘extension:close’)
{% if is_incremental() %} ⟶ I would like to put it here so it’s filtered as early as it should
AND timestamp >= (select max(timestamp) from {{ this }})
{% endif %}
ORDER BY timestamp
),
Other subquery,
SELECT
DATE(timestamp) as date
GROUP BY […]
I’ve tried
AND session_events_aggregation.timestamp >= (select max(this.timestamp) from {{ this }} as this)
Got invalid identifier ‘SESSION_EVENTS_AGGREGATION.TIMESTAMP’
But the query works with
AND timestamp >= (select max(timestamp) from {{ this }})
only if I remove the DATE(timestamp) in the last SELECT
I have the same problem as mentionned. error: Subquery containing correlated aggregate function [MAX(INTER_ORDERED_PRODUCT.ORDERDATE)] can only appear in having or select clause
I have tried the solution that the community propose but still i encounter an error.
Context i’m using snowflake and dbt cloud.
best,
here what i did:
{{ config(schema=‘Marts’) }}
{{ config(materialized=“incremental”) }}
with mart_ordered_product as (
select * from {{ref (“inter_ordered_product”)}}
),
final as (
select
orderedkey,
partkey,
suppkey,
linenumber,
quantity,
extendedprice,
discount,
tax,
returnflag,
linestatus,
shipdate,
commitdate,
receitdate,
shipinstruct,
shipmode,
li_comment,
customer_key,
orderstatus,
totalprice,
orderdate,
orderpriority,
clerk,
shipriority,
o_comment
from mart_ordered_product
{%if is_incremental() %}
where mart_ordered_product.orderdate >= (select max(this.orderdate) from {{this}} as this)
{% endif%}
)
I been experiencing the same issue and I found a simpler solution
This is my use case:
{{
config(
materialized="incremental",
unique_key="primary_key",
sort="column_c",
}}
with source as (
select * from {{ source('database', 'schema_name') }}
{{ limit_data_incremental('column_c') }}
),
transformed as (
select
column_a,
column_b,
column_c::datetime
from source
)
select *
from transformed
For simplicity, I have encapsulated the is_incremental() macro within limit_data_incremental. For more info about is_incremental() please visit this doc.
Aliasing the column after casting its data type resolved the issue for me, so I didn’t need to modify the macro.
transformed as (
select
column_a,
column_b,
column_c::datetime as column_c -- This is what I changed
from source
)