How to resolve "Subquery containing correlated aggregate function can only appear in having or select clause"

If you are just diving into doing incrementals using Snowflake as your DB of choice, here is a quick learning that I recently had (thank you @jerco for the assist).

If you copy and paste the example code from the incremental models page , then you are likely to get this error message.

The feedback from dbt is that snowflake is struggling with the potential recursive relationship here of a field having the same name both inside and outside of that correlated aggregate function.

the solve → aliases

from :

where date_day >= (select max(date_day) from {{ this }})

to:

where *[main query table alias]*.date_day >= (select max(this.date_day) from {{ this }} as this)

Best of Luck !

13 Likes

That’s a tricky one! Thanks for sharing :star2:

This saved me! Thank you!

I’m encountering this error exactly but having a bit of trouble implementing the solution. I’m newer to both incremental models and data modeling in general. Can you give an example with code for a full model?

Here’s my code:

{{ config (
      materialized = 'incremental'
  )
}}

with emails_raw as (
  select * from {{ source("SOURCE_SCHEMA", "EMAILS") }}
)

select  
  email_id, 
  created_at, 
  updated_at,
  email
from emails_raw 
{% if is_incremental() %}
  where emails_raw.created_at::date >= (select max(this.created_at::date) from {{ this }} as this) 
{% endif %}

I know something is wrong but I’m not sure exactly what it needs

1 Like

@Savage this looks right! Is this the full version of the code that was giving you an error, or is it a cut-down version? The only thing I can see that’s different is the casting to dates, but I would be surprised to discover that’s the problem.

Had the same issue as @Savage - then I realised this is because I had on_schema_change= ‘append_new_columns’ at the top in the config, and LOAD_DATE is something I added after making the alias change, so it couldnt find the column and I would get an error “invalid identifier ‘THIS.LOAD_DATE’”.
so if anyone has this issue, first do a --full-refresh --select and then try again…

Code

{{
config(
materialized = ‘incremental’,
tags = [‘PIPELINE_WEEKLY’],
on_schema_change= ‘append_new_columns’
)
}}

with final as (
SELECT *
FROM {{ref(‘decile_ranking’}}
)

select load_date, variable, rank
from final

{% if is_incremental() %}
WHERE FINAL.LOAD_DATE > (
SELECT MAX(THIS.LOAD_DATE)
FROM {{ this }} AS THIS
)
{% endif %}

I had the same issue and resolved it by doing this. It doesn’t like the MAX() aggregate, so just needed to use a different way.

            where
                (
                    {% if var("optional_start_date", default=None) is none %}
                        -- If optional_start_date is null, use the max date from the
                        -- existing model
                        last_modified > (
                            select last_modified
                            from {{ this }}
                            order by last_modified desc
                            limit 1
                        )

                    {% else %}
                        -- If optional_start_date is not null, use it as the lower
                        -- limit
                        last_modified >= '{{ var(' optional_start_date ') }}'::timestamp
                    {% endif %}
                    and {% if var("optonal_end_date", default=None) is none %}
                        -- If optonal_end_date is null, there is no upper limit
                        true
                    {% else %}
                        -- If optonal_end_date is not null, use it as the upper limit
                        last_modified < '{{ var(' optonal_end_date ') }}'::timestamp
                    {% endif %}
                )
        {% endif %}
1 Like

Hello,
I am new to dbt applied to Snowflake, I feel like I am encountering a somewhat similar problem but what I do not understand for the moment is the link between the field of the incremental filter condition.
I have an error similar to the one in the topic name of this forum.

Subquery containing correlated aggregate function [MAX(EVENTS.TIMESTAMP)] can only appear in having or select clause

I feel more that my problem comes from ending my SQL query on a transformation of the field in question by encapsulating it in a DATE function for daily needs.
I will continue reading the documentation.
If it’s something obvious and you have the answer to, I’m all ears.

{{
config(
materialized=‘incremental’,
unique_key=[‘date’, ‘token’]
)
}}

WITH session_events_aggregation AS (
SELECT
token,
timestamp,
[COUNT, SUM stuff …]
FROM {{ ref(‘events’) }}
WHERE type IN (‘extension:open’, ‘extension:close’)
{% if is_incremental() %} ⟶ I would like to put it here so it’s filtered as early as it should
AND timestamp >= (select max(timestamp) from {{ this }})
{% endif %}
ORDER BY timestamp
),

Other subquery,

SELECT
DATE(timestamp) as date

GROUP BY […]

I’ve tried

AND session_events_aggregation.timestamp >= (select max(this.timestamp) from {{ this }} as this)
Got invalid identifier ‘SESSION_EVENTS_AGGREGATION.TIMESTAMP’

But the query works with

AND timestamp >= (select max(timestamp) from {{ this }})

only if I remove the DATE(timestamp) in the last SELECT

Thanks for Sharing this.
Ananth

Hello guys,

I have the same problem as mentionned. error: Subquery containing correlated aggregate function [MAX(INTER_ORDERED_PRODUCT.ORDERDATE)] can only appear in having or select clause

I have tried the solution that the community propose but still i encounter an error.

Context i’m using snowflake and dbt cloud.

best,

here what i did:
{{ config(schema=‘Marts’) }}
{{ config(materialized=“incremental”) }}

with mart_ordered_product as (
select * from {{ref (“inter_ordered_product”)}}
),

final as (
select
orderedkey,
partkey,
suppkey,
linenumber,
quantity,
extendedprice,
discount,
tax,
returnflag,
linestatus,
shipdate,
commitdate,
receitdate,
shipinstruct,
shipmode,
li_comment,
customer_key,
orderstatus,
totalprice,
orderdate,
orderpriority,
clerk,
shipriority,
o_comment
from mart_ordered_product
{%if is_incremental() %}
where mart_ordered_product.orderdate >= (select max(this.orderdate) from {{this}} as this)
{% endif%}
)

select * from final

Two ways:

  • Use a alias to emails_raw, e.g. er : er.created_at::date >=
  • Remove table name prefix: created_at::date >=

Hi!

I been experiencing the same issue and I found a simpler solution :slight_smile:

This is my use case:

{{
    config(
        materialized="incremental",
        unique_key="primary_key",
        sort="column_c",
}}

with source as (
    select * from {{ source('database', 'schema_name') }}
    {{ limit_data_incremental('column_c') }}
),

transformed as (
    select 
        column_a,
        column_b,
        column_c::datetime
    from source
)

select *
from transformed

For simplicity, I have encapsulated the is_incremental() macro within limit_data_incremental. For more info about is_incremental() please visit this doc.

Aliasing the column after casting its data type resolved the issue for me, so I didn’t need to modify the macro.

transformed as (
    select 
        column_a,
        column_b,
        column_c::datetime as column_c -- This is what I changed
    from source
)

Hope this helps :grin: