incremental tables with slowly changing data

The problem I’m having

We have a number of tables that are rather large and comples and are using incremental table builds to help smooth out potential performance issues. This has been relatively straight forward to implment but we hit a bit of a blocker. Our environment consists of dev and prod. Dev data is rather sparse and prod data is a moderate volume. Source data largely comes in via nightly syncs with apis and other databases into our redshift cluster. The problem we’ve hit is that if the incremental window doesn’t find any new data, the run will fail with a “aggregates not allowed in WHERE clause”. I’d rather not have this happen in dev but could stomach it if that was the only option. However, since prod only gets data nightly, subsequent runs in prod will fail which makes me worry much more since there may be a point where we would need to push out a new model mid day and rerun our dbt jobs to get the new data in place.

An example of one of these models is

{{
    config(
        materialized='incremental',
        unique_key = ['auth0_sub'],
        sort=['auth0_sub']
    )
}}

WITH riders AS (
    SELECT
        n.nexus_user_id,
        auth0_sub,
        n.is_active,
        n.is_staff,
        i.country_name AS country,
        i.city,
        i.state,
        i.lat,
        i.long,
        n.axs_user_create_ts,
        n.axs_user_update_ts
    FROM
        {{ ref('stg_nexus__users') }} AS n
    LEFT JOIN
        {{ ref('stg_iterable__users') }} AS i USING (auth0_sub)

    {% if is_incremental() %}

        WHERE n.axs_user_update_ts > (
            SELECT MAX(n.axs_user_update_ts)
            FROM {{ this }}
        )

    {% endif %}
)

SELECT *
FROM riders

The context of why I’m trying to do this

incremental modeling for data coming from nightly syncs

What I’ve already tried

I’ve looked through some discussion threads but am trying to understand my options.

Some example code or error messages

22:14:00  Finished running 1 incremental model, 2 tests in 0 hours 0 minutes and 10.66 seconds (10.66s).
22:14:00  
22:14:00  Completed with 1 error and 0 warnings:
22:14:00  
22:14:00  Database Error in model mart__riders (models/3_marts/general_use/mart__riders.sql)
22:14:00    aggregates not allowed in WHERE clause

Shouldn’t it be

WHERE n.axs_user_update_ts > (
    SELECT MAX(axs_user_update_ts)
    FROM {{ this }}
)

instead of

WHERE n.axs_user_update_ts > (
    SELECT MAX(n.axs_user_update_ts)
    FROM {{ this }}
)

?

I think we ended up there because sqlfluff was throwing errors on that formulation. We ended up solving it using aliases

{{
    config(
        materialized='incremental',
        unique_key = ['auth0_sub'],
        sort=['auth0_sub']
    )
}}

WITH riders AS (
    SELECT
        n.nexus_user_id,
        auth0_sub,
        n.is_active,
        n.is_staff,
        i.country_name AS country,
        i.city,
        i.state, --TODO do we want to rename this to region?
        i.lat,
        i.long,
        n.axs_user_create_ts,
        n.axs_user_update_ts
    FROM
        {{ ref('stg_nexus__users') }} AS n
    LEFT JOIN
        {{ ref('stg_iterable__users') }} AS i USING (auth0_sub)

    {% if is_incremental() %}

        WHERE n.axs_user_update_ts > (
            SELECT MAX(this.axs_user_update_ts)
            FROM {{ this }} AS this
        )

    {% endif %}
)

SELECT *
FROM riders
1 Like

Hi @cperry, so problem solved?