The problem I’m having
I am trying to run an incremental model which refers to non-incremental views upstream and it seems to be always running a full refresh - hence taking a long time to run.
The context of why I’m trying to do this
I am using dbt to build a metric store, ie. we have a data warehouse setup and want to store pre-computed KPIs in a separate KPI model. The KPIs I am interested about are mainly immutable so I would very rarely need a full refresh. Instead I only want to compute them on new rows, I chose an incremental approach as the computation is quite expensive. However, since I am referring other non-incremental models upstream I seem to be getting no benefit of using an incremental model, the is_incremental() WHERE statement does not seem to work correctly as I am using a union all upstream (at least that is my guess).
What I’ve already tried
I already tried to include is_incremental() WHERE upstream and changed the upstream models from views to ephemeral, but I see in the compiled SQL that the WHERE statements do not populate in the compiled code.
Some example code or error messages
stg_sys1_orders.sql
SELECT customer_id, order_id, order_date, item_id, item_value FROM {{source("sys1", "orders")}}
stg_sys2_orders.sql
SELECT customer_id, order_id, order_date, item_id, item_value FROM {{source("sys2", "orders")}}```
fct_orders.sql
SELECT * FROM {{ ref('stg_sys1_orders') }}
UNION ALL
SELECT * FROM {{ ref('stg_sys2_orders') }}
kpi_customer_order_value.sql
{{ config(
materialized="incremental",
)}}
SELECT customer_id, order_id, sum(item_value) FROM {{ ref('fct_orders') }}
{% if is_incremental() %}
WHERE order_date >= SELECT(COALESCE(max(order_date),'1900-01-01') FROM {{ this }} )
{% endif %}
GROUP BY customer_id, order_id