How to configure dbt to only incrementally process values it has not processed before?

I want to author a model in dbt that processes each row exactly once. A sample use case is where I have a function expensive_api_call() that I want to run per row, but since the results of this API call will never change I want to ensure that I’m only calling it once per row, rather than once per materialisation. Intuitively, I want to find which rows haven’t yet been processed, process them, and store the result in the model.

I figured that an incremental model would be the appropriate way to do this, with something like my_model.sql below:

{{ config(materialized='incremental', unique_key='my_id') }}

WITH missing_mappings AS (
    SELECT my_id FROM {{ ref('my_source_table') }}
    EXCEPT (SELECT my_id FROM {{ ref('my_model') }})
)
SELECT my_id, expensive_api_call(my_id) AS api_call_result
FROM missing_mappings

While the syntax of the code snippet above may not be exactly right, I need to resolve the modelling issue first: dbt (correctly) detects a cycle because the model is referring to itself.

I don’t have an intrinsic way of detecting which my_id values are new and require the API call to be made (e.g. with a timestamp column in the upstream table), other than by inspecting which IDs have been previously processed in my_model.

In case it’s relevant, my data warehouse is Snowflake and expensive_api_call() is a stored procedure in Snowflake.

How can I achieve what I want with dbt?

I’ve worked out what the problem was. dbt expects the current model to be referenced using {{ this }}, rather than using the ref() function.

Wrapping a reference to {{ this }} in an is_incremental() function allows me to do what I want to do.

2 Likes

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.