How to handle generated columns in the target model in incremental materialization

Hello!

Quick post for anyone wondering how to make incremental materialization work with generated columns. Context: in Faros CE, we model EngOps entities (PRs, Incidents, …) in PostgreSQL for use in BI and automation. Some columns are generated automatically by extracting data from other columns that store jsonb, like this:

alter table "ims_Incident" add column "statusDetail" text generated always as (status ->> 'detail') stored;

We have a transform that writes incrementally to that ims_Incident table. With the incremental macro as is, the SQL generated for the task-to-incident transform looks like this:

delete from "faros"."public"."ims_Incident"

where (id) in (

select (id)

from "ims_Incident__dbt_tmp141746223081"

);

insert into "faros"."public"."ims_Incident" ("id", "origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source", "severityDetail", "severityCategory", "priorityDetail", "priorityCategory", "statusDetail", "statusCategory")

(

select "id", "origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source", "severityDetail", "severityCategory", "priorityDetail", "priorityCategory", "statusDetail", "statusCategory"

from "ims_Incident__dbt_tmp141746223081"

)

Because the insert lists generated columns (e.g., statusDetail), this query fails: PostgreSQL does not allow values to be written into generated columns.
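To make the failure concrete, here is a minimal reproduction outside of dbt, against a simplified table (incident_repro is a made-up name, not the real ims_Incident schema):

-- Simplified reproduction, not the actual ims_Incident definition
create table incident_repro (
  status jsonb,
  "statusDetail" text generated always as (status ->> 'detail') stored
);

-- Omitting the generated column works; statusDetail is derived automatically
insert into incident_repro (status) values ('{"detail": "open"}');

-- Explicitly listing the generated column, as the unmodified macro does,
-- is rejected: PostgreSQL reports that "statusDetail" is a generated column
-- and cannot be inserted into
insert into incident_repro (status, "statusDetail")
values ('{"detail": "open"}', 'open');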

We decided to adapt the main incremental macro so that it accepts a list of columns to ignore when generating these queries.

Instead of using every column of the existing table (adapter.get_columns_in_relation(existing_relation)) to generate the query, we ignore the ones specified via the “ignore_columns” configuration for that table (if any).

The change was pretty straightforward. Although, to the best of our knowledge, this is not documented, having such a macro in your macros folder overrides the default one. We can then pass the list of columns to ignore via a configuration in the transform.
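For reference, the filtering logic might look roughly like the sketch below. This is not the exact Faros macro: filter_ignored_columns is a name made up here for illustration, and in the real override the same filtering happens inside the adapted incremental materialization before it builds the delete/insert statements.

{% macro filter_ignored_columns(existing_relation) %}
  {# Start from every column in the existing table #}
  {%- set dest_columns = adapter.get_columns_in_relation(existing_relation) -%}
  {# ignore_columns is a dict keyed by table name, as configured on the model #}
  {%- set ignore_map = config.get('ignore_columns', {}) -%}
  {%- set ignored = ignore_map.get(existing_relation.identifier, []) -%}
  {%- set kept_columns = [] -%}
  {%- for column in dest_columns -%}
    {%- if column.name not in ignored -%}
      {%- do kept_columns.append(column) -%}
    {%- endif -%}
  {%- endfor -%}
  {{ return(kept_columns) }}
{% endmacro %}

The materialization then renders the insert column list from the returned list instead of the full output of adapter.get_columns_in_relation.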

After these changes, the generated query looks like this:

delete from "faros"."public"."ims_Incident"

where (id) in (

select (id)

from "ims_Incident__dbt_tmp142134657426"

);

insert into "faros"."public"."ims_Incident" ("origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source")

(

select "origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source"

from "ims_Incident__dbt_tmp142134657426"

)

The columns specified in the configuration are no longer included.


Nice! Could you please show a sample use of the macro invocation in the model file—is it something like …

{{
    config(
        materialized='incremental',
        unique_key='id',
        ignore_columns=['id']   
    )
}}

… at the top of the model file?


Yes, it looks like this:

{{
  config(
    alias='ims_Incident', 
    materialized='incremental',
    unique_key='id',
    ignore_columns={
      'ims_Incident': [
        'id', 
        'severityDetail', 
        'severityCategory', 
        'priorityDetail', 
        'priorityCategory', 
        'statusDetail', 
        'statusCategory'
      ]
    }
  ) 
}}

You can see the model here

(I couldn’t link to it in the original post because of a limit on the number of links for new accounts.)

Ah, thanks, I’m glad I asked. Why is that extra level of specifying the table name necessary in ignore_columns?


Thanks for asking!

Because I needed the workaround for another model as well. The macro override is shared across models, so each model keys its ignored columns by the name of the table it writes to.
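For illustration, with placeholder names (ims_OtherModel and someGeneratedColumn are made up, not actual Faros models or columns), a second incremental model reuses the same override and simply keys its ignored columns under its own table name:

{{
  config(
    alias='ims_OtherModel',
    materialized='incremental',
    unique_key='id',
    ignore_columns={
      'ims_OtherModel': ['id', 'someGeneratedColumn']
    }
  )
}}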

We also have a write-up where we talk about the actual use cases here.