Hello!
Quick post for anyone wondering how to make incremental materialization work with generated columns. Context: in Faros CE, we model and EngOps entities (PRs, Incidents, …) in PostgreSQL for use in BI and automation. We have some columns that are automatically generated by extracting data from some other columns storing jsonb like this:
alter table "ims_Incident" add column "statusDetail" text generated always as (status ->> 'detail') stored;
We have a transform that tries to incrementally write to that ims_Incident table. With the incremental macro as is, the generated sql for the task to incident transform looks like this:
delete from "faros"."public"."ims_Incident"
where (id) in (
select (id)
from "ims_Incident__dbt_tmp141746223081"
);
insert into "faros"."public"."ims_Incident" ("id", "origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source", "severityDetail", "severityCategory", "priorityDetail", "priorityCategory", "statusDetail", "statusCategory")
(
select "id", "origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source", "severityDetail", "severityCategory", "priorityDetail", "priorityCategory", "statusDetail", "statusCategory"
from "ims_Incident__dbt_tmp141746223081"
)
Because it includes generated columns (e.g., statusDetail), this query will fail.
We decided to modify the macro to accept a list of columns to ignore when generating these queries. We adapted the main incremental macro.
Instead of using every single column (adapter.get_columns_in_relation(existing_relation)) in the existing table to generate the query, we ignore the ones specified via the “ignore_columns” configuration for that table (if any).
The change was pretty straightforward. Although to the best of our knowledge it is not documented, having such a macro in your macros folder will override the default one. We can then pass the list of columns to ignore via a configuration in the transform.
After these changes, the generated query looks like below:
delete from "faros"."public"."ims_Incident"
where (id) in (
select (id)
from "ims_Incident__dbt_tmp142134657426"
);
insert into "faros"."public"."ims_Incident" ("origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source")
(
select "origin", "refreshedAt", "uid", "title", "description", "url", "severity", "priority", "status", "createdAt", "updatedAt", "acknowledgedAt", "resolvedAt", "source"
from "ims_Incident__dbt_tmp142134657426"
)
The columns specified in the configuration are no longer included.