Internal working of incremental materialisation

Hi, in incremental materialisation, what strategy does it follow to backfill a new column?
Config: file_format = orc, on_schema_change = append_new_columns
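For context, a model with this configuration might look like the following. This is a hypothetical sketch, assuming the dbt-spark / dbt-databricks adapter (where `file_format` is a model config); the source and column names are made up:

```sql
{{
    config(
        materialized='incremental',
        file_format='orc',
        on_schema_change='append_new_columns'
    )
}}

select
    id,
    updated_at,
    new_col  -- column added after the initial build
from {{ source('raw', 'events') }}   -- hypothetical source
{% if is_incremental() %}
-- only new/changed rows are selected on incremental runs
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```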

I am using it, and it fills the new column with nulls.
My understanding is:
It doesn't add anything to the old data files. It just adds the new column to the index/metadata for the older files and keeps the row data as it is.

But I have some doubts about this. If this is the case with the ORC format, does it show different behaviour with other formats?

If there is another reason why it's not backfilled, is there a way to achieve the backfill without a full refresh? Please share.

We use it with BigQuery, and if you are using incremental loads with the append_new_columns config, then the new columns will be null for previously populated rows (pretty much what the documentation says). If you want to populate the values of the new column for the previously generated rows, you need to do a full refresh (run with the "-f" or "--full-refresh" flag).

Hi, yes, I agree (as mentioned in the original post). But why does it show this behaviour?

Plagiarising this post by Tristan, since it's a bit outdated now that dbt supports adding columns to incremental tables.
Basically, from my understanding, on every dbt run for an incremental model:

  1. Add the new column(s) to the destination table if the schema has changed*
  2. Generate a temp table with the contents of your model's select statement
  3. Delete rows from your existing table whose unique keys match keys that exist in your temp table
  4. Insert all records from your temp table into the target table
  5. Drop the temp table
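The steps above roughly correspond to SQL like the following. This is a simplified sketch of the delete+insert strategy, not the adapter's exact generated DDL/DML; table and column names are made up:

```sql
-- 1. Schema sync: add the new column. Existing rows get NULL here,
--    because ALTER TABLE only changes metadata, not old data files.
alter table analytics.my_model add column new_col string;

-- 2. Materialise the model's select into a temp table. Only new/changed
--    rows end up here, because of the is_incremental() filter.
create temporary table my_model__dbt_tmp as
select id, updated_at, new_col
from raw.events
where updated_at > (select max(updated_at) from analytics.my_model);

-- 3. Delete rows that are about to be replaced (only when unique_key is set).
delete from analytics.my_model
where id in (select id from my_model__dbt_tmp);

-- 4. Insert the fresh batch. Only these rows carry values for new_col.
insert into analytics.my_model (id, updated_at, new_col)
select id, updated_at, new_col from my_model__dbt_tmp;

-- 5. Clean up.
drop table my_model__dbt_tmp;
```

Nothing in this sequence ever touches historical rows that fall outside the incremental filter, which is why their `new_col` stays null.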

Now, as you can observe, it only generates data for the new column(s) for the latest batch of records; as a result, the new columns remain null for the rest of the rows.

A little late response coming from me.
I have never really used this functionality, so I cannot prove that my idea will help you.
But here is my reasoning:
dbt only generates the code you ask for, based on your model configuration.
So if you configure the model as an insert-only incremental model (the result of not setting a unique_key), then of course your incremental run will only affect newly inserted rows.
If, however, you want to update existing rows when applicable, then dbt needs to generate a merge with inserts and updates; you invoke that by setting a unique_key in your config.
Of course, your incremental logic needs to be written in such a way that it also provides data for the existing rows in your target.
If you try it out, I would be much obliged to read about your results.
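Put together, that idea might look like the sketch below: set a unique_key so dbt merges instead of only inserting, and temporarily widen the incremental filter so historical rows are re-selected and updated in place. This is an untested sketch with hypothetical names, not a verified recipe:

```sql
{{
    config(
        materialized='incremental',
        unique_key='id',
        on_schema_change='append_new_columns'
    )
}}

select id, updated_at, new_col
from {{ source('raw', 'events') }}   -- hypothetical source
{% if is_incremental() %}
-- Normal runs: only new rows.
-- For a one-off backfill run, relax or remove this filter so existing
-- rows are re-selected; the merge then overwrites them, filling new_col.
where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```

After the backfill run, the filter would be restored to its normal form so subsequent incremental runs stay cheap.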