Internal workings of incremental materialisation

Hi, in incremental materialisation, what strategy does dbt follow to backfill a new column?
Config: file type = orc, on_schema_change = append_new_columns
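For reference, a minimal sketch of how such a model config might look (assuming the dbt-spark style `file_format` option; the model and source names are made up):

```sql
{{
  config(
    materialized='incremental',
    file_format='orc',
    on_schema_change='append_new_columns'
  )
}}

-- placeholder select; the real model body goes here
select * from {{ ref('my_source_model') }}
```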

I am using it, and it's filling the new column with nulls.
My understanding is:
It doesn't add anything to the old data files. It just adds the new column to the index data for the older files and keeps the row data as it is.

But I have some doubts about this. If this is the case with the ORC format, does it show different behaviour with other formats?

If there is another reason why it's not backfilled, is there a way to achieve the backfill without a full refresh? Please share.

We use it with BigQuery, and if you are using incremental loads with the `append_new_columns` config, then the new columns will be null for rows that were populated previously (pretty much what the documentation says). If you want to populate the values of the new column for the previously generated rows, you need to do a full refresh (run with the `-f` or `--full-refresh` flag).
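For example (the model name is a placeholder; older dbt versions use `--models` instead of `--select`):

```sh
dbt run --full-refresh --select my_incremental_model
```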

Hi, yes, I agree (as mentioned in the original post). But why does it show this behaviour?

Plagiarising this post by Tristan, since it's a bit outdated now that dbt supports adding columns to incremental tables.
Basically, from my understanding, this is what happens on every dbt run for an incremental model:

  1. Add the new column(s) to the destination table if the schema has changed (when on_schema_change is configured to do so)
  2. Generate a temp table with the contents of your model's select statement
  3. Delete rows from your existing table whose unique keys match keys that exist in your temp table
  4. Insert all records from your temp table into the target table
  5. Drop the temp table

Now, as you can see, it only generates data for the new column(s) for the latest batch of records; as a result, the new column(s) remain null for all the other rows.
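To make that concrete, here is a rough sketch of the kind of SQL those steps compile to for a delete+insert style incremental run (all table and column names are placeholders, and the exact statements vary by adapter):

```sql
-- 1. on_schema_change='append_new_columns' adds the column, but never backfills it
alter table analytics.my_model add columns (new_col string);

-- 2. build a temp table from the model's select (only the incremental slice of data)
create table my_model__dbt_tmp as
select id, updated_at, new_col
from staging_table
where updated_at > (select max(updated_at) from analytics.my_model);

-- 3. remove existing rows whose unique key appears in the temp table
delete from analytics.my_model
where id in (select id from my_model__dbt_tmp);

-- 4. insert the temp table's records into the target
insert into analytics.my_model (id, updated_at, new_col)
select id, updated_at, new_col from my_model__dbt_tmp;

-- 5. clean up
drop table my_model__dbt_tmp;

-- rows that were already in the target and not re-selected in step 2
-- never receive a value for new_col, so it stays null for them
```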

Hi!
A slightly late response from me.
I have never really used this functionality, so I cannot prove that my idea will help you.
But here is my reasoning:
dbt only generates the code you would like to see, based on your model configuration.
So if you configure the model as an insert-only incremental model (as a result of not setting a unique_key), then of course your incremental run will only affect newly inserted rows.
If, however, you want existing rows to be updated where applicable, then dbt needs to generate a merge with inserts and updates; you invoke that by setting a unique_key in your config.
Of course, your incremental logic then needs to be written in such a way that it also provides the data for the existing rows in your target.
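A minimal sketch of what I mean, assuming a `unique_key` of `id` (the model, source, and column names are made up):

```sql
{{
  config(
    materialized='incremental',
    unique_key='id',
    on_schema_change='append_new_columns'
  )
}}

select id, updated_at, new_col
from {{ ref('stg_source') }}

{% if is_incremental() %}
  -- widen this filter (e.g. lower the cutoff) so that the existing rows you want
  -- backfilled are selected again; with unique_key set, dbt will update those rows
  -- instead of only inserting new ones
  where updated_at > (select max(updated_at) from {{ this }})
{% endif %}
```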
If you try it out, I would be much obliged to read about your results.