Adding a column to an incremental model


#1

What will dbt do if I add a new column to an existing incremental model? Add the column and keep the value for it as NULL for older rows?


#2

Great question. dbt is “stateless”—it has no idea whether a model is been run before, if a model has changed, if a file has been edited, nothing. As such, when you do a dbt run, dbt simply generates SQL based on the current code in your project. And the code for an incremental model, essentially, does the following:

  1. generate a temp table with the contents of your model select statement
  2. delete rows from your existing table with unique keys that match keys that exist in your temp table
  3. insert all records from your temp table into the target table
  4. drop the temp table

The problem comes when you hit step #3. If the columns in your temp table are not the same as the columns in your target table, the insert will likely fail. This is one of the reasons the --full-refresh flag exists (docs here)! When you make changes to your incremental models, you need to to a --full-refresh run and dbt will rebuild all of your incremental models from scratch using the current code.


#3

Also: there is an issue here that asks for an auto-full-refresh feature. You can feel free to comment there if you’d like, but I don’t anticipate that this is something that we’re going to work on in the very near future.


#4

Got it. So, if there’s been an incremental model in production running daily for a while and the business determines that it needs to add a field to it, there’s no way to do that in dbt outside of a full refresh? What if the historical source data isn’t fully available?