dbt do if I add a new column to an existing incremental model? Add the column and keep the value for it as NULL for older rows?
Great question. dbt is “stateless”—it has no idea whether a model is been run before, if a model has changed, if a file has been edited, nothing. As such, when you do a
dbt run, dbt simply generates SQL based on the current code in your project. And the code for an incremental model, essentially, does the following:
- generate a temp table with the contents of your model select statement
- delete rows from your existing table with unique keys that match keys that exist in your temp table
- insert all records from your temp table into the target table
- drop the temp table
The problem comes when you hit step #3. If the columns in your temp table are not the same as the columns in your target table, the insert will likely fail. This is one of the reasons the
--full-refresh flag exists (docs here)! When you make changes to your incremental models, you need to to a
--full-refresh run and dbt will rebuild all of your incremental models from scratch using the current code.
Also: there is an issue here that asks for an auto-full-refresh feature. You can feel free to comment there if you’d like, but I don’t anticipate that this is something that we’re going to work on in the very near future.
Got it. So, if there’s been an incremental model in production running daily for a while and the business determines that it needs to add a field to it, there’s no way to do that in dbt outside of a full refresh? What if the historical source data isn’t fully available?