Auto refreshing models for incremental models with Schema changes

chuangl4 · April 24, 2018, 6:17pm

Is there a way in dbt to auto-refresh incremental models when the schema of these models changes (for example, when a column is dropped)? My company is using dbt, and our data pipeline often breaks because someone changes a model's schema and forgets to refresh the related models.

tristan · April 24, 2018, 6:31pm

Hi @chuangl4! Great question. In fact, this is something that people have wanted with dbt for a long time. Here’s an issue that’s been around since July 2017! That’s a good place to start if you want to go deep here.

At a high level, the thing you’re describing would be very useful, and it’s one that we care about. We haven’t implemented it to date because it’s actually quite a large architectural shift in the way that dbt works.

The core of the issue is one of statefulness. dbt is designed to be stateless: when you type dbt run, dbt compiles and runs your project. Once it’s done, it spins down. It doesn’t maintain any history between runs; it’s just not designed to do that today.

Our goal for dbt is to split it into a client library and a server process. The client library would make requests to the server (things like compile and run) and would be stateless. The server would be stateful. It would be responsible for user authentication, serving requests, and maintaining state between runs. This server process would be the place where we’d want to implement the type of behavior you’re talking about. But this split doesn’t exist at all today! It’s going to be a major part of our 2018 to get there, and we don’t see a shortcut.

We’re so heavily invested in making this transition because there is actually a whole category of features that would be enabled by a stateful server process. For example: “only run models that I’ve changed since my most recent git commit”. Very useful!

Stay tuned here; we’ll absolutely share more information as we make progress.

In the meantime, if you’re going to use incremental models, it’s critical to trigger a full refresh of the data using --full-refresh whenever the schema of the model changes. If you’re scheduling your runs via Sinter, that’s quite easy to accomplish. If you’re using Airflow, it can be a bit trickier to give analysts this kind of control.
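
As a rough illustration of that workaround, here is a minimal Python sketch of a wrapper that keeps its own state between runs, since dbt itself does not. It is not part of dbt: the function names, the state file, and the idea of passing in the model's column list by hand are all assumptions made for this example. The only dbt pieces it relies on are the `dbt run` command, the `--full-refresh` flag, and the `--select` flag (older dbt versions spell it `--models`).

```python
import hashlib
import json
from pathlib import Path


def schema_fingerprint(columns):
    """Hash an ordered list of (name, type) pairs into a stable string."""
    payload = json.dumps([list(c) for c in columns])
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def build_dbt_command(model, columns, state_file):
    """Return the dbt command for `model` as a list of arguments.

    Hypothetical helper: `columns` is supplied by the caller, and
    `state_file` is a JSON file this wrapper owns (dbt knows nothing about it).
    """
    state_path = Path(state_file)
    state = json.loads(state_path.read_text()) if state_path.exists() else {}

    current = schema_fingerprint(columns)
    previous = state.get(model)

    cmd = ["dbt", "run", "--select", model]
    # No previous fingerprint means a first run, so no flag is added.
    # A different fingerprint means the schema changed since the last run.
    if previous is not None and previous != current:
        cmd.append("--full-refresh")

    # Simplification: a real wrapper would save state only after dbt succeeds.
    state[model] = current
    state_path.write_text(json.dumps(state, indent=2))
    return cmd
```

The returned list could be handed to `subprocess.run` by a scheduler task. The design choice here is simply to move the "remember what the schema looked like last time" responsibility outside dbt, which is the piece a stateless tool cannot provide on its own.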

chuangl4 · April 24, 2018, 7:09pm

Thank you, that makes a lot of sense. I look forward to the release of this feature.

joellabes · September 23, 2022, 4:03am

This was released in dbt 0.21.0! https://github.com/dbt-labs/dbt-core/pull/3387
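
For readers unfamiliar with it: to my understanding, the relevant feature is the `on_schema_change` config for incremental models, which accepts `ignore` (the default), `fail`, `append_new_columns`, and `sync_all_columns`. The Python sketch below is only a toy model of how those four modes treat added and removed columns, written to make the differences concrete. The function `plan_schema_change` is hypothetical, it is not dbt code, and it ignores data type changes entirely.

```python
def plan_schema_change(existing, incoming, mode="ignore"):
    """Return (columns_to_add, columns_to_drop) for the target table.

    existing: column names currently in the target table
    incoming: column names produced by the updated model SQL
    mode:     one of the four on_schema_change values
    """
    added = [c for c in incoming if c not in existing]
    removed = [c for c in existing if c not in incoming]

    if mode == "ignore":
        # Leave the target table's columns untouched.
        return [], []
    if mode == "fail":
        # Stop the run as soon as the two column sets differ.
        if added or removed:
            raise ValueError(
                f"schema changed: added={added}, removed={removed}"
            )
        return [], []
    if mode == "append_new_columns":
        # Add new columns, but keep columns that are no longer selected.
        return added, []
    if mode == "sync_all_columns":
        # Add new columns and drop the ones that disappeared.
        return added, removed
    raise ValueError(f"unknown mode: {mode}")
```

For the original question (a dropped column), only `sync_all_columns` removes the column from the target table in this model, while `fail` surfaces the change as an error instead of letting the pipeline break later.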
