DBT, concurrent runs and the need for locking

bchazalet · May 25, 2021, 7:58am

We’ve been running all DBT models at once so far, and we’re considering switching to an approach where we run only the affected models each time a source is updated, which would be a better fit given our sources and their different schedules.

We’d be following the approach that @nehiljain describes in this talk as a “Sources ELT”, which essentially means running `dbt run -m source:datasource+` whenever a source is updated (like him, we use Airflow to orchestrate DBT).
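For illustration, the per-source invocation could be built like this. It is a minimal sketch: `build_source_run_command` is a hypothetical helper and `datasource` is the placeholder source name from the post. The `+` suffix selects the source's downstream models, mirroring the command quoted above.

```python
from __future__ import annotations

import shlex


def build_source_run_command(source_name: str) -> list[str]:
    # Hypothetical helper: select a source and everything downstream of it
    # ("+" suffix), mirroring `dbt run -m source:datasource+`.
    return ["dbt", "run", "-m", f"source:{source_name}+"]


if __name__ == "__main__":
    cmd = build_source_run_command("datasource")
    # Print the shell-ready form, e.g. for an Airflow BashOperator's bash_command.
    print(shlex.join(cmd))
```

One such command per source is what makes the overlap problem below possible: two sources that share a downstream model produce two commands that both rebuild it.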

However, some models in our graph are very central and sit downstream of most sources, which means that they and their own downstream models would be rebuilt for most source updates. The problem is that we must prevent these runs from happening at the same time to keep our data warehouse consistent, and I don’t think DBT will do that for us.
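The bluntest fix is to serialize every dbt invocation behind one exclusive lock. A minimal sketch, assuming all runs execute on the same Unix host (it uses `fcntl.flock`, which is Unix-only). The lock file name and `run_dbt_serialized` are hypothetical. Runs spread across several Airflow workers would need a shared lock instead, for example an Airflow pool with a single slot or a database advisory lock.

```python
from __future__ import annotations

import fcntl
import os
import subprocess
import tempfile
from contextlib import contextmanager

# Hypothetical lock file shared by every dbt run on this host.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "dbt_run.lock")


@contextmanager
def exclusive_lock(path: str = LOCK_PATH):
    # Blocks until no other process holds an exclusive lock on this file.
    with open(path, "w") as lock_file:
        fcntl.flock(lock_file, fcntl.LOCK_EX)
        try:
            yield
        finally:
            fcntl.flock(lock_file, fcntl.LOCK_UN)


def run_dbt_serialized(cmd: list[str]) -> int:
    # Only one command at a time can run inside this block on this host.
    with exclusive_lock():
        return subprocess.run(cmd).returncode
```

This trades away all parallelism, even between sources whose downstream graphs never overlap, but it is easy to reason about.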

I’m curious whether anyone has had a similar problem and what solution they implemented. Do people implement their own way of ensuring that the same DBT models are not touched by different concurrent runs (e.g. via a lock)?
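A finer-grained alternative is one lock per model, so runs with disjoint model sets can still proceed in parallel. The sketch below is a hypothetical, in-process illustration using threads. It assumes each run already knows which models it will touch (for example from `dbt ls -m source:datasource+`). Locks are acquired in sorted order so that two runs sharing models can never wait on each other in a cycle. Across separate Airflow tasks the same idea would need a shared lock store rather than `threading.Lock`.

```python
from __future__ import annotations

import threading
from collections import defaultdict
from contextlib import contextmanager


class ModelLockRegistry:
    """Hypothetical in-process registry holding one lock per dbt model name."""

    def __init__(self) -> None:
        self._guard = threading.Lock()
        self._locks: dict[str, threading.Lock] = defaultdict(threading.Lock)

    def _lock_for(self, model: str) -> threading.Lock:
        # Guard the lookup so two threads always get the same Lock object.
        with self._guard:
            return self._locks[model]

    @contextmanager
    def hold(self, models: set[str]):
        # Sorted acquisition order prevents deadlock between overlapping runs.
        ordered = [self._lock_for(m) for m in sorted(models)]
        acquired = []
        try:
            for lock in ordered:
                lock.acquire()
                acquired.append(lock)
            yield
        finally:
            # Release whatever was acquired, in reverse order.
            for lock in reversed(acquired):
                lock.release()
```

With this, a run touching `{"stg_a", "fct_central"}` and one touching `{"stg_b", "fct_central"}` (hypothetical model names) execute one after the other, while runs with no shared models do not block each other.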

PS: @nehiljain I really enjoyed the talk, thank you! I wish I had been in the audience to be able to ask questions.

dataguru · April 8, 2022, 4:37pm

Hello @bchazalet - did you find any resolution to this?
