So far we’ve been running all our dbt models at once, and we’re considering switching to an approach where we run only the relevant models each time a source is updated, which would be a better fit given that our sources update on different schedules.
We’d follow the approach that @nehiljain describes in this talk as “Sources ELT”, which essentially means running dbt run -m source:datasource+
whenever a source is updated (like him, we use Airflow to orchestrate dbt).
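For concreteness, here is a minimal sketch of how we might build that command per source from Python, for example inside an Airflow task. The helper name `dbt_run_args` and the source name `datasource` are placeholders of mine, not something taken from the talk.

```python
def dbt_run_args(source_name: str) -> list[str]:
    """Build the dbt CLI arguments for one source (hypothetical helper).

    The trailing "+" asks dbt to include everything downstream of the source.
    """
    return ["dbt", "run", "-m", f"source:{source_name}+"]


# The arguments could be passed to subprocess.run(...) or joined into a
# shell command for whatever operator triggers dbt.
print(" ".join(dbt_run_args("datasource")))
```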
However, some models in our graph are very central and therefore downstream of most sources, which means that they and their own downstream models would be run whenever almost any source is updated. The problem is that we must prevent these runs from happening at the same time to keep our data warehouse consistent, as I don’t think dbt will do that for us.
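To show the overlap I’m worried about, here is a toy example. The graph and every model name in it are invented for illustration; our real graph is much larger. It computes the downstream set for two sources and prints the models that both runs would touch.

```python
# Toy dependency graph: node -> direct children. All names are made up.
GRAPH = {
    "source:orders": ["stg_orders"],
    "source:payments": ["stg_payments"],
    "stg_orders": ["customers"],
    "stg_payments": ["customers"],
    "customers": ["revenue_report"],
    "revenue_report": [],
}


def downstream(node: str) -> set[str]:
    """Return every node reachable from `node`, excluding `node` itself."""
    seen: set[str] = set()
    stack = [node]
    while stack:
        current = stack.pop()
        for child in GRAPH.get(current, []):
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


# Models that would be rebuilt by both per-source runs.
shared = downstream("source:orders") & downstream("source:payments")
print(sorted(shared))  # ['customers', 'revenue_report']
```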
I’m curious whether anyone has had a similar problem and what solution they implemented. Do people implement their own way of ensuring that the same dbt models are not touched by different concurrent runs (e.g. via a lock)?
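To illustrate what I mean by a lock, here is a rough sketch, assuming all dbt runs start from one machine (or a shared filesystem). The names `exclusive_lock` and `LockHeld` are my own, and this is not something dbt or Airflow provides. It also does not handle stale locks left behind by a crashed process, so it is only a starting point for discussion.

```python
import os
import tempfile
from contextlib import contextmanager


class LockHeld(Exception):
    """Raised when another run already holds the lock (hypothetical name)."""


@contextmanager
def exclusive_lock(path: str):
    """Hold a lock file for the duration of the `with` block."""
    try:
        # O_CREAT | O_EXCL makes creation fail if the file already exists,
        # so only one process can acquire the lock at a time.
        fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        raise LockHeld(path)
    try:
        yield
    finally:
        os.close(fd)
        os.remove(path)


if __name__ == "__main__":
    lock_path = os.path.join(tempfile.gettempdir(), "dbt_run_example.lock")
    with exclusive_lock(lock_path):
        # The dbt invocation for one source would go here.
        print("lock acquired, safe to run dbt")
```

A second run that hits `LockHeld` could either fail and be retried by the scheduler, or wait and try again, depending on how stale we can allow the central models to be.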
PS: @nehiljain I really enjoyed the talk, thank you! I wish I had been in the audience to be able to ask questions.