Model run scheduling patterns

walshie4 · February 12, 2020, 1:13am

Hello

We’ve started working with DBT, and so far things have been fairly smooth. However, one area where I can’t seem to find any resources is how others have been approaching model run scheduling.

We have a variety of models we’d like to move into DBT, but they are recomputed on different cadences, so we don’t want to simply run dbt run for everything. I’ve seen some ways to solve this problem, but most have caveats that I’d like to avoid if possible.

Approach one: Use tags or directory structure and an enumerated set of schedule ‘buckets’ which can be targeted via dbt run flags to run only those. The caveat of this approach (which, to be clear, assumes you’re not using dbt Cloud) is that you have limited granularity when scheduling jobs, since each bucket maps one-to-one to a dbt run command that targets only those models.

Approach two: Use dbt Cloud, which we’ve been taking a peek at. There, each model needs to be configured in the UI (or possibly via the API, which could open up some ways to solve this). The downside of doing this in the UI is that the schedule becomes the only piece of the DBT project that is not stored in version control and is not co-located with everything else in the DBT world. That generally seems not very dbt-esque.

While writing this, I realized there could be a way to use a specially formatted tag or comment that a hand-written script could find, connecting the schedule declarations in the repo to the deployment system (via the dbt Cloud API, the Airflow API, etc.).
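A minimal sketch of that idea in Python, assuming a hypothetical convention where schedule tags start with `schedule_` and the model-to-tags mapping has already been read from the project (how it gets read is left out here). The function names and the example models are made up for illustration. The `--models tag:...` selector is the dbt CLI syntax from around the time of this thread; newer dbt versions also accept `--select`.

```python
from collections import defaultdict

# Hypothetical convention: any tag with this prefix declares a run cadence.
SCHEDULE_PREFIX = "schedule_"


def group_models_by_schedule(model_tags):
    """Map each schedule tag to the sorted list of models that carry it."""
    buckets = defaultdict(list)
    for model, tags in model_tags.items():
        for tag in tags:
            if tag.startswith(SCHEDULE_PREFIX):
                buckets[tag].append(model)
    return {tag: sorted(models) for tag, models in buckets.items()}


def dbt_command_for(tag):
    """Build the dbt invocation that runs only the models with this tag."""
    return ["dbt", "run", "--models", f"tag:{tag}"]


if __name__ == "__main__":
    # Made-up models and tags, standing in for what a script would
    # collect from the repo.
    example = {
        "orders": ["schedule_hourly", "finance"],
        "events": ["schedule_hourly"],
        "users": ["schedule_daily"],
    }
    for tag, models in sorted(group_models_by_schedule(example).items()):
        print(" ".join(dbt_command_for(tag)), "# covers:", ", ".join(models))
```

The resulting commands could then be handed to whatever deployment system is in use; that registration step depends on the scheduler and is not shown.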

Curious to hear others’ thoughts on this space.

Thanks
-Adam Walsh

tnightengale · February 27, 2020, 11:02pm

Hey Adam,

Thinking out loud here, about your assessment of option 1:

> Approach one: Use tags or directory structure and an enumerated set of schedule ‘buckets’ which can be targeted via dbt run flags to run only those. The caveats of this approach (which assumes you’re not using dbt cloud to be clear) would be that you have limited granularity to schedule jobs as each one has a one-to-one mapping for a dbt run command which targets only those models.

What do you mean by a 1-1 mapping of jobs to commands? Couldn’t you just tag everything in your project as hourly or daily and then have an Airflow job run dbt run with the pertinent tag? What am I missing?

walshie4 · March 5, 2020, 12:17am

That’s what I mean. The downside I’m referring to is that you have to add each run ‘type’ yourself. Meaning, if I now want to run things every 10 minutes, I need to define a new tag and set up a new dbt run job that targets that tag in the appropriate way.
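To make that cost concrete, here is a rough Python sketch in which each cadence is one entry in a table and the job definitions are derived from it. The tag names, cron expressions, and job dictionary shape are all assumptions for illustration, not something any particular scheduler requires. Adding a 10-minute cadence still means a new tag on the models plus one new table entry.

```python
# Hypothetical cadence table: schedule tag -> cron expression.
CADENCES = {
    "hourly": "0 * * * *",
    "daily": "0 6 * * *",
    "every_10_minutes": "*/10 * * * *",  # the new run 'type'
}


def build_jobs(cadences):
    """Derive one job definition per cadence, each targeting a single tag."""
    return [
        {
            "name": f"dbt_{tag}",
            "cron": cron,
            "command": f"dbt run --models tag:{tag}",
        }
        for tag, cron in cadences.items()
    ]


if __name__ == "__main__":
    for job in build_jobs(CADENCES):
        print(job["name"], "|", job["cron"], "|", job["command"])
```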
