Model run scheduling patterns

Hello :wave:

We’ve started working with dbt, and so far things have been fairly smooth. However, there’s one area where I can’t find any resources: how others have been approaching model run scheduling.

We have a variety of models we’d like to move into dbt, but they are recomputed on different cadences, so we don’t want to simply execute dbt run against the whole project. I’ve seen some ways to solve this problem, but most have caveats that I’d like to avoid if possible.

Approach one: Use tags or directory structure plus an enumerated set of schedule ‘buckets’, each of which can be targeted via dbt run selection flags to run only its models. The caveat of this approach (which, to be clear, assumes you’re not using dbt Cloud) is limited scheduling granularity: each bucket maps one-to-one to a dbt run command that targets only those models, so every schedule needs its own bucket and its own command.
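For concreteness, the bucket approach can be sketched as a fixed table of tags, each paired with a cron schedule and exactly one dbt run invocation. The tag names, cron strings, and helper names below are illustrative assumptions; only the `--select tag:<name>` selector syntax is dbt’s own.

```python
from typing import Dict, List, NamedTuple

# Hypothetical schedule buckets: one dbt tag per cadence, each with a cron expression.
SCHEDULE_BUCKETS: Dict[str, str] = {
    "hourly": "0 * * * *",  # top of every hour
    "daily": "0 6 * * *",   # 06:00 every day
}


class ScheduledJob(NamedTuple):
    cron: str
    command: List[str]


def dbt_command_for_bucket(bucket: str) -> List[str]:
    """Build the single dbt invocation that runs only models carrying this tag."""
    if bucket not in SCHEDULE_BUCKETS:
        raise KeyError(f"unknown schedule bucket: {bucket!r}")
    return ["dbt", "run", "--select", f"tag:{bucket}"]


def build_jobs() -> Dict[str, ScheduledJob]:
    """One job per bucket: the one-to-one mapping between buckets and commands."""
    return {
        bucket: ScheduledJob(cron=cron, command=dbt_command_for_bucket(bucket))
        for bucket, cron in SCHEDULE_BUCKETS.items()
    }


if __name__ == "__main__":
    for name, job in build_jobs().items():
        print(f"{name}: {job.cron} -> {' '.join(job.command)}")
```

Every new cadence means editing this table and deploying another scheduled job, which is exactly the granularity limit in question.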

Approach two: In dbt Cloud, which we’ve been taking a peek at, each model’s schedule needs to be configured in the UI (or possibly via the API :thinking:, which could open up some solutions). The downside of doing this in the UI is that scheduling becomes the only piece of the dbt project that isn’t stored in version control or co-located with everything else in the dbt world. Generally, that doesn’t seem very dbt-esque.

While writing this, I realized there could be a way to use a specially formatted tag or comment that a custom script could find, connecting the schedules declared in the repo to the deployment system (via the dbt Cloud API, the Airflow API, etc.).
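That idea can be sketched with dbt’s compiled manifest (target/manifest.json, written by commands such as dbt compile), which already records each node’s resource type, name, and tags. The `schedule_` prefix is a hypothetical convention, and pushing the resulting groups into dbt Cloud or Airflow is left out because that step depends on the deployment system.

```python
import json
from collections import defaultdict
from typing import Any, Dict, List

# Hypothetical convention: a model opts into a cadence with a tag like "schedule_hourly".
SCHEDULE_TAG_PREFIX = "schedule_"


def schedules_from_manifest(manifest: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group model names by the schedule encoded in their tags.

    Reads only the "nodes" mapping and each node's "resource_type", "name"
    and "tags" fields.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        for tag in node.get("tags", []):
            if tag.startswith(SCHEDULE_TAG_PREFIX):
                schedule = tag[len(SCHEDULE_TAG_PREFIX):]
                groups[schedule].append(node["name"])
    # Sort for deterministic output, e.g. when diffing generated deployment config.
    return {schedule: sorted(names) for schedule, names in sorted(groups.items())}


def load_manifest(path: str = "target/manifest.json") -> Dict[str, Any]:
    """Load the manifest dbt writes into the project's target directory."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Because the schedule lives in the model’s tags, it stays in version control next to the SQL; only the generated jobs live in the scheduler.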

Curious to hear others’ thoughts on this space.

Thanks
-Adam Walsh


Hey Adam,

Thinking out loud here, about your assessment of option 1:

> Approach one: Use tags or directory structure and an enumerated set of schedule ‘buckets’ which can be targeted via dbt run flags to run only those. The caveats of this approach (which assumes you’re not using dbt cloud to be clear) would be that you have limited granularity to schedule jobs as each one has a one-to-one mapping for a dbt run command which targets only those models.

What do you mean by a 1-1 mapping of jobs to commands? Couldn’t you just tag everything in your project as hourly or daily and then have an Airflow job run dbt run with the pertinent tag? What am I missing?
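The task such an Airflow job would execute for a given tag can be sketched as a plain Python function. This is a minimal sketch, not a dbt or Airflow API: `run_dbt_for_tag` and its injectable `runner` parameter are hypothetical, and `--target prod` in the usage note assumes a profile target with that name.

```python
import subprocess
from typing import Callable, List, Sequence

# The runner is injectable so the function can be exercised without dbt installed;
# by default it shells out with subprocess.run.
Runner = Callable[..., subprocess.CompletedProcess]


def run_dbt_for_tag(tag: str, runner: Runner = subprocess.run,
                    extra_args: Sequence[str] = ()) -> List[str]:
    """Run only the models carrying `tag`, raising if dbt exits non-zero."""
    cmd = ["dbt", "run", "--select", f"tag:{tag}", *extra_args]
    # check=True makes a failed dbt run raise CalledProcessError,
    # which an orchestrator would surface as a failed task.
    runner(cmd, check=True)
    return cmd
```

In Airflow, a callable like this could back a PythonOperator (or the same command could go to a BashOperator), with one task per tag on a DAG whose schedule matches that tag, e.g. `run_dbt_for_tag("hourly", extra_args=["--target", "prod"])`.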

That’s what I mean. The downside I’m referring to is that you have to add each run ‘type’ by hand: if I now want to run something every 10 minutes, I need to define a new tag and set up a new dbt run job that targets that tag.
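One way to shrink that cost is to encode the cadence in the tag name itself, so a generator script can derive the schedule instead of someone registering each new job by hand. The sketch assumes a hypothetical `every_<N>_minutes` / `every_<N>_hours` naming scheme; paired with a script that reads tags from the repo (the idea floated in the original post), an every-10-minutes cadence becomes just a tag change.

```python
import re
from typing import Optional

# Hypothetical tag convention: "every_<N>_minutes" or "every_<N>_hours".
_CADENCE_TAG = re.compile(r"every_(\d+)_(minutes|hours)")


def cron_for_tag(tag: str) -> Optional[str]:
    """Translate a cadence tag into a cron expression, or None if it isn't one."""
    match = _CADENCE_TAG.fullmatch(tag)
    if match is None:
        return None  # an ordinary tag such as "finance"
    n, unit = int(match.group(1)), match.group(2)
    if unit == "minutes" and 1 <= n <= 59:
        return f"*/{n} * * * *"  # every n minutes within each hour
    if unit == "hours" and 1 <= n <= 23:
        return f"0 */{n} * * *"  # on the hour, every n hours within each day
    raise ValueError(f"unsupported interval in tag {tag!r}")
```

One cron caveat: step values that don’t divide 60 (or 24) restart at the top of each hour (or at midnight), so an interval like 7 minutes will not be perfectly even.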