We’ve started working with dbt and so far things have been fairly smooth. However, one area where I can’t seem to find any resources is how others have been approaching model run scheduling.
We have a variety of models we’d like to move into dbt, but they are recomputed on different cadences, so we don’t want to simply run `dbt run`. I’ve seen some ways to solve this problem, but most have caveats that I’d like to avoid if possible.
Approach one: Use tags or directory structure and an enumerated set of schedule ‘buckets’, each of which can be targeted via `dbt run` flags to run only those models. The caveat of this approach (which assumes you’re not using dbt Cloud, to be clear) is that scheduling granularity is limited: each schedule maps one-to-one to a `dbt run` command that targets only the models in that bucket.
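To make approach one concrete, here is a minimal sketch of the one-to-one mapping between buckets and commands. The bucket names and cron expressions are made up for illustration, and it assumes each model carries a tag equal to its bucket name, selected with dbt's `--select tag:<name>` syntax (older dbt versions used `--models` instead).

```python
import shlex
from typing import Dict, List

# Hypothetical schedule buckets; the names and cron strings are assumptions.
SCHEDULE_BUCKETS: Dict[str, str] = {
    "hourly": "0 * * * *",
    "nightly": "0 2 * * *",
    "weekly": "0 3 * * 1",
}


def dbt_command_for(bucket: str) -> List[str]:
    """Return the dbt invocation that runs only the models tagged with `bucket`."""
    if bucket not in SCHEDULE_BUCKETS:
        raise ValueError(f"unknown schedule bucket: {bucket}")
    return ["dbt", "run", "--select", f"tag:{bucket}"]


def crontab_lines() -> List[str]:
    """Render one crontab-style line per bucket (one schedule, one command)."""
    return [
        f"{cron} {shlex.join(dbt_command_for(bucket))}"
        for bucket, cron in SCHEDULE_BUCKETS.items()
    ]


if __name__ == "__main__":
    for line in crontab_lines():
        print(line)
```

The limitation shows up directly: any model that needs a cadence not already in `SCHEDULE_BUCKETS` forces a new bucket and a new scheduled command.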
Approach two: In dbt Cloud, which we’ve been taking a peek at, each model’s schedule needs to be configured in the UI (or possibly via the API, which could open up some solutions). The downside of doing this in the UI is that the schedule becomes the only piece of the dbt project that is not stored in version control and not co-located with everything else in the dbt world. It generally doesn’t seem very dbt-esque.
While writing this I realized there could be a way to use a specially formatted tag or comment that a hand-written script could find, connecting the schedule declarations in the repo to the deployment system (via the dbt Cloud API, the Airflow API, etc.).
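A rough sketch of what such a script might look like, assuming schedules are declared as model tags with a made-up `schedule_` prefix (for example `schedule_hourly`) and read back from the `target/manifest.json` artifact that dbt writes when it compiles the project. The prefix convention and function names are hypothetical, and pushing the result to dbt Cloud or Airflow is left out.

```python
import json
from collections import defaultdict
from typing import Dict, List

# Hypothetical tag convention: "schedule_hourly", "schedule_nightly", ...
SCHEDULE_TAG_PREFIX = "schedule_"


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Load the manifest artifact produced by dbt (e.g. after `dbt compile`)."""
    with open(path) as f:
        return json.load(f)


def models_by_schedule(manifest: dict) -> Dict[str, List[str]]:
    """Group model names by the schedule declared in their tags."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in manifest.get("nodes", {}).values():
        if node.get("resource_type") != "model":
            continue  # skip tests, seeds, snapshots, etc.
        for tag in node.get("tags", []):
            if tag.startswith(SCHEDULE_TAG_PREFIX):
                schedule = tag[len(SCHEDULE_TAG_PREFIX):]
                groups[schedule].append(node["name"])
    return {schedule: sorted(names) for schedule, names in groups.items()}


if __name__ == "__main__":
    for schedule, names in models_by_schedule(load_manifest()).items():
        # A deployment script would create or update one job per schedule here.
        print(schedule, names)
```

This keeps the schedule declaration in version control next to the model, while the deployment system only consumes the grouped output.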
Curious to hear others’ thoughts on this space.