Scheduling dbt pipelines using two mandatory tags

I’m trying to find a way to schedule my models using tags. I have two kinds of tag, environment and schedule: environment can take values like staging/production, while schedule can take values like daily/weekly/monthly. I want to tag only the final model of each data pipeline, and when I run my dbt command I want the entire pipeline (including upstream models) to run. However, I only want to run pipelines whose final model is tagged with both an environment and a schedule tag. Ultimately I’m looking for any solution to this in dbt, but the approach I’ve been trying to get working runs into the problem below.

The command I’ve been trying to use is dbt run --select +tag:staging,+tag:daily, and this works most of the time. However, say I have pipeline A, and one of its upstream models (model 1) is also the final model of another pipeline. Model 1 is tagged with staging and daily, while the final model of pipeline A (model 2) is tagged with production and weekly. To recap, model 1 is an upstream model of model 2.

Now suppose I run the following command for a completely separate pipeline B (which shares no upstream models with pipeline A): dbt run --select +tag:production,+tag:daily. I wouldn’t expect any models from pipeline A to run, since model 1 and model 2 would need to either have both the production and daily tags or be upstream of a model that has both, and neither is true. However, model 1 ends up being run because of the way the intersection operator works: +tag:production picks up model 2 and all its upstream models, which includes model 1, while +tag:daily picks up model 1 because it has the daily tag. Because both sides of the intersection contain model 1, model 1 runs even though it shouldn’t. To recap, dbt run --select +tag:production,+tag:daily runs what I need, but it also unintentionally runs a model it shouldn’t.

Basically, I get this problem because of the current order of operations in dbt’s node selection syntax: selection methods, then graph operators, then set operators. If the order were instead selection methods, then set operators, then graph operators, this approach would work. Is there any way around this? If not, does anyone have ideas for how I could get this to work?
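To make the difference between the two orders concrete, here is a minimal Python sketch that simulates the selection on a toy DAG. It does not call dbt; the model names (stg_a, stg_b, model_3 as the final model of pipeline B) and the helper functions are hypothetical and exist only for illustration.

```python
# Toy DAG: each model maps to its direct parents (hypothetical names).
PARENTS = {
    "stg_a": [],
    "stg_b": [],
    "model_1": ["stg_a"],    # final model of another pipeline
    "model_2": ["model_1"],  # final model of pipeline A
    "model_3": ["stg_b"],    # final model of pipeline B
}

# Only final models carry tags, as described above.
TAGS = {
    "model_1": {"staging", "daily"},
    "model_2": {"production", "weekly"},
    "model_3": {"production", "daily"},
}


def tagged(tag):
    """Selection method: every model that carries the given tag."""
    return {model for model, tags in TAGS.items() if tag in tags}


def with_ancestors(models):
    """Graph operator '+': the given models plus everything upstream."""
    selected = set()
    stack = list(models)
    while stack:
        model = stack.pop()
        if model not in selected:
            selected.add(model)
            stack.extend(PARENTS[model])
    return selected


def graph_then_set(tag_a, tag_b):
    """Mimics +tag:a,+tag:b: expand each side, then intersect."""
    return with_ancestors(tagged(tag_a)) & with_ancestors(tagged(tag_b))


def set_then_graph(tag_a, tag_b):
    """The order asked for: intersect the tags, then expand upstream."""
    return with_ancestors(tagged(tag_a) & tagged(tag_b))


print(sorted(graph_then_set("production", "daily")))
# ['model_1', 'model_3', 'stg_a', 'stg_b']
print(sorted(set_then_graph("production", "daily")))
# ['model_3', 'stg_b']
```

With the first order, model_1 (and its upstream stg_a) lands in both expanded sets and so survives the intersection, which is the unwanted run described above. With the second order, only pipeline B is selected.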

Thanks!

Hello - Please use the --exclude flag to skip the models that are not required in your pipeline:

--exclude tag:model1/2 (add a tag such as model1/2 to the model you want to skip)

Thanks for the reply. However, I’m looking for a more general approach, as I won’t always know which models play the role of model 1 and model 2 when there are lots of pipelines being scheduled.

I think your diagnosis of the problem (the order of evaluation) is correct, so I think you’d be better served opening an issue on the dbt Core repo to see whether there is any appetite to make this more customisable.

I haven’t looked into it, but you might also be able to get somewhere with YAML selectors if you haven’t already? They have a bit more control: YAML Selectors | dbt Developer Hub
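Another possible way to get the "set operators first, then graph operators" order is to do it in two steps outside the selector: first ask dbt which models carry both tags (no graph operators), then run each of those models with its upstream models. Below is a rough Python sketch of that idea. The function names are hypothetical, I haven't tested it against a real project, and it assumes that dbt ls with --output name prints one model name per line with no extra log lines, which may depend on your dbt version and logging settings.

```python
import subprocess


def list_final_models(environment, schedule):
    """Ask dbt for models tagged with BOTH tags (plain intersection, no '+').

    Assumes `dbt ls --output name` prints one model name per line.
    """
    result = subprocess.run(
        [
            "dbt", "ls",
            "--select", f"tag:{environment},tag:{schedule}",
            "--resource-type", "model",
            "--output", "name",
        ],
        check=True,
        capture_output=True,
        text=True,
    )
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


def build_run_command(final_models):
    """Build `dbt run --select +a +b ...` (space-separated selectors are a union)."""
    if not final_models:
        raise ValueError("no models matched both tags")
    return ["dbt", "run", "--select"] + [f"+{name}" for name in sorted(final_models)]


# Example with a hard-coded list, so the sketch runs without dbt installed:
print(build_run_command(["model_3"]))
# ['dbt', 'run', '--select', '+model_3']
```

In a real scheduler you would pass the result of list_final_models("production", "daily") into build_run_command and hand the command to subprocess.run. Because the + is applied only to models that already have both tags, a model like model 1 is only pulled in when it is genuinely upstream of a selected final model.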
