I’m trying to find a solution for scheduling and tagging my models. I have two kinds of tags, environment and schedule: environment can take values like staging/production, while schedule can take values like daily/weekly/monthly. I want to tag only the final model of each data pipeline, and when I run my dbt command I want the entire pipeline (including upstream models) to run. However, I only want to run pipelines whose final model is tagged with both the given environment tag and the given schedule tag. Ultimately I’m looking for any way to do this in dbt, but the approach I’ve been trying to get working runs into the problem below.
The command I’ve been trying to use is `dbt run --select +tag:staging,+tag:daily`, and this does work most of the time. However, suppose I have pipeline A, and one of its upstream models (model 1) is itself the final model of another pipeline. Model 1 is tagged with staging and daily, while the final model of pipeline A (model 2) is tagged with production and weekly. To recap, model 1 is an upstream model of model 2.
If I run `dbt run --select +tag:production,+tag:daily` for a completely separate pipeline B (which shares no upstream models with pipeline A), I wouldn’t expect any of the models from pipeline A to run: neither model 1 nor model 2 has both the production and daily tags, and neither is upstream of a model that has both. However, model 1 ends up being run because of the way the intersection operator works. `+tag:production` picks up model 2 and all its upstream models, and therefore model 1, while `+tag:daily` picks up model 1 because it has the daily tag. Because both sides of the intersection contain model 1, it runs even though it shouldn’t. To recap, although `dbt run --select +tag:production,+tag:daily` works and runs what I need, it also unintentionally runs a model it shouldn’t.
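To make the behaviour concrete, here is a minimal Python sketch of how I understand the selection to be evaluated. It is not dbt code: the names `stg_a`, `stg_b`, `model_b` and the helper functions are made up for illustration, and the graph is just the example described in this post plus one assumed staging model per pipeline.

```python
# Hypothetical DAG: each model maps to its direct parents.
PARENTS = {
    "stg_a": [],
    "model_1": ["stg_a"],    # final model of another pipeline
    "model_2": ["model_1"],  # final model of pipeline A
    "stg_b": [],
    "model_b": ["stg_b"],    # final model of pipeline B
}

# Only final models carry tags.
TAGS = {
    "model_1": {"staging", "daily"},
    "model_2": {"production", "weekly"},
    "model_b": {"production", "daily"},
}


def tagged(tag):
    """Selection method: models that carry the tag directly."""
    return {model for model, tags in TAGS.items() if tag in tags}


def with_ancestors(models):
    """Graph operator '+': the given models plus everything upstream."""
    seen = set()
    stack = list(models)
    while stack:
        model = stack.pop()
        if model not in seen:
            seen.add(model)
            stack.extend(PARENTS[model])
    return seen


def select_current(tag_a, tag_b):
    """Models '+tag:a,+tag:b': expand each side first, then intersect."""
    return with_ancestors(tagged(tag_a)) & with_ancestors(tagged(tag_b))


selected = select_current("production", "daily")
print(sorted(selected))  # ['model_1', 'model_b', 'stg_a', 'stg_b']
```

Pipeline B (`stg_b`, `model_b`) is selected as intended, but `model_1` and its upstream `stg_a` come along too, because they appear on both sides of the intersection.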
Basically, I get this problem because of the current order of operations in dbt’s selection syntax: selection methods, then graph operators, then set operators. If the order were instead selection methods, then set operators, then graph operators, this approach would work. Is there any way around this so that I could make it work? If not, does anyone have any ideas for how I could get this to work?
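As a rough illustration of the order I’m after (selection methods, then set operators, then graph operators), here is a small Python sketch. It only models the idea and is not something dbt provides as far as I know; `stg_a`, `stg_b`, `model_b` and the function names are hypothetical.

```python
# Hypothetical DAG: each model maps to its direct parents.
PARENTS = {
    "stg_a": [],
    "model_1": ["stg_a"],
    "model_2": ["model_1"],
    "stg_b": [],
    "model_b": ["stg_b"],
}

# Only final models carry tags.
TAGS = {
    "model_1": {"staging", "daily"},
    "model_2": {"production", "weekly"},
    "model_b": {"production", "daily"},
}


def with_ancestors(models):
    """The given models plus everything upstream of them."""
    seen = set()
    stack = list(models)
    while stack:
        model = stack.pop()
        if model not in seen:
            seen.add(model)
            stack.extend(PARENTS[model])
    return seen


def select_desired(*required_tags):
    """Intersect on tags first, then expand upstream."""
    finals = {m for m, tags in TAGS.items() if set(required_tags) <= tags}
    return with_ancestors(finals)


selected = select_desired("production", "daily")
print(sorted(selected))  # ['model_b', 'stg_b']
```

With this ordering only pipeline B is picked up, since `model_b` is the only model carrying both tags.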
Thanks!