We would like to use a deployment strategy that only builds the models that have been modified. However, if you only build the modified models and everything downstream of them, an intermediate model with two upstream tables can be rebuilt while one of those parents is not, resulting in missing information in the intermediate during the day.
The context of why I’m trying to do this
Our current daily job runs about 250 models, and we run it every night to capture all changes. During working hours, however, we would prefer our CD job to run only the changed models, without missing changes from their upstream models. For example:
* stg_order
* stg_invoice
These two combine into int_invoice_orders.
If we change stg_order and run only it and its downstream models, stg_invoice is not rebuilt, so int_invoice_orders ends up with orders that have no matching invoices.
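To make the gap concrete, here is a small Python sketch that models the DAG as a map from each model to its parents and mimics `state:modified+` as "the modified model plus everything downstream of it". It only illustrates the graph logic, not how dbt is implemented. The DAG is hypothetical: apart from stg_order, stg_invoice and int_invoice_orders, every model name is invented, and the modified set is hard-coded rather than derived from a state comparison.

```python
from collections import deque

# Hypothetical DAG: each model maps to its direct parents (its ref() targets).
# stg_order, stg_invoice and int_invoice_orders come from the post; the rest are invented.
PARENTS = {
    "stg_order": [],
    "stg_invoice": [],
    "stg_fx_rate": [],
    "stg_customer": [],
    "int_invoice_orders": ["stg_order", "stg_invoice"],
    "int_invoice_aging": ["stg_invoice"],
    "int_fx_rates": ["stg_fx_rate"],
    "int_customers": ["stg_customer"],
    "fct_revenue": ["int_invoice_orders", "int_fx_rates"],
}

# Invert the parent map so the graph can also be walked downstream.
CHILDREN = {model: [] for model in PARENTS}
for model, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(model)


def walk(start, edges):
    """Return the start models plus every model reachable by following `edges`."""
    seen = set(start)
    queue = deque(start)
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


modified = {"stg_order"}             # hard-coded stand-in for state:modified
selected = walk(modified, CHILDREN)  # mimics state:modified+
print(sorted(selected))  # ['fct_revenue', 'int_invoice_orders', 'stg_order']
print("stg_invoice" in selected)  # False: the other parent of int_invoice_orders is skipped
```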
Is there a way to run state:modified+ while also selecting the upstream staging models of the affected intermediate models?
What I’ve already tried
I’ve looked at the documentation and tried writing a macro, with help from ChatGPT and my own brain, but so far I haven’t managed to arrive at the selection command I need.
The selection covers:
* the modified models and their downstream models (state:modified+)
* models that are at the same time:
  – parents of the downstream models (@state:modified), see Graph operators | dbt Developer Hub
  – inside the staging folder (models.staging)
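The selection described above can be sketched in Python on a hypothetical toy DAG (only stg_order, stg_invoice and int_invoice_orders come from the thread). In dbt's selector syntax a space between selectors means union and a comma means intersection, so the idea is `state:modified+` unioned with (`@state:modified` intersected with the staging folder). Two assumptions here: the staging folder is approximated by an `stg_` name prefix, and the modified set is hard-coded. This only computes the set; it is not dbt's implementation.

```python
from collections import deque

# Hypothetical DAG: model -> direct parents. Only the first three names come from the post.
PARENTS = {
    "stg_order": [],
    "stg_invoice": [],
    "int_invoice_orders": ["stg_order", "stg_invoice"],
    "stg_fx_rate": [],
    "stg_customer": [],
    "int_invoice_aging": ["stg_invoice"],
    "int_fx_rates": ["stg_fx_rate"],
    "int_customers": ["stg_customer"],
    "fct_revenue": ["int_invoice_orders", "int_fx_rates"],
}

CHILDREN = {model: [] for model in PARENTS}
for model, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(model)


def walk(start, edges):
    """Return the start models plus every model reachable by following `edges`."""
    seen = set(start)
    queue = deque(start)
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


modified = {"stg_order"}                  # stand-in for state:modified
downstream = walk(modified, CHILDREN)     # state:modified+
at_selection = walk(downstream, PARENTS)  # @state:modified: all ancestors of all descendants
staging = {m for m in PARENTS if m.startswith("stg_")}  # assumption: staging folder == stg_ prefix

# Space in a dbt selector = union, comma = intersection.
selected = downstream | (at_selection & staging)
print(sorted(selected))
# ['fct_revenue', 'int_invoice_orders', 'stg_fx_rate', 'stg_invoice', 'stg_order']
print(sorted(at_selection - selected))  # ['int_fx_rates']: pulled in by @ alone, dropped by the filter
```

In this toy graph stg_invoice is now rebuilt alongside stg_order, while the staging filter keeps the non-staging ancestor int_fx_rates out. Note that int_invoice_aging, another child of stg_invoice, is not part of the selection.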
@luca.odinga Awesome! Just be careful with it, because it selects ‘all ancestors of all descendants of the selected model’, so it can run a lot of models.
That’s why I intersected it with the staging models path using the intersection operator (,), so the parents are limited to staging models only.
Thanks for the heads-up. I’ve been testing the @ operator, and truthfully, what I think we need is something like: all ancestors of all descendants of the selected model, AND the descendants of those ancestors.
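The extra step described here amounts to one more downstream walk starting from the `@` result. The Python sketch below computes that set on a hypothetical toy DAG (again, only stg_order, stg_invoice and int_invoice_orders come from the thread) to show how quickly it grows toward the whole project. It only computes the set and makes no claim about which dbt selector, if any, would produce it.

```python
from collections import deque

# Hypothetical DAG: model -> direct parents. Only the first three names come from the post.
PARENTS = {
    "stg_order": [],
    "stg_invoice": [],
    "int_invoice_orders": ["stg_order", "stg_invoice"],
    "stg_fx_rate": [],
    "stg_customer": [],
    "int_invoice_aging": ["stg_invoice"],
    "int_fx_rates": ["stg_fx_rate"],
    "int_customers": ["stg_customer"],
    "fct_revenue": ["int_invoice_orders", "int_fx_rates"],
}

CHILDREN = {model: [] for model in PARENTS}
for model, parents in PARENTS.items():
    for parent in parents:
        CHILDREN[parent].append(model)


def walk(start, edges):
    """Return the start models plus every model reachable by following `edges`."""
    seen = set(start)
    queue = deque(start)
    while queue:
        for nxt in edges[queue.popleft()]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


downstream = walk({"stg_order"}, CHILDREN)  # state:modified+ (modified set hard-coded)
ancestors = walk(downstream, PARENTS)       # @: all ancestors of all descendants
expanded = walk(ancestors, CHILDREN)        # ...AND the descendants of those ancestors
print(sorted(expanded - ancestors))  # ['int_invoice_aging']
print(f"{len(expanded)} of {len(PARENTS)} models")  # 7 of 9 models
```

Even in this tiny graph, only the unrelated stg_customer branch is left out, which matches the concern that this approach can approach a full run.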
However, that means even more models would run, which in some cases brings us close to a full run of the job.
On an S-size Snowflake warehouse, a full run currently takes almost 6 hours. There is a lot we still need to learn and improve on, I’m afraid; I’m still very much a dbt noob.
Thanks for thinking along! I hope the @ operator will help us out in the end.