Hey @obar1, dbt cloud jobs don’t have dependencies between one another by default. We rely on the dependency graph that dbt builds for itself internally to make sure things run in the correct order and without conflict.
Why do they need to be two jobs? Could you add the steps from job 2 into job 1’s definition?
Before dbt build was released, my old team did have a job for snapshots which ran separately to the rest of the project. We only ran it once a day and scheduled it far enough in advance of the other runs that they never overlapped, but it doesn’t sound like that solves your use case.
I’d be interested to see the specific problem you’re trying to solve.
Hi @joellabes, I have 1 model that is big (let’s call it big_job), so the idea was to split the original single job into 2 jobs like so:
JOB_Y:
```
dbt build --select +big_job+
```
and another:
JOB_OTHERS:
```
dbt build --exclude +big_job+
```
but there are issues with the intermediate layer now:
because JOB_OTHERS excludes +big_job+, it will not materialize the intermediate models that are ancestors of big_job in the lineage graph, so I need to run JOB_Y first to build them, and only then JOB_OTHERS
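To make the selector issue concrete, here is a toy illustration (not dbt internals, and the node names are made up for the example) of why `--exclude +big_job` also drops ancestors that other models share:

```python
# Toy lineage graph: node -> set of its direct parents (upstream dependencies).
# int_shared is an ancestor of BOTH big_job and other_model.
from typing import Dict, Set

parents: Dict[str, Set[str]] = {
    "big_job": {"int_shared"},
    "other_model": {"int_shared"},
    "int_shared": {"stg_raw"},
    "stg_raw": set(),
}

def ancestors(node: str) -> Set[str]:
    """All upstream nodes of `node`, plus the node itself (i.e. `+node`)."""
    seen = {node}
    stack = [node]
    while stack:
        for p in parents[stack.pop()]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

all_nodes = set(parents)
job_y = ancestors("big_job")        # dbt build --select +big_job
job_others = all_nodes - job_y      # dbt build --exclude +big_job

print(sorted(job_y))       # ['big_job', 'int_shared', 'stg_raw']
print(sorted(job_others))  # ['other_model'] -- its ancestor int_shared was excluded
```

So JOB_OTHERS ends up building other_model against whatever stale version of int_shared already exists in the warehouse, which is why JOB_Y has to run first.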
Having them all in 1 job takes too much time to get big_job done, and we cannot increase AWS spending for now, so this way we prioritize big_job over the rest …
Any suggestions on how to do this better?
I’m a bit confused - you said that having them all in a single job takes too much time, and so you want to prioritise the big_job over the rest, but in your original message you asked for them to run in series. Splitting it out won’t reduce the total time to run if you’re only going to start job 2 after job 1 is complete anyway.
How many threads do you have in your job definition? You mentioned AWS spend, so I assume you’re using Redshift. Too many threads on Redshift can lead to resource contention and the job actually running slower. I would aim for 4 (maybe try with up to 8 and see where you hit diminishing returns).
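For dbt Core users, threads live in the target definition in profiles.yml; in dbt Cloud the equivalent setting is on the job/environment. A sketch with placeholder connection values (the project and credential names here are hypothetical):

```yaml
my_project:
  target: prod
  outputs:
    prod:
      type: redshift
      host: example.redshift.amazonaws.com  # placeholder
      user: dbt_user                        # placeholder
      dbname: analytics
      schema: analytics
      port: 5439
      threads: 4  # start around 4 on Redshift; more can cause contention
```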
You might be able to create a single job that contains several steps, something like this:
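A sketch of what those steps could look like (the exact step list is my assumption, not a tested job definition): build big_job and everything upstream of it first, then everything downstream of it.

```
dbt build --select +big_job
dbt build --select big_job+ --exclude big_job
```

Step 1 gives big_job priority, and step 2 picks up its descendants without rebuilding big_job itself.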
The one thing to keep in mind here is that any nodes that aren’t related to big_job won’t be built using this selector. If that’s a problem, you might be able to express it with a YAML selector but I haven’t spent enough time using those to have any useful guidance.
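For anyone who wants to try the YAML selector route, a hedged sketch of what a selectors.yml entry might look like (untested; the selector name is made up): everything in the project except big_job and its ancestors, i.e. the JOB_OTHERS set.

```yaml
selectors:
  - name: all_but_big_job_tree
    definition:
      union:
        - method: fqn
          value: "*"            # start from the whole project
        - exclude:
            - method: fqn
              value: big_job
              parents: true     # drop big_job and its ancestors (+big_job)
```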
dbt Cloud now supports triggering one job when another job finishes - I still don’t think this is necessarily the right solution for the use case described in this thread, but it ranks highly in Google for the search term so I’m adding a link for others who come along to it: