Running two dbt Cloud jobs back to back

I have job1 and job2.
I need job1 to run on a schedule, and it takes 3 hours to finish.
After job1 completes I want job2 to be executed, and it takes 2 hours.

If I just schedule them both and they run in parallel, I have problems with table locking.

How can I do that in dbt Cloud without an external scheduler?

job1 → job2 → job1, and so on forever

Hey @obar1, dbt Cloud jobs don’t have dependencies between one another by default. We rely on the dependency graph that dbt builds for itself internally to make sure things run in the correct order and without conflict.

Why do they need to be two jobs? Could you add the steps from job 2 into job 1’s definition?

Before dbt build was released, my old team did have a job for snapshots which ran separately to the rest of the project. We only ran it once a day and scheduled it far enough in advance of the other runs that they never overlapped, but it doesn’t sound like that solves your use case.

I’d be interested to see the specific problem you’re trying to solve.

hi @joellabes, I have one model that is big, let’s call it big_job, so the idea was to split the original single job we had into two jobs, like so:

JOB_Y

dbt build --select +big_job+

and another
JOB_OTHERS

dbt build --exclude +big_job+

but there are issues with the intermediate layer now:
since JOB_OTHERS will not materialize the intermediate models that are ancestors of big_job in the lineage graph (because of the exclude), I need to run JOB_Y first to build them, and then JOB_OTHERS.
Having everything in one job takes too much time to get big_job done, and we cannot increase AWS spending for now :stuck_out_tongue: so this way we prioritize big_job over the rest…
any suggestion how to do it better?

I’m a bit confused - you said that having them all in a single job takes too much time, and so you want to prioritise the big_job over the rest, but in your original message you asked for them to run in series. Splitting it out won’t reduce the total time to run if you’re only going to start job 2 after job 1 is complete anyway.

How many threads do you have in your job definition? You said AWS spend, so I assume you’re using Redshift. Too many threads in Redshift can lead to resource contention and the job actually running slower. I would aim for 4 (maybe try with up to 8 and see where you start getting diminishing returns).

You might be able to create a single job that contains several steps, something like this:

dbt build -s +big_job --exclude big_job --threads 4
dbt build -s big_job --threads 1
dbt build -s big_job+ --exclude big_job --threads 4

The one thing to keep in mind here is that any nodes that aren’t related to big_job won’t be built using this selector. If that’s a problem, you might be able to express it with a YAML selector but I haven’t spent enough time using those to have any useful guidance.
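For reference, a YAML selector for the “everything not related to big_job” case might look roughly like this. This is an untested sketch: the selector name `unrelated_to_big_job` is made up, and you’d want to check the exact `fqn` value against your project.

```yaml
# selectors.yml — hypothetical sketch, not verified against a real project
selectors:
  - name: unrelated_to_big_job
    description: "Every node except big_job, its ancestors, and its descendants"
    definition:
      union:
        # start from all nodes in the project
        - method: fqn
          value: "*"
        # then carve out big_job together with its parents and children
        - exclude:
            - method: fqn
              value: big_job
              parents: true
              children: true
```

A job step could then reference it with something like `dbt build --selector unrelated_to_big_job --threads 4`, which would cover the nodes the three-step selector above misses.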


tagging can be handy as well when I have more than one big job :slight_smile:
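To illustrate the tagging approach (a sketch; the tag name `big` and the `models/big/` folder are made up for the example): you can tag the heavy models in `dbt_project.yml` and then select or exclude them by tag in the job steps.

```yaml
# dbt_project.yml snippet — hypothetical; assumes the heavy models live under models/big/
models:
  my_project:
    big:
      +tags: ["big"]

# The job steps could then become, for example:
#   dbt build -s +tag:big --threads 1        # heavy models and their ancestors, serially
#   dbt build --exclude +tag:big --threads 4 # everything else, in parallel
```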

