I have two jobs, job1 and job2.
job1 needs to run on a schedule and takes 3 hours to finish.
After job1 completes, I want job2 to run; it takes 2 hours.
If I just schedule them both and they run in parallel, I run into table locking problems.
How can I do this in dbt Cloud without an external scheduler?
job1 → job2 → job1, and so on forever.
Hey @obar1, dbt cloud jobs don’t have dependencies between one another by default. We rely on the dependency graph that dbt builds for itself internally to make sure things run in the correct order and without conflict.
Why do they need to be two jobs? Could you add the steps from job 2 into job 1’s definition?
Before dbt build was released, my old team did have a job for snapshots that ran separately from the rest of the project. We only ran it once a day and scheduled it far enough in advance of the other runs that they never overlapped, but it doesn’t sound like that solves your use case.
I’d be interested to see the specific problem you’re trying to solve
Hi @joellabes, I have one model that is big; let’s call it big_job. The idea was to split the single job we originally had into two jobs, like so:
dbt build --select +big_job+
dbt build --exclude +big_job+
but there are now issues with the intermediate layer:
JOB_OTHERS (the --exclude command) will not materialize the intermediate models that are ancestors of big_job in the lineage graph, so I need to run JOB_Y first to build them, and then JOB_OTHERS.
Having everything in one job takes too much time to get big_job done, and we cannot increase AWS spending for now, so this way we prioritize big_job over the rest.
Any suggestions for how to do this better?
I’m a bit confused: you said that having them all in a single job takes too much time, and so you want to prioritise big_job over the rest, but in your original message you asked for them to run in series. Splitting it out won’t reduce the total time to run if you’re only going to start job 2 after job 1 is complete anyway.
How many threads do you have in your job definition? You said AWS spend, so I assume you’re using Redshift. Too many threads in Redshift can lead to resource contention and the job actually running slower. I would aim for 4 (maybe try up to 8 and see where you start getting diminishing returns).
You might be able to create a single job that contains several steps, something like this:
dbt build -s +big_job --exclude big_job --threads 4
dbt build -s big_job --threads 1
dbt build -s big_job+ --exclude big_job --threads 4
The one thing to keep in mind here is that any nodes that aren’t related to big_job won’t be built using this selector. If that’s a problem, you might be able to express it with a YAML selector, but I haven’t spent enough time using those to have any useful guidance.
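One way to cover those unrelated nodes without a YAML selector might be a fourth step that reuses the exclusion from earlier in the thread. This is only a sketch: it assumes the model is literally named big_job, and the thread counts are just the values suggested above.

```shell
# Steps 1-3: ancestors of big_job, then big_job itself, then its descendants
dbt build -s +big_job --exclude big_job --threads 4
dbt build -s big_job --threads 1
dbt build -s big_job+ --exclude big_job --threads 4
# Step 4 (assumption): everything that is neither an ancestor nor a descendant of big_job
dbt build --exclude +big_job+ --threads 4
```

One caveat: if a descendant of big_job also has a parent outside big_job’s lineage, step 3 would build it against the previous run’s version of that parent, because step 4 refreshes the parent only afterwards. Moving step 4 ahead of step 3 avoids that, at the cost of delaying the descendants.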
Tagging can also be handy when I have more than one big job.
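As a sketch of the tagging idea, assuming each big model carries a hypothetical tag named big and that the big models do not depend on one another, the same staged steps could be written once against the tag instead of once per model:

```shell
# "big" is a hypothetical tag name applied to each large model
# Step 1: ancestors of every tagged model
dbt build -s +tag:big --exclude tag:big --threads 4
# Step 2: the tagged models themselves
dbt build -s tag:big --threads 1
# Step 3: everything downstream of the tagged models
dbt build -s tag:big+ --exclude tag:big --threads 4
```

As with the single-model version, nodes unrelated to any tagged model are not selected by these steps.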