dbt Orchestration: Airflow vs dbt Cloud Jobs

Hello!

I’m interested in running all of our models on a 2-hour cadence.

I just started researching dbt Cloud and see that you can schedule jobs within the UI. Is there a certain use case for when to use Airflow vs just scheduling dbt Cloud jobs? I only see information online about integrating Airflow with dbt; however, if I wanted to get up and running quickly, it appears from the documentation that scheduled jobs are available within the UI.
dbt Cloud jobs appear to be very easy to set up and run, and I’m confused about why I only see information on Airflow integration with dbt and not much about setting up scheduled jobs.

Primarily:

  1. Can I run my entire dbt run through a scheduled job, instead of using Airflow at all, and save some complexity?
  2. Does this allow for running ‘dbt run’ across different BigQuery projects (not just schemas)?

What am I missing?

Hey @Zenb!

Selfishly, I suspect that because dbt Cloud’s scheduler is pretty simple to set up, there’s not a lot of demand for explainers on how to use it :sweat_smile: We should do a better job of making that information available though - what were you searching online that brought up mostly Airflow?

A lot of companies do use Airflow alongside dbt Cloud, but unless you have other tools in your data transformation process (e.g. an ML process that you need to trigger in a different system after a dbt run completes), it’s definitely not mandatory, and you’ll be well served by using dbt Cloud’s built-in scheduler to avoid the extra complexity.

To answer your questions:

  1. Short answer: yes! Long answer: probably, unless you have a particularly complex setup as described above.

  2. I don’t normally use BQ myself so I’m not 100% sure, but it’s supported in dbt Core so I don’t see why not!

Thanks Joel! Sounds like I might be able to use dbt Cloud for my scheduled runs. Do you know of any resources for deploying dbt runs in different environments (not different schemas)? I currently have a dev, staging and prod dataset in BigQuery (which is like its own database) and would want different versions of our git repo to be in different environments.


Hi @Zenb

As far as I know, there are two main options with Airflow.

  1. Trigger dbt Cloud Jobs from Airflow (dbt Cloud Operators — apache-airflow-providers-dbt-cloud Documentation)
  2. Parse the dbt project and create the DAG yourself.

The second option relies on the dbt CLI and requires more coding, but it gives you full control over every part of the run and lets you use the open-source version of dbt rather than paying for dbt Cloud. On the other hand, dbt Cloud provides many features and saves development time.

You could set up different projects for BigQuery using the database configuration in your dbt_project.yml file.
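A minimal sketch of what that could look like, just to make it concrete (the project name, folder name and GCP project ID below are placeholders, not anything from a real setup):

```yaml
# dbt_project.yml (sketch; names and IDs are placeholders)
name: my_dbt_project
version: "1.0.0"
profile: my_dbt_project

models:
  my_dbt_project:
    marts:
      # On BigQuery, dbt's `database` config maps to the GCP project
      # and `schema` maps to the dataset.
      +database: analytics-prod-project   # build this folder's models in a different GCP project
```

By default each model builds into whatever project and dataset the active connection points at, so you’d normally only reach for +database when a subset of models needs to land somewhere other than the connection’s default.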

With the disclaimer that this isn’t BQ-flavoured advice, here are the broad strokes of how it would work:

Dev

Your development environment gets its own schema, which can also be in a separate database (for example, each developer at dbt Labs has a schema named something like dbt_jlabes in a Snowflake database called analytics_dev).

Staging (CI)

You can set up CI jobs so that when a PR is opened in GitHub/Azure DevOps/GitLab, dbt Cloud will build the changed models in a separate temporary schema. I don’t know offhand whether you can configure it to build in a different database, sorry!

Prod

The main environment, which you’ve probably already configured.

Thanks all!

We have separate projects (essentially servers) for our dev, staging, prod, etc., so having different schemas probably won’t work for our use case. That said, it sounds like there is a path forward for us to develop within one ‘master’ repo and then push different versions of that repo out to the different servers.

You should definitely have individual schemas for each dbt user on your dev server, to avoid two people accidentally clobbering each other’s tables. For example, I would have a copy of the customers model in my own dev environment at analytics_dev.dbt_jlabes.customers, and you would have analytics_dev.dbt_zenb.customers, which means that I can mess around with the customers table without impacting you. Once my changes have gone through code review and been merged into the main branch, you’d get the new version of the table in your schema too.

Yes, this is spot on - this is how you can be confident that your dev/staging/prod environments all look the same, because they’re built from the same code.
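With the same not-a-BQ-expert caveat as above, here’s roughly what that separation looks like in dbt Core’s profiles.yml, where each target points at a different GCP project. In dbt Cloud you’d get the same effect by creating separate Environments, each with its own connection and deployment dataset, and pointing your scheduled jobs at the right one. All of the project IDs, dataset names and paths here are made up:

```yaml
# profiles.yml (sketch; project IDs, datasets and keyfile paths are placeholders)
my_dbt_project:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: oauth                      # developers use their own credentials
      project: analytics-dev-project     # the dev GCP project ("database" in dbt terms)
      dataset: dbt_zenb                  # one dataset per developer
      threads: 4
    staging:
      type: bigquery
      method: service-account
      keyfile: /secrets/staging-sa.json
      project: analytics-staging-project
      dataset: analytics
      threads: 4
    prod:
      type: bigquery
      method: service-account
      keyfile: /secrets/prod-sa.json
      project: analytics-prod-project
      dataset: analytics
      threads: 4
```

The same repo then builds into whichever project the active target points at (e.g. dbt run --target staging), and in dbt Cloud each environment can also be pinned to a custom branch of your repo, which covers the ‘push different versions to different servers’ part.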