Trigger a dbt Cloud job on merging to master branch

Background

dbt Cloud is an excellent service and great value, providing an easy way to manage scheduled dbt jobs. As well as scheduling, it allows webhooks to be used, so that a job can be triggered to run on a pull request (PR) against a branch specified in your dbt Cloud deployment environment. Jobs triggered via the webhook run against a new schema in your data warehouse that is cleaned up at the end of the run. Great for testing, but what about running a real production job?

In addition to schedules and webhooks, dbt Cloud also provides an API for triggering runs, which you can use to run a job against the regular schema(s) it is configured to use.
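
For example, a run can be triggered with a single authenticated POST to the job’s run endpoint (a minimal sketch of the call used throughout this post, with the <GOES HERE> placeholders standing in for your own IDs and API key):

# Minimal sketch: trigger a dbt Cloud job run via the v2 API
curl -H "Authorization: Token <API_KEY_GOES_HERE>" \
  -H "Content-Type: application/json" \
  -d '{"cause": "Triggered manually via the API"}' \
  https://cloud.getdbt.com/api/v2/accounts/<ACCOUNT_ID_GOES_HERE>/jobs/<JOB_ID_GOES_HERE>/run/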

Why?

The main reason you might want to do this is to minimize how often a job runs that consists entirely of views, or that only generates the dbt docs. On a target warehouse like Snowflake, this means a virtual warehouse spends less time running and more time suspended, where you aren’t paying for it. The alternative, recreating view definitions on a schedule, is pointless busywork: dbt Cloud jobs usually run only from the master branch, and the only time the views can change is when new code is merged to master.

I will illustrate two alternative ways to trigger a dbt Cloud job to run on merge to the master branch using CI tools: first with GitHub Actions, then with CircleCI.

The approach taken here could be adapted for use with other CI tools, and should work with Git repository hosts other than GitHub, such as GitLab or Bitbucket, although I haven’t tested those.

dbt Cloud

  1. Create or open an existing dbt Cloud job

Here you would normally set up a Schedule or Webhook. There is a section for “API”, but nothing needs to be done there; it just provides a link to the docs and a basic POST example. Take note of your account ID and job ID; at this point, both should be visible in your web browser’s URL.
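
For example, with a job open, the address bar shows something along these lines (a hypothetical URL for illustration; the exact path may differ):

https://cloud.getdbt.com/#/accounts/<ACCOUNT_ID_GOES_HERE>/projects/<PROJECT_ID_GOES_HERE>/jobs/<JOB_ID_GOES_HERE>/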

  2. Get your dbt API key

Open your “My Account” settings and go to “API Access”. Copy the API key; you will need it later for either solution.
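
If you want to sanity-check the key before putting it into CI, one option is to call the accounts endpoint, which should return details of the account(s) the key can access (a sketch based on the dbt API v2 docs; verify the path against the current docs):

# Hypothetical local smoke test of the API key
export DBTCLOUDAPIKEY="<API_KEY_GOES_HERE>"
curl -s -H "Authorization: Token ${DBTCLOUDAPIKEY}" \
  https://cloud.getdbt.com/api/v2/accounts/ | jq '.status'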

GitHub Actions

Use these instructions with GitHub Actions; for CircleCI, skip to the next section.

  1. Log in to GitHub and open your dbt repository

  2. Click on “Settings” and then “Secrets”

  3. Add a GitHub secret called DBTCLOUDAPIKEY and paste in the dbt Cloud API key that you copied earlier

  4. Add a new GitHub Actions workflow

Contents of the .github/workflows/main.yml file:

  • You will need to replace the <GOES HERE> parts with your dbt account_id and job_id
name: Trigger dbt cloud job to prep views and build docs
on:
  push:
    branches: [ master ]

jobs:
  prep:
    runs-on: ubuntu-latest
    steps:
    - name: Run dbt cloud job
      run: |
        RESPONSE=$(curl -v -H "Authorization:Token ${{ secrets.DBTCLOUDAPIKEY }}" -H "Content-Type:application/json" -d '{"cause":"Triggered by Github Actions","git_branch":"master"}' https://cloud.getdbt.com/api/v2/accounts/<ACCOUNT_ID_GOES_HERE>/jobs/<JOB_ID_GOES_HERE>/run/)
        STATUSCODE=$(echo "$RESPONSE" | jq '.status.code')
        echo "$RESPONSE" | jq '.status.user_message'
        if [[ $STATUSCODE != 200 ]]; then echo "$RESPONSE" && exit 1; fi

  5. Ensure this is merged into the master branch; the next time something is merged to master, this workflow should run.
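
If you don’t want to wait for a real change to test it, you can push an empty commit to master (plain git, nothing dbt-specific):

# Trigger the push event without changing any files
git checkout master && git pull
git commit --allow-empty -m "Test dbt Cloud trigger workflow"
git push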

CircleCI version

These instructions assume you already have a CircleCI account and have connected it with your GitHub (or other) git host.

  1. Log in to your CircleCI account
  2. Open “Projects”
  3. If it is not already enabled, you may need to “Set Up Project” for your dbt git repo
  4. Add a project environment variable called DBTCLOUDAPIKEY containing your dbt API key copied earlier

Contents of the .circleci/config.yml file:

  • You will need to replace the <GOES HERE> parts with your dbt account_id and job_id
version: 2.1

workflows:
  version: 2
  dbtcloud:
    jobs:
      - trigger_dbt_cloud_job:
          filters:
            branches:
              only:
                - master

jobs:
  trigger_dbt_cloud_job:
    docker:
      - image: circleci/python:3.8.1
    steps:
      - run:
          name: curl
          command: |
            RESPONSE=$(curl -v -H "Authorization:Token ${DBTCLOUDAPIKEY}" -H "Content-Type:application/json" -d '{"cause":"Triggered by CircleCI", "git_branch":"master"}' https://cloud.getdbt.com/api/v2/accounts/<ACCOUNT_ID_GOES_HERE>/jobs/<JOB_ID_GOES_HERE>/run/)
            STATUSCODE=$(echo "$RESPONSE" | jq '.status.code')
            echo "$RESPONSE" | jq '.status.user_message'
            if [[ $STATUSCODE != 200 ]]; then echo "$RESPONSE" && exit 1; fi

  5. Ensure this is merged into the master branch; the next time something is merged to master, this pipeline should run.
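
If you have the CircleCI CLI installed, you can also check that the config file parses cleanly before merging:

# Validate .circleci/config.yml locally using the CircleCI CLI
circleci config validate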

What are those bash commands?

The script sends a request to the dbt Cloud API to run a job, and stores the response. The status attribute is extracted from that response, and the user_message is logged, which will be a success message if everything went well, or something else if not. The script then checks whether the status code is 200 (success); if not, it logs the entire response and forces a non-zero exit code so that the entire CI workflow/pipeline fails.
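
Note that the script is fire-and-forget: it confirms the run was accepted, not that it succeeded. If you want the CI job to wait on the dbt run itself, one option is to poll the run endpoint using the run ID from the trigger response (a sketch assuming the v2 “get run” endpoint and its status_humanized field; check the current API docs):

# Hypothetical polling loop, appended after the trigger script above
RUN_ID=$(echo "$RESPONSE" | jq '.data.id')
while true; do
  RUN=$(curl -s -H "Authorization: Token ${DBTCLOUDAPIKEY}" \
    https://cloud.getdbt.com/api/v2/accounts/<ACCOUNT_ID_GOES_HERE>/runs/${RUN_ID}/)
  STATUS=$(echo "$RUN" | jq -r '.data.status_humanized')
  echo "Run status: ${STATUS}"
  case $STATUS in
    Success) break ;;
    Error|Cancelled) echo "$RUN"; exit 1 ;;
  esac
  sleep 30
done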

Caveats

The dbt API, GitHub Actions, CircleCI and other URIs mentioned here could change in the future, so refer to the relevant docs for the latest information.

The API key you use will be tied to your personal account, so currently it is best for the dbt administrator to do this. It would be nice if dbt had a way to generate an API key for a service account user, but I am not aware of any such functionality.

References

dbt Cloud docs - importing a project by git URL
dbt API docs


I’m wondering about the data params POSTed to the jobs endpoint. I don’t see those in the API docs anywhere. It’s good to know that the git branch can be set. Can I also override the database established in the environment?

If both those params are available, it would be possible to quickly spin up a test db with a random name in Snowflake, load the seeds, run the tests, then shut down and drop the db. Otherwise I’m not sure how we could achieve that.

In dbt Cloud, you have a project with an environment that has a connection. The connection defines a specific database, so I don’t think you can do what you suggested by orchestrating dbt Cloud with CI.

You can certainly do it with a new schema instead of a database. That’s exactly how the Webhooks mode of a dbt job works. It will create a new schema and drop it for you when done.
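
(For what it’s worth, the trigger endpoint’s request body also appears to accept a schema_override parameter alongside git_branch, so a request along these lines may get you part of the way; the parameter name is from the v2 API docs, so verify it there:)

# Hypothetical: trigger the job against a throwaway schema instead of the environment default
curl -H "Authorization: Token ${DBTCLOUDAPIKEY}" -H "Content-Type: application/json" \
  -d '{"cause": "CI test run", "git_branch": "master", "schema_override": "ci_temp_schema"}' \
  https://cloud.getdbt.com/api/v2/accounts/<ACCOUNT_ID_GOES_HERE>/jobs/<JOB_ID_GOES_HERE>/run/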

If you use Webhooks - as opposed to a schedule or the API - the job is triggered when a PR is created. That earlier point in time is much more appropriate for running a full end-to-end test. The purpose of what is outlined here is not running tests, but minimizing the amount of time Snowflake is running and maximizing the time spent suspended, saving time and money.

:+1: That should definitely go on the roadmap; pretty sure we’re not the only ones who don’t want personal credentials in an automated workflow.

And thanks for writing this up; hooking CircleCI in could make my life loads better.