Background
dbt Cloud is an excellent service and great value, providing an easy way to manage scheduled dbt jobs. As well as scheduling, it supports webhooks, so a job can be triggered to run on a pull request (PR) to a branch specified in your dbt Cloud deployment environment. Jobs triggered via the webhook run against a new schema in your data warehouse that is cleaned up at the end of the run. Great for testing, but what about running a real production job?
In addition to schedules and webhooks, dbt Cloud provides an API for triggering runs, which you can use to run a job against the regular schema(s) your scheduled jobs use.
Why?
The main reason you might want to do this is to minimize how often a job runs that consists entirely of views, or that generates the dbt docs. On a target warehouse like Snowflake, this means a virtual warehouse spends less time on and more time suspended, where you don’t have to pay for it. The alternative, recreating view definitions on a schedule, is pointless busywork: dbt Cloud jobs usually run only from the master branch, and the only time the views can possibly change is when new code is merged to it.
I will illustrate two alternative ways to trigger a dbt Cloud job to run on merge to the master branch using CI tools: first with GitHub Actions, then with CircleCI.
The approach taken here could be adapted for use with other CI tools, and should work with Git repository hosts other than GitHub, such as GitLab or Bitbucket, although I haven’t tested any of those.
dbt Cloud
- Create or open an existing dbt Cloud job
Here you would normally set up a Schedule or Webhook. There is a section for “API”, but nothing needs to be done there; it just provides a link to the docs and a basic POST example. Take note of your account ID and job ID; at this point, both should be visible in your web browser’s URL.
- Get your dbt API key
Open your “My Account” settings, go to “API Access”, and copy the API key; you will need it later for either solution.
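With the account ID, job ID, and API key in hand, you can sanity-check the API call locally before wiring it into any CI tool. A minimal sketch, assuming hypothetical IDs (12345 and 67890) and your API key exported as a `DBT_API_KEY` environment variable:

```shell
# Hypothetical IDs -- substitute the ones from your dbt Cloud URL
ACCOUNT_ID=12345
JOB_ID=67890

# The v2 "trigger run" endpoint is built from the account and job IDs
URL="https://cloud.getdbt.com/api/v2/accounts/${ACCOUNT_ID}/jobs/${JOB_ID}/run/"
echo "$URL"

# Uncomment to actually trigger the job (requires a valid DBT_API_KEY):
# curl -H "Authorization:Token ${DBT_API_KEY}" \
#      -H "Content-Type:application/json" \
#      -d '{"cause":"Manual test trigger"}' \
#      "$URL"
```

If the curl call succeeds you should see a JSON response with a `status.code` of 200, the same response the CI scripts below check for.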
Github Actions
Use these instructions with GitHub Actions; for CircleCI, skip to the next section.
- Log in to GitHub and open your dbt repository
- Click on “Settings” and then “Secrets”
- Add a GitHub secret called `DBTCLOUDAPIKEY` and paste in the dbt Cloud API key you copied earlier
- Add a new GitHub Actions workflow

Contents of the `.github/workflows/main.yml` file (you will need to replace the `<GOES HERE>` parts with your dbt account_id and job_id):
```yaml
name: Trigger dbt cloud job to prep views and build docs
on:
  push:
    branches: [ master ]
jobs:
  prep:
    runs-on: ubuntu-latest
    steps:
      - name: Run dbt cloud job
        run: |
          RESPONSE=$(curl -v -H "Authorization:Token ${{ secrets.DBTCLOUDAPIKEY }}" -H "Content-Type:application/json" -d '{"cause":"Triggered by Github Actions","git_branch":"master"}' https://cloud.getdbt.com/api/v2/accounts/<ACCOUNT_ID_GOES_HERE>/jobs/<JOB_ID_GOES_HERE>/run/)
          STATUSCODE=$(echo "$RESPONSE" | jq '.status.code')
          echo "$RESPONSE" | jq '.status.user_message'
          if [[ $STATUSCODE != 200 ]]; then echo "$RESPONSE" && exit 1; fi
```
- Ensure this is merged into the master branch; the next time something is merged, this workflow should run.
CircleCI version
These instructions assume you already have a CircleCI account and have it connected to your GitHub (or other) Git host.
- Log in to your CircleCI account
- Open “Projects”
- If it is not already enabled, you may need to “Set Up Project” for your dbt Git repo
- Add a project environment variable called `DBTCLOUDAPIKEY` containing the dbt API key you copied earlier

Contents of the `.circleci/config.yml` file (you will need to replace the `<GOES HERE>` parts with your dbt account_id and job_id):
```yaml
version: 2.1
workflows:
  version: 2
  dbtcloud:
    jobs:
      - trigger_dbt_cloud_job:
          filters:
            branches:
              only:
                - master
jobs:
  trigger_dbt_cloud_job:
    docker:
      - image: circleci/python:3.8.1
    steps:
      - run:
          name: curl
          command: |
            RESPONSE=$(curl -v -H "Authorization:Token ${DBTCLOUDAPIKEY}" -H "Content-Type:application/json" -d '{"cause":"Triggered by CircleCI", "git_branch":"master"}' https://cloud.getdbt.com/api/v2/accounts/<ACCOUNT_ID_GOES_HERE>/jobs/<JOB_ID_GOES_HERE>/run/)
            STATUSCODE=$(echo "$RESPONSE" | jq '.status.code')
            echo "$RESPONSE" | jq '.status.user_message'
            if [[ $STATUSCODE != 200 ]]; then echo "$RESPONSE" && exit 1; fi
```
- Ensure this is merged into the master branch; the next time something is merged, this pipeline should run.
What are those bash commands?
The script sends a request to the dbt Cloud API to run a job and stores the response. The `status` attribute is extracted from that response, and its `user_message` is logged: a success message if everything went well, or something else if not. The script then checks whether the status code is 200 (success); if not, it logs the entire response and forces a non-zero exit code so that the whole CI workflow/pipeline fails.
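You can see the parsing logic in isolation by running it against a sample payload. This is a made-up response shaped like the dbt Cloud API’s envelope (the exact fields and messages in a real response will differ), and it assumes `jq` is available, as it is on the CI images used above:

```shell
# A hypothetical response, abridged to just the fields the script reads
RESPONSE='{"status":{"code":200,"user_message":"Success!"},"data":{"id":42}}'

# Pull out the numeric status code
STATUSCODE=$(echo "$RESPONSE" | jq '.status.code')

# Log the human-readable message
echo "$RESPONSE" | jq -r '.status.user_message'

# Fail the step if the API did not report success
if [[ $STATUSCODE != 200 ]]; then echo "$RESPONSE"; exit 1; fi
```

Changing the sample `code` to anything other than 200 makes the snippet dump the full response and exit non-zero, which is exactly what makes the surrounding CI job fail.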
Caveats
The dbt API, GitHub Actions, CircleCI and other URIs mentioned here could change in the future, so refer to the relevant docs for the latest information.
The API key you use will be tied to your account, so currently it is best for the dbt administrator to do this. It would be nice if dbt had a way to generate an API key for a service-account user, but I am not aware of any such functionality.
References
dbt Cloud docs - importing a project by git URL
dbt API docs