Tips and Tricks about working with dbt

#1

Working with dbt for a while, one can start to develop workflows that are really useful. It would be great to collect these somewhere, so that beginner dbt-ers can benefit from those experiences.

Here is my tip:

  • When I was really into R, I would use the beepr package all the time because then I could switch onto a new task without losing site of my priority. Many moons ago, I asked about a built-in dbt beep and Drew taught me the dbt run .. && say beep.

What is something you have in your workflow that might make it easier for someone to work in/around dbt?

#2

Nice! The one trick I really like involves using some bash magic to run all of the changed files on a branch.

Snippet:

dbt run --models $(git diff --name-only | grep '\.sql$' | awk -F '/' '{ print $NF }' | sed 's/\.sql$/+/g')

You can save this in your .bashrc with a function, eg:

function dbt_run_changed() {
    children=$1
    models=$(git diff --name-only | grep '\.sql$' | awk -F '/' '{ print $NF }' | sed "s/\.sql$/${children}/g" | tr '\n' ' ')
    echo "Running models: ${models}"
    dbt run --models $models
}

This function takes an optional argument, +, that will also run the children of the changed models!

Usage:

$ dbt_run_changed
$ dbt_run_changed +
4 Likes
#3

Here’s my hot tip!

I’m a huge fan of keeping my directories organized in a tree structure. Since I do client work, my tree structure ends up looking like:

.
└── fishtown
    β”œβ”€β”€ client
    β”‚   β”œβ”€β”€ stark-industries
    β”‚   β”‚   β”œβ”€β”€ stark-industries-dbt
    β”‚   β”‚   └── stark-industries-lookml
    β”‚   └── wayne-enterprises
    β”‚       └── wayne-enterprises-dbt
    β”œβ”€β”€ dbt
    └── packages
        β”œβ”€β”€ segment
        └── utils

This can be annoying to cd into the right folder, so I use the goto utility to set up aliases for my folders.
Then I can do things like:

$ goto stark
$ pwd
./fishtown/client/stark-industries/stark-industries-dbt

I also recently set up a bash alias to open GitHub for my current directory (not tested on GitLab! Sorry @emilie) – instructions here.

#4

Oh and another one…

Sometimes if my dbt run is resulting in an error, I like to cycle my logs/dbt.log file before trying to run it again, to make debugging easier.

I have a pretty rudimentary bash function that does this for me:

function cycle_logs() {
  suffix=$(date '+%Y-%m-%dT%H:%M:%S')
  mv -v logs/dbt.log logs/dbt.log.${suffix}
}
#5

I just set up goto! It’s great! +1 this tip!

#6

Drew, do you think this approach could be used in a CI framework to only build/test/deploy only the models changed by the commits being merged into master?

I’m imagining diffing the feature branch against master and only running those models (with @ and +) in production. Could speed up build/test/deploy times a lot by leaving out the models that aren’t affected.

#7

Cool idea! This would definitely be a good starting point, but it won’t hold up for changes to macros or dbt_project.yml, for instance. dbt doesn’t currently provide a rock-solid way of understanding which models are impacted by changes to a given set of files, but that’s certainly something it could do some day!