Setting up your local dbt run environments

As an analytics engineer on Fishtown’s professional services team, I spent the past year working on 10 different dbt projects for customers. I wrote the following guide to help our team easily switch back and forth between dbt versions. There were times when I was working on 2-4 different projects at once, and it was cumbersome to be stuck with an “all or nothing” choice about which dbt version I could work on. I also wanted to make it easier to test beta releases of dbt without having to commit to the beta for every project I was actively working on.

I wanted to share this document with you all since it might help your team if you’re managing multiple dbt projects as well, or you’d like to easily test beta releases and keep up with the latest and greatest that dbt has to offer!

TLDR

  • If you’ve ever wanted to get multiple versions of dbt running on your machine (specifically the latest release and maybe the latest pre-release, beta or rc) then you’ve come to the right place!

  • This guide will help you get set up with 2 virtual environments (using venv):

    1. dbt - latest official release
    2. dbt-beta - latest pre-release
  • This will help your team:

    1. standardize development experience - no more “this works on mine but idk why it doesn’t work on yours” problem
    2. easily upgrade and manage local dbt versions - this should make it easier to both beta test new releases and easily swap back to existing stable releases when bugs are discovered

Why should I bother?

What are Virtual Environments?

Real Python does a great job at explaining what virtual environments are and why we need them in this primer (feel free to loop back to this later):

Python Virtual Environments: A Primer - Real Python

For now, the most important takeaway here is the concept of project isolation. Being able to isolate your projects allows you to install and handle multiple, potentially conflicting, package versions in one machine at the same time.

How will using Virtual Environments help your team?

In the same vein as having code style guides, organizing project directories consistently (using a ~/dev/ directory), and version controlling analytics, this is just another way to aid in improving the developer experience and preventing unexpected code breakage. Here are a couple ways it could benefit your team:

  1. Everyone will have the exact same local installation of dbt, in an isolated environment, which keeps things consistent across all your machines. No more “I don’t understand why it doesn’t work on your machine, it works on mine” :man_shrugging:
  2. You can now easily hot swap between dbt versions. If you want to run the latest stable release, you can. If you want to try the same project on the beta, you can do that too! All this takes far fewer commands than upgrading or downgrading dbt.
  3. We can have a script manage your environments and make updating easy! If everyone has the same environments set up, it makes it really easy for people to build scripts to help manage that across all your machines. (And this is what we did here!) No need to remember to upgrade both the dbt stable release and also the beta, just run a simple command (like dbt-update) and voila, you have both environments updated!
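
To make the hot-swap idea in point 2 concrete, here is a minimal sketch of a helper that runs a single command inside a given environment and then drops back out. The name run_in_env is made up for illustration (it is not part of dbt or venv), and the usage lines assume the two environments from this guide already exist under ~/.virtualenvs/.

```shell
# run_in_env is a hypothetical helper, not part of dbt or venv.
# The function body is a subshell ( ... ), so the activation never leaks
# into the terminal session that called it.
run_in_env() (
  env_dir=$1
  shift
  # "." is the portable spelling of "source"
  . "$env_dir/bin/activate" && "$@"
)

# Usage (assumes both environments exist):
#   run_in_env ~/.virtualenvs/dbt dbt --version
#   run_in_env ~/.virtualenvs/dbt-beta dbt --version
```

Because each call activates the environment in its own subshell, you can compare the stable and pre-release versions back to back without activating or deactivating anything by hand.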

Why venv? Why not pyenv-virtualenv or conda?

While it’s true that the other two alternatives (and many more) are likely more fully featured, they all require installing another package, whereas venv has shipped with Python as part of the standard library since Python 3.3. This makes it easy to use across all your machines without having to worry about installing extra tools you might not otherwise need. Keeping this as simple as possible also helps maintainability!

Count me in! Where do I get started?

If you’re convinced this is the right solution, here’s a quick guide to get you set up!

  1. Copy the contents of this gist:

    • dbt-update.sh

      #!/usr/bin/env bash
      
      DBT_ENV=~/.virtualenvs/dbt
      DBT_BETA_ENV=~/.virtualenvs/dbt-beta
      
      process_environment() {
        env=$1
        release=$2
      
        if [[ -d "$env" ]]; then
          echo ""
          echo "There is an existing dbt environment in: $env"
          echo -n "Would you like to update(u) or reinstall(r)? [u/r]: "
          read ans
      
          if [[ $ans == "r" ]]; then
            echo ""
            echo "Reinstalling!"
            echo "Deleting existing environment"
            rm -rf "$env" && echo "Successfully deleted existing environment"
          elif [[ $ans = "u" ]]; then
            echo "Updating!"
          else
            echo ""
            echo "Exiting script"
            exit 1
          fi
      
        fi
      
        echo ""
        echo "Creating dbt environment in: $env"
        python3 -m venv $env && echo "Successfully created dbt environment!"
      
        echo ""
        echo "Activating your dbt environment and installing the latest dbt version"
      
        if [[ $release == "stable" ]]; then
          source $env/bin/activate && pip install dbt -q -U && echo "Successfully installed dbt:"
        elif [[ $release == "pre" ]]; then
          source $env/bin/activate && pip install dbt -q -U --pre && echo "Successfully installed dbt:"
        else
          exit 1
        fi
      
        dbt --version
        deactivate
      
      }
      
      echo ""
      echo "Initializing dbt environments"
      
      echo ""
      echo "=== main dbt environment ==="
      
      process_environment $DBT_ENV stable
      
      echo ""
      echo "=== dbt beta environment ==="
      
      process_environment $DBT_BETA_ENV pre
      
      echo ""
      echo "If you would like to get the commands to set your aliases"
      echo -n "to easily activate the environments respond with your shell or skip, [bash/zsh/skip]:"
      read ans_alias
      
      if [[ $ans_alias == "bash" ]]; then
        profile=~/.bash_profile
      # zsh reads its aliases from ~/.zshrc
      elif [[ $ans_alias == "zsh" ]]; then
        profile=~/.zshrc
      else
        exit 1
      fi
      
      echo ""
      echo "If you don't already have these aliases to quickly activate the dbt environments"
      echo "run the following commands in your terminal then restart your terminal:"
      echo "echo \"alias dbt-activate='source $DBT_ENV/bin/activate'\" >> $profile"
      echo "echo \"alias dbt-beta-activate='source $DBT_BETA_ENV/bin/activate'\" >> $profile"
      
  2. Paste the contents in a file in your ~/.dbt/ directory called: dbt-update.sh. This is the same folder where you’d find your profiles.yml file!

  3. Run the following commands in your terminal:

    You can figure out your default shell by running echo $SHELL, or by checking the title bar of your iTerm2 window. It should be either zsh or bash.

    • If you’re using zsh (the new macOS default since macOS Catalina)

      chmod +x ~/.dbt/dbt-update.sh # this makes the file executable
      echo "alias dbt-update='~/.dbt/dbt-update.sh'" >> ~/.zshrc # this allows dbt-update to be run from anywhere
      
    • If you’re using bash (disclaimer, I haven’t tested this, let me know if you’re trying it, would love to pair)

      chmod +x ~/.dbt/dbt-update.sh # this makes the file executable
      printf "\nalias dbt-update='~/.dbt/dbt-update.sh'" >> ~/.bash_profile # this allows dbt-update to be run from anywhere
      
  4. Restart your terminal by closing the window and opening a new one (you could also run the source command against your config file)

  5. Run the following command:

    dbt-update # yes it's that simple!
    

    You should see some printed status messages letting you know what’s going on under the hood. The script will set up the dbt and dbt-beta environments for the first time and install a fresh dbt version in each one. At the end of each install, the script prints the version of dbt that was installed.

  6. Once your environment is set up, run the two commands that the script suggests you run in your terminal after it’s been installed.

  7. Restart your terminal one last time and you now should be able to run:

    dbt-activate
    


    After running the activate command, you should be able to see your terminal prefixed with the environment name (in this case dbt). You can do the same for the dbt-beta environment, just run:

    dbt-beta-activate
    
  8. To deactivate your environments, just run:

    deactivate
    
  9. The next time you need to update dbt, just rerun dbt-update. It should detect that you already have the environments set up, then prompt you to choose whether you’d like to update or reinstall (type any other character to cancel).
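
If the prompt prefix mentioned in step 7 is hard to spot (or your prompt theme hides it), here is a small sketch for checking which environment is active. It relies on the VIRTUAL_ENV variable that venv’s activate script exports and deactivate unsets; the function name active_env is just an illustrative choice.

```shell
# active_env is a hypothetical helper name.
# Prints the name of the active virtual environment, or "none".
active_env() {
  if [ -n "${VIRTUAL_ENV:-}" ]; then
    # VIRTUAL_ENV holds the full path, e.g. ~/.virtualenvs/dbt-beta
    basename "$VIRTUAL_ENV"
  else
    echo "none"
  fi
}
```

After dbt-activate this should print dbt, after dbt-beta-activate it should print dbt-beta, and after deactivate it should print none.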

Improving this workflow

Does anyone know of a smoother way to deploy this? Right now it requires some familiarity with creating custom bash scripts and setting up aliases in your shell of choice. I would love to see a way to distribute this across multiple machines with a simple install, and maybe add some flexibility so that users could manage versions through configurable options instead of editing the raw dbt-update.sh file. Feel free to reach out here and share your thoughts if this is interesting to you or if you’ve tackled the same problem differently in your organization!
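
As one possible direction for the configurable options idea, here is a rough sketch of how the top of dbt-update.sh could read its settings from environment variables and fall back to the current defaults. DBT_ENV and DBT_BETA_ENV are the script’s existing names, but letting the caller override them is an assumption, and DBT_VERSION and dbt_requirement are hypothetical names invented for this example.

```shell
# Use the caller's values if set, otherwise the script's current defaults.
DBT_ENV="${DBT_ENV:-$HOME/.virtualenvs/dbt}"
DBT_BETA_ENV="${DBT_BETA_ENV:-$HOME/.virtualenvs/dbt-beta}"

# dbt_requirement is a hypothetical helper.
# Prints the requirement to hand to pip: "dbt" for the latest release,
# or "dbt==<version>" when the (hypothetical) DBT_VERSION variable is set.
dbt_requirement() {
  if [ -n "${DBT_VERSION:-}" ]; then
    echo "dbt==$DBT_VERSION"
  else
    echo "dbt"
  fi
}

# Inside process_environment, the install line could then become:
#   pip install "$(dbt_requirement)" -q -U
```

With something like this in place, pinning a version for one run could look like DBT_VERSION=0.19.0 ~/.dbt/dbt-update.sh, with no edits to the script itself.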

Other Helpful Links

An Effective Python Environment: Making Yourself at Home - Real Python


I really like this approach @aescay, particularly that venv does not require installing another package. Thanks for sharing!

I’ve been working on establishing a consistent development environment for our Analytics Engineering team at Surfline as well. While venv is the simplest solution, we’ve settled on conda over venv, pyenv-virtualenv, poetry, and others for a few reasons:

  1. Analytics Engineering falls under our Engineering team and they use conda.
  2. Our Data Science team also uses conda. So all of our technical teams company wide use the same solution.
  3. Other projects we work on at times require non-python packages (e.g., gcc) and conda really excels here.

It seems to me that knowing what the other technical teams in the org are using as a virtual environment manager is likely the best indicator of what Analytics Engineering should use.

In terms of “improving this workflow”, I’d suggest checking out direnv. We use it so that anytime we navigate inside a repo with a conda environment, the conda env is automatically activated and when you navigate outside the repo it is automatically deactivated. This guards us against installs in the wrong environment (:scream:) and makes switching between projects super easy!

I imagine the same can be achieved with venv environments!
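
For venv, a minimal sketch of what that could look like is an .envrc file at the root of a dbt repo. This assumes direnv is installed and hooked into your shell, and that the ~/.virtualenvs/dbt environment from the guide above exists. Treat it as a starting point rather than a tested setup.

```shell
# .envrc at the root of a dbt project (placement is an assumption).
# Point at the shared environment created by dbt-update.sh.
export VIRTUAL_ENV="$HOME/.virtualenvs/dbt"
# Put the environment's executables (including dbt) first on PATH.
export PATH="$VIRTUAL_ENV/bin:$PATH"
```

After creating the file, direnv needs a one-time direnv allow in that directory before it will load it. From then on, the variables are applied when you cd into the repo and reverted when you leave.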
