Fusion: own your `$PATH`, choose your destiny

when you type `dbt run` in the terminal and hit ENTER, the first thing the terminal does is ask itself:

what is this `dbt` command? where might I find the code that corresponds to it?

the $PATH most-traveled

enter the `$PATH` environment variable. All it is is an ordered list of directories. It tells the terminal in which order to go looking for programs that correspond to commands. As soon as it finds an executable with the same name as the command invoked, it stops searching and runs that program with the user-provided arguments and flags.

The power of `$PATH` is that it’s modifiable to meet the needs of end users and tools.
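You can see this for yourself. A quick sketch (the exact directories will differ on your machine):

# print each $PATH entry on its own line, in search order
echo "$PATH" | tr ':' '\n'

# ask the shell which executable a given command resolves to
command -v dbt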

an example with Python virtual environments

the easiest example for most dbt users would be Python virtual environments (venvs). There are many ways to do this, but it’s effectively the three steps below.

| step | action | venv command | what effectively happens |
| --- | --- | --- | --- |
| 1 | create a venv | `python -m venv .venv` | I) makes a new `.venv/` dir in the current directory; II) installs Python, pip, and commands like `activate` and `deactivate` |
| 2 | activate a venv | `source .venv/bin/activate` | modifies your terminal’s `$PATH` so that `.venv/bin/` is searched before any other dir |
| 3 | install dbt Core | `pip install dbt-core dbt-snowflake` | installs the Python packages, executables included, into `.venv/bin/` |
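Putting those three steps together in a terminal (a sketch; paths will vary on your machine):

python -m venv .venv                  # step 1: create
source .venv/bin/activate             # step 2: activate (prepends .venv/bin to $PATH)
pip install dbt-core dbt-snowflake    # step 3: install into .venv/bin/
command -v dbt                        # resolves to ./.venv/bin/dbt
deactivate                            # restores your previous $PATH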

For me, it’s helpful to think of venvs as predominantly “automagic” manipulation of your `$PATH` variable.

So often, I’ll start my day with a `dbt debug` without first activating my venv. The outcome, `zsh: command not found: dbt`, means:

hey, I can’t find `dbt` anywhere in the dirs I normally look (i.e. `$PATH`)
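The fix, of course, is step 2 from the table above:

$ dbt debug
zsh: command not found: dbt
$ source .venv/bin/activate
$ dbt debug    # now resolves to ./.venv/bin/dbt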

a new era: the dbt Fusion engine

A hugely undersold benefit of the new Fusion engine being written in Rust and distributed as a pre-compiled binary is that we no longer need the Python toolchain to manage environments.

When I first began using dbt five years ago, I was already very familiar with Python package and environment management. But dang, did I experience the below a lot after my manager told me to train up the rest of my team:

ANDERS: hey coworker, here’s a new transformation framework that’s simple and elegant! All it is is SQL SELECT statements and YAML. It’ll be a game changer for our team. Want to learn?
COWORKER: very cool! yeah let’s get started!
ANDERS: ok so let’s talk about Python virtual environments […] whoops, you forgot to run `source .venv/bin/activate`!
COWORKER: (aside) he did say dbt is just SQL and Python, right?

I really think the new Fusion CLI sets us up to solve this problem once and for all. Allow me to explain.

how to have Fusion work for you

Of course, Fusion is still under heavy development, so for the short and medium term you’ll still need dbt Core and/or the dbt Cloud CLI installed.

I’m here to tell you that you can make this happen with just two lines in your `~/.zshrc`: one for `$PATH` and the other to set an `alias`.

There are two real scenarios for using the dbt Fusion engine, depending on whether you’re using it in conjunction with:

  1. dbt Core

  2. dbt Cloud CLI

Below is how to set up your `~/.zshrc` for each.

Fusion and Core

Requirements

  1. if I have a Python venv activated with dbt Core installed,

    1. `dbt` should be dbt Core
    2. `dbtf` can be the dbt Fusion CLI

  2. if I haven’t yet activated my venv with dbt Core in it,

    1. `dbt` should be the dbt Fusion CLI

Example config


# ~/.zshrc

# PREFIX my $PATH variable.
# when looking for executables, look in this order:
#   1) bin dir of the currently activated Python virtual env
#      (typically `./.venv/`)
#   2) $HOME/.local/bin/ (where the Fusion CLI Rust binary is installed)
#   3) all the other dirs previously added to $PATH
export PATH="$VIRTUAL_ENV/bin:$HOME/.local/bin:$PATH"

# `dbtf` always points at the Fusion binary
alias dbtf=/Users/dataders/.local/bin/dbt
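With that in place, a quick sanity check (assuming Fusion was installed to `~/.local/bin/` and dbt Core lives in your project’s venv; the venv path below is illustrative):

$ command -v dbt
/Users/dataders/.local/bin/dbt    # no venv active: Fusion wins
$ source .venv/bin/activate
$ command -v dbt
/path/to/project/.venv/bin/dbt    # venv active: Core wins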

Fusion and Cloud CLI

This is similar to Fusion + Core, but instead of needing to manage a virtual environment’s directory of executables, it’s homebrew’s.

Requirements

if I have the dbt Cloud CLI installed:

  1. `dbt` is the dbt Cloud CLI

  2. `dbtf` is the dbt Fusion CLI

I wouldn’t recommend it, but if you really want dbt Core in the mix too, I’d put `$VIRTUAL_ENV/bin` at the beginning of `$PATH` so that you only use Core when a venv is activated; a sketch of that follows.
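Here’s what that hypothetical three-way line could look like (not part of the config below):

# Core (only when a venv is active) > Cloud CLI (homebrew) > Fusion > everything else
export PATH="$VIRTUAL_ENV/bin:/opt/homebrew/bin:$HOME/.local/bin:$PATH"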

Example config


# ~/.zshrc

# PREFIX my $PATH variable.
# when looking for executables, look in this order:
#   1) homebrew's bin dir (where the dbt Cloud CLI is installed)
#   2) $HOME/.local/bin/ (where the Fusion CLI Rust binary is installed)
#   3) all the other dirs previously added to $PATH
export PATH="/opt/homebrew/bin:$HOME/.local/bin:$PATH"

# `dbtf` always points at the Fusion binary
alias dbtf=/Users/dataders/.local/bin/dbt
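To confirm which binary each name resolves to (zsh’s `type` also reveals aliases; this assumes the Cloud CLI was installed via homebrew):

$ type dbt
dbt is /opt/homebrew/bin/dbt
$ type dbtf
dbtf is an alias for /Users/dataders/.local/bin/dbt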

caveats

individual tools can’t know everything

The installers for these tools (brew, venv, and the Fusion CLI) will automatically modify your `~/.zshrc` so that they work out of the box.

If you were to install them for the first time, one after another, your `~/.zshrc` might look like the below.

what we have here is a rat’s-nest DAG of modifications to your path! It takes pen and paper for me to determine the value of `$PATH` after just these four lines.

This is why I recommend that you lean in and own your $PATH so that you may own your destiny.


# add homebrew to path
export PATH="/opt/homebrew/bin:$PATH"

# added by dbt installer
export PATH="$PATH:/Users/dataders/.local/bin"

# ChatGPT and Stack Overflow will often recommend lines like this
export PATH="$VIRTUAL_ENV/bin:$PATH"

# Conda stuff: I even found these lines, though it's been years since I've used it
export PATH="/Users/dataders/miniforge3/bin:$PATH"

the data tooling ecosystem is still very Python heavy

Ok so you may have figured out a way to get dbt Core and dbt Fusion working nicely without overhead, but I haven’t yet addressed two major missing pieces:

  1. many data tools still depend upon not only Python but also other languages, distributions, and executables; for example, Dagster, Airflow, and duckdb

  2. how to go from “works on my machine” to “automate for my whole team”?

Adding a new tool or piece of infrastructure should never be done “lightly”, but sometimes it’s worth the cost.

Jairus Martinez at Brooklyn Data made a Docker container for using Fusion in conjunction with not only dbt Core but also other data tools. Check out the repo and accompanying blog post if you’re interested in something more all-encompassing.

That said, when dbt Fusion goes GA, it is my hope that it radically simplifies how we set up for data work, both on our own machines and others’.

In Conclusion

  • we’re in a middle ground that requires many of us to have both dbt Core and the dbt Fusion CLI installed
  • Python virtual environments are par for the course in data, but Fusion is a simplifying force
  • taking ownership of `$PATH` and `alias` in your `~/.zshrc` is the right move in the long term
  • like anything, there are always caveats