when you type `dbt run` in the terminal and hit ENTER, the first thing the terminal does is ask itself: what is this `dbt` command? where might I find the code that corresponds to it?
the $PATH most-traveled
enter the $PATH environment variable. All it is is an array (list) of directories. It tells the terminal in which order to go looking for programs that correspond to commands. As soon as it finds an executable with the same name as the command invoked, it stops searching and runs that program with the user-provided arguments and flags.

The power of $PATH is that it's modifiable to meet the needs of end users and tools.
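You can watch this resolution happen yourself; two standard shell commands do the trick (`which -a` lists every match in search order):

```zsh
# print each dir in $PATH on its own line, in search order
echo "$PATH" | tr ':' '\n'

# list every executable named `dbt` on $PATH, in the order the shell would find them
which -a dbt

# show only the winner: the first match
which dbt
```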
an example with Python virtual environments
the easiest example for most dbt users would be Python virtual environments (venvs). There are many ways to do this, but it's effectively the three steps below.
| step | action | venv command | what effectively happens |
|---|---|---|---|
| 1 | create a venv | `python -m venv .venv` | I) makes a new `.venv/` dir in the current directory (`.`); II) installs Python, `pip`, and commands like `activate` and `deactivate` |
| 2 | activate a venv | `source .venv/bin/activate` | modifies your terminal's $PATH so that `.venv/bin/` is searched before any other dir |
| 3 | install dbt Core | `pip install dbt-core dbt-snowflake` | installs the Python packages' executables into `.venv/bin/` |
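In shell form, the same three steps (`dbt-snowflake` is just the adapter from the table; swap in your own):

```zsh
# 1) create a venv: makes a new .venv/ dir in the current directory
python -m venv .venv

# 2) activate it: prepends .venv/bin/ to $PATH for this shell session
source .venv/bin/activate

# 3) install dbt Core + an adapter: executables land in .venv/bin/
pip install dbt-core dbt-snowflake

# verify: `dbt` now resolves to the venv's copy
which dbt
```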
For me, it's helpful to think of venvs as predominantly "automagic" manipulation of your $PATH variable.

So often, I'll start my day with a `dbt debug` without first activating my venv. The outcome, `zsh: command not found: dbt`, means: hey, I can't find `dbt` anywhere in the dirs I normally look (i.e., $PATH).
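In other words (the project path here is hypothetical):

```zsh
$ dbt debug
zsh: command not found: dbt

# the fix: activate the venv so .venv/bin/ is searched first
$ source .venv/bin/activate
(.venv) $ which dbt
/Users/dataders/my-project/.venv/bin/dbt
```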
a new era: the dbt Fusion engine
A hugely undersold benefit of the new Fusion engine being written in Rust and distributed as a pre-compiled binary is that we no longer need the Python toolchain just to manage environments.
When I first began using dbt five years ago, I was already very familiar with Python package and environment management. But dang, did I experience the below a lot after my manager told me to train up the rest of my team.
ANDERS: hey coworker, here's a new transformation framework that's simple and elegant! All it is is SQL SELECT statements and YAML. It'll be a game changer for our team. Want to learn?

COWORKER: very cool! yeah, let's get started!

ANDERS: ok, so let's talk about Python virtual environments […] whoops, you forgot to run `source .venv/bin/activate`!

COWORKER: (aside) he did say dbt is just SQL and YAML, right?
I really think the new Fusion CLI sets us up to solve this problem once and for all. Allow me to explain.
how to have Fusion work for you
Of course, Fusion is still under heavy development, so for the short and medium term, you'll still need dbt Core and/or the dbt Cloud CLI installed.
I’m here to tell you that you can make this happen with just two lines in your ~/.zshrc: one about $PATH and the other to set an alias.
There are two real scenarios for using the dbt Fusion engine, depending on whether you're using it in conjunction with:

- dbt Core
- the dbt Cloud CLI
Below is how to set up your ~/.zshrc for each.
Fusion and Core
Requirements
- if I have a Python venv activated with dbt Core installed:
  - `dbt` should be dbt Core
  - `dbtf` can be the dbt Fusion CLI
- if I haven't yet activated my venv with dbt Core in it:
  - `dbt` should be the dbt Fusion CLI
Example config
```zsh
# ~/.zshrc

# PREFIX my $PATH variable:
# when looking for any executable, look in this order
#   1) dir for the currently activated Python virtual env
#      (typically `./.venv/`)
#   2) $HOME/.local/bin/ (where the Fusion CLI Rust binary is installed)
#   3) all the other dirs added previously to $PATH
export PATH="$VIRTUAL_ENV/bin:$HOME/.local/bin:$PATH"

alias dbtf=/Users/dataders/.local/bin/dbt
```
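To sanity-check that this config satisfies the requirements above, `which` tells you who wins (the output paths shown are hypothetical):

```zsh
# with a venv activated, $VIRTUAL_ENV/bin/ is searched first: `dbt` is dbt Core
source .venv/bin/activate
which dbt    # ~> /Users/dataders/my-project/.venv/bin/dbt
which dbtf   # ~> dbtf: aliased to /Users/dataders/.local/bin/dbt

# with no venv activated, the Fusion binary in ~/.local/bin/ wins
deactivate
which dbt    # ~> /Users/dataders/.local/bin/dbt
```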
Fusion and Cloud CLI
This is similar to Fusion and Core, but instead of managing a virtual environment's directory of executables, you're managing homebrew's.
Requirements

- if I have the dbt Cloud CLI installed:
  - `dbt` is the dbt Cloud CLI
  - `dbtf` is the dbt Fusion CLI
I wouldn't recommend it, but if you really want to have dbt Core in the mix too, I'd put $VIRTUAL_ENV at the beginning of $PATH so that you only use Core if a venv is activated, as sketched below.
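A sketch of that not-quite-recommended variant; it's the config below with the venv dir prepended:

```zsh
# venv (when activated) beats homebrew's dbt Cloud CLI, which beats the Fusion binary
export PATH="$VIRTUAL_ENV/bin:/opt/homebrew/bin:$HOME/.local/bin:$PATH"
```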
Example config
```zsh
# ~/.zshrc

# PREFIX my $PATH variable:
# when looking for any executable, look in this order
#   1) homebrew's dir (where the dbt Cloud CLI is installed)
#   2) $HOME/.local/bin/ (where the Fusion CLI Rust binary is installed)
#   3) all the other dirs added previously to $PATH
export PATH="/opt/homebrew/bin:$HOME/.local/bin:$PATH"

alias dbtf=/Users/dataders/.local/bin/dbt
```
caveats
individual tools can’t know everything
The installation of these tools (brew, venv, and the Fusion CLI) will automatically modify your ~/.zshrc to make them immediately work out of the box.
If you were to install them for the first time one-after-another, your ~/.zshrc might look like the below.
what we have here is a rats'-nest DAG of modifications to your $PATH! It takes pen and paper for me to determine the value of $PATH after just these four lines.

This is why I recommend that you lean in and own your $PATH, so that you may own your destiny.
```zsh
# add homebrew to path
export PATH="/opt/homebrew/bin:$PATH"

# added by dbt installer
export PATH="$PATH:/Users/dataders/.local/bin"

# ChatGPT and Stack Overflow will often recommend lines like this
export PATH="$VIRTUAL_ENV/bin:$PATH"

# I even found these lines, even though it's been years since I've used it
# Conda stuff
export PATH="/Users/dataders/miniforge3/bin:$PATH"
```
the data tooling ecosystem is still very Python heavy
Ok so you may have figured out a way to get dbt Core and dbt Fusion working nicely without overhead, but I haven’t yet addressed two major missing pieces:
- many data tools still depend not only upon Python but also on other languages, distributions, and executables; for example, Dagster, Airflow, and DuckDB
- how do you go from "works on my machine" to "automated for my whole team"?
Adding a new tool or piece of infrastructure should never be done lightly; however, sometimes it's worth the cost.
Jairus Martinez at Brooklyn Data made a Docker container for using Fusion in conjunction with not only dbt Core but also other data tools. Check out the repo and accompanying blog post if you're interested in something more all-encompassing.
That said, when dbt Fusion goes GA, it is my hope that it radically simplifies how we set up for data work, both on our own machines and on others'.
In Conclusion
- we're in a middle ground that requires many of us to have both dbt Core and the dbt Fusion CLI installed
- Python virtual environments are par for the course in data, but Fusion is a simplifying force
- re-owning the use of `$PATH` and `alias` in your `~/.zshrc` is the right move in the long term
- like anything, there's always caveats