when you type `dbt run` in the terminal and hit ENTER, the first thing the terminal does is ask itself: what is this `dbt` command? where might I find the code that corresponds to it?
## the `$PATH` most-traveled

enter the `$PATH` environment variable. All it is is an array (list) of directories. It tells the terminal in which order to go looking for programs that correspond to commands. As soon as it finds an executable with the same name as the command invoked, it stops searching and runs that program with the user-provided arguments and flags.

What makes `$PATH` useful is that it's modifiable to meet the needs of end users and tools.
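You can inspect this search order for yourself. Here's a quick sketch (the exact dirs will differ on your machine):

```zsh
# print each dir in $PATH on its own line, in the order they're searched
echo "$PATH" | tr ':' '\n'

# ask the shell which executable a command resolves to (and all candidates)
which -a dbt
```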
## an example with Python virtual environments
the easiest example for most dbt users would be Python virtual environments (venvs). There are many ways to do this, but it's effectively the three steps below.
| step | action | venv command | what effectively happens |
|---|---|---|---|
| 1 | create a venv | `python -m venv .venv` | i) makes a new `.venv/` dir in the current directory (`.`); ii) installs Python, `pip`, and commands like `activate` and `deactivate` |
| 2 | activate a venv | `source .venv/bin/activate` | modifies your terminal's `$PATH` so that `.venv/bin/` is searched before any other dir |
| 3 | install dbt Core | `pip install dbt-core dbt-snowflake` | installs the Python packages as executables into `.venv/bin/` |
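Strung together end to end, the three steps look something like this (a sketch; it assumes the stock `venv` module, though tools like `virtualenv` or `uv` work similarly):

```zsh
python -m venv .venv                 # 1) create the venv in ./.venv/
source .venv/bin/activate            # 2) prepend .venv/bin/ to $PATH
pip install dbt-core dbt-snowflake   # 3) install dbt into .venv/bin/

which dbt                            # should now resolve to ./.venv/bin/dbt
```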
For me, it's helpful to think of venvs as predominantly "automagic" management of your `$PATH` variable.
So often, I'll start my day with a `dbt debug` without first activating my venv. The outcome, `zsh: command not found: dbt`, means: "hey, I can't find `dbt` anywhere in the dirs I normally look (i.e. `$PATH`)".
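In terminal form, the before-and-after looks roughly like this (paths are hypothetical):

```zsh
$ dbt debug
zsh: command not found: dbt

$ source .venv/bin/activate
(.venv) $ which dbt
/Users/dataders/my-project/.venv/bin/dbt
```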
## a new era: the dbt Fusion engine
A hugely undersold benefit of the new Fusion engine being written in Rust and distributed as a pre-compiled binary is that we no longer need the Python toolchain for environments.
When I first began using dbt five years ago, I was already very familiar with Python package and environment management. But dang, did I experience the below a lot after my manager told me to train up the rest of my team:
ANDERS: hey coworker, here's a new transformation framework that's simple and elegant! All it is is SQL SELECT statements and YAML. It'll be a game changer for our team. Want to learn?
COWORKER: very cool! yeah let’s get started!
ANDERS: ok so let's talk about Python virtual environments […] whoops, you forgot to run `source .venv/bin/activate`!
COWORKER: (aside) he did say dbt is just SQL and YAML, right?
I really think the new Fusion CLI sets us up to solve this problem once and for all. Allow me to explain.
## how to have Fusion work for you
Of course, Fusion is still under heavy development, so for the short- and medium-term you'll still need dbt Core and/or the dbt Cloud CLI installed.
I'm here to tell you that you can make this happen with just two lines in your `~/.zshrc`: one about `$PATH` and the other to set an `alias`.
There are two real scenarios for using the dbt Fusion engine, depending on whether you're using it in conjunction with:

- dbt Core
- the dbt Cloud CLI

Below is how to set up your `~/.zshrc` for each.
### Fusion and Core

#### Requirements
- if I have a Python venv activated with dbt Core installed, `dbt` should be dbt Core and `dbtf` can be the dbt Fusion CLI
- if I haven't yet activated my venv with dbt Core in it, `dbt` should be the dbt Fusion CLI
#### Example config
```zsh
# ~/.zshrc

# PREFIX my $PATH variable:
# when looking for executables, look in this order:
#   1) dir for the currently activated Python virtual env
#      (typically `./.venv/`)
#   2) $HOME/.local/bin/ (where the Fusion CLI Rust binary is installed)
#   3) all the other dirs added previously to $PATH
export PATH="$VIRTUAL_ENV/bin:$HOME/.local/bin:$PATH"

alias dbtf=/Users/dataders/.local/bin/dbt
```
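A quick way to sanity-check that the precedence behaves as intended; a sketch, with illustrative output paths:

```zsh
# no venv active: nothing in $VIRTUAL_ENV/bin, so the Fusion binary wins
$ which dbt
/Users/dataders/.local/bin/dbt

# venv active: .venv/bin/ is searched first, so dbt Core wins
$ source .venv/bin/activate
(.venv) $ which dbt
/Users/dataders/my-project/.venv/bin/dbt

# dbtf points at the Fusion binary either way, thanks to the alias
$ dbtf --version
```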
### Fusion and Cloud CLI
This is similar to Fusion + Core, but instead of needing to manage a virtual environment's directory of executables, it's homebrew's.
#### Requirements
if I have the dbt Cloud CLI installed:

- `dbt` is the dbt Cloud CLI
- `dbtf` is the dbt Fusion CLI
I wouldn't recommend it, but if you really want to have dbt Core in the mix too, I'd put `$VIRTUAL_ENV/bin` at the beginning of `$PATH` so that you only use Core if a venv is activated (see the tweak after the example config below).
#### Example config
```zsh
# ~/.zshrc

# PREFIX my $PATH variable:
# when looking for executables, look in this order:
#   1) homebrew's dir (where the dbt Cloud CLI is installed)
#   2) $HOME/.local/bin/ (where the Fusion CLI Rust binary is installed)
#   3) all the other dirs added previously to $PATH
export PATH="/opt/homebrew/bin:$HOME/.local/bin:$PATH"

alias dbtf=/Users/dataders/.local/bin/dbt
```
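And if you really do want dbt Core in the mix as mentioned above, the tweak would be a single extra prefix (an untested sketch; it only takes effect once a venv is activated):

```zsh
# venv (Core) > homebrew (Cloud CLI) > ~/.local/bin (Fusion) > everything else
export PATH="$VIRTUAL_ENV/bin:/opt/homebrew/bin:$HOME/.local/bin:$PATH"
```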
## caveats

### individual tools can't know everything
The installation of these tools (`brew`, `venv`, and the Fusion CLI) will automatically modify your `~/.zshrc` to make them immediately work out of the box.
If you were to install them for the first time, one after another, your `~/.zshrc` might look like the below.

```zsh
# add homebrew to path
export PATH="/opt/homebrew/bin:$PATH"

# added by dbt installer
export PATH="$PATH:/Users/dataders/.local/bin"

# ChatGPT and Stack Overflow will often recommend lines like this
export PATH="$VIRTUAL_ENV/bin:$PATH"

# I even found these lines, even though it's been years since I've used it
# Conda stuff
export PATH="/Users/dataders/miniforge3/bin:$PATH"
```

What we have here is a rat's nest DAG of modifications to your path! It takes pen and paper for me to determine the value of `$PATH` after just these four lines.

This is why I recommend that you lean in and own your `$PATH`, so that you may own your destiny.
### the data tooling ecosystem is still very Python heavy
Ok, so you may have figured out a way to get dbt Core and dbt Fusion working nicely without overhead, but I haven't yet addressed two major missing pieces:

- many data tools still depend upon not only Python but other languages, distributions, and executables (for example: Dagster, Airflow, duckdb)
- how to go from "works on my machine" to "automate for my whole team"?
Adding a new tool or piece of infrastructure should never be done lightly; however, sometimes it's worth the cost.
Jairus Martinez at Brooklyn Data made a Docker container for using Fusion in conjunction with not only dbt Core but also other data tools. Check out the repo and accompanying blog post if you're interested in something more all-encompassing.
That said, when dbt Fusion goes GA, it is my hope that it radically simplifies how we set up for data work, on our own machines and others'.
## In Conclusion

- we're in a middle ground that requires many of us to have both dbt Core and the dbt Fusion CLI installed
- Python virtual environments are par for the course in data, but Fusion is a simplifying force
- reowning the use of `$PATH` and `alias` in your `~/.zshrc` is the right move in the long term
- like anything, there's always caveats