Intro
Central Question
What’s the order of operations for teaching someone dbt and having them set up their local development environment?
To me, it seems a bit of a catch-22.
Background
For all our new Analytics Engineers, there are three main things to learn:
1. dbt itself,
2. how to run dbt commands locally against the team’s databases, and
3. how to contribute to team dbt projects
For me, #3 is the easiest, but #1 and #2 land somewhere between 5 and 8 on the difficulty scale for a junior data analyst, depending on experience. Our team is especially challenged because:
- the databases we’re using aren’t supported in dbt Cloud (yet), and
- our dev tooling setup can be daunting and error-prone.
After months of work, our team has all the initial infrastructure in place for an analytics engineering team; our largest impediment now is onboarding new folks! Initially I trained three folks myself, and it was very hands-on and very touch-and-go, with bugs we’ve since ironed out. More recently we trained our data science team on how to contribute. The total time required went down, but the time spent on environment setup was still roughly equal to the time spent actually learning dbt.
Over the next few months, we’ll have up to six people joining. I’m looking for a happy-path onboarding experience that is well-documented and mostly self-guided.
Current Paths
dbt first (via dbt fundamentals)
the new joiner:
- makes a dbt Cloud trial account, a BigQuery project, and a GitHub repo,
- completes the dbt Fundamentals course
- sets up local dev env according to instructions
- plays with the jaffle_shop project and our own dbt-msft-specific dbt training, using our databases and their dev env
- gets access to our actual DW and onboards onto our dbt project
dev env first
the new joiner:
- sets up local dev env according to instructions
- follows our own dbt-msft-specific dbt training
- continues training with the original dbt CLI tutorial, using a test db of ours and their dev env
- gets access to our actual DW and onboards onto our dbt project
comparison
| First Step | Pros | Cons |
|---|---|---|
| learn dbt | Focused on dbt, without the distraction of setup or a less-than-100%-aligned UI | Upfront work of dbt Cloud, GitHub, and BigQuery setup |
| dev env setup | Never have to touch BigQuery or dbt Cloud, or even GitHub | A lot of upfront work |
dev environment set-up
Here’s our current iteration of a developer environment setup. I’m sure paginating it would make it less overwhelming, but there’s A LOT of setup: VSCode, Git, Anaconda, the MSFT ODBC Driver, Azure Data Studio, the Azure CLI, lions, tigers, bears, etc.
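One small idea for making that long list less error-prone: a preflight script a new joiner runs first, so they immediately see which tools are missing instead of discovering it mid-tutorial. A minimal sketch is below; the tool names in the list are illustrative (your required commands may differ), and it only checks the PATH, not versions.

```python
# preflight.py - check which onboarding tools are already on PATH.
# NOTE: the tool list is an illustrative assumption, not our actual
# checklist; adjust it to match your team's real stack.
import shutil

REQUIRED_TOOLS = ["git", "code", "conda", "az", "dbt"]


def missing_tools(tools):
    """Return the subset of `tools` not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found; ready to set up dbt!")
```

A script like this doesn’t remove any setup work, but it turns “my dbt command doesn’t work” support pings into a self-service checklist.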
While @claire’s classic, How we set up our computers for working on dbt projects, was def the start of the conversation, I’ve seen increasing chatter in this space over the past few weeks, namely:
- @aescay’s Setting up your local dbt run environments, and
- @gnilrets’s A containerized dbt environment for your team
improving the workflow
As @aescay mentions, there is certainly opportunity for improvement w.r.t. automation. To me, there are almost two use cases:
- a simple environment (or deployment template), created ahead of time so a new joiner can focus on #1 and #2, that would consist of:
- a VM/container w/ VSCode and tools pre-installed (or use VSCode server to remote into said VM/container)
- a test db that is empty
- a script to create the necessary source tables
- the actual environment the dev would use day-to-day after learning the ropes
We’re going to work on this, but would love y’all’s input! Where can we make improvements?