Onboarding w/o dbt Cloud

Intro

Central Question

What’s the order of operations for teaching someone dbt and having them set up their local development environment?

To me, it seems like a bit of a catch-22.

Background

For all our new Analytics Engineers, there are three main things to learn:

  1. dbt,
  2. how to run dbt commands locally against the team’s dbs, and
  3. how to contribute to team dbt projects

For me, #3 is the easiest, but #1 and #2 are a 5-8 on the difficulty scale for a junior data analyst, depending on experience. For our team, we’re especially challenged because:

  • the databases we’re using aren’t supported in dbt Cloud (yet), and
  • our dev tooling set up can be daunting and error-prone.

After months of work, our team has all the initial infrastructure set up for an analytics engineering team. Our largest impediment now is on-boarding new folks! Initially I trained 3 folks myself, and it was very hands-on and very touch-and-go, with bugs we’ve since ironed out. More recently we trained our data science team on how to contribute. The amount of time required went down, but the time spent on environment set up was still equivalent to the time spent actually learning dbt.

Over the next few months, we’ll have up to six people joining. I’m looking for a happy path for on-boarding that is well-documented and mostly self-guided.

Current Paths

dbt first (via dbt fundamentals)

the new joiner:

  1. makes a dbt Cloud trial account, BigQuery project and GitHub repo,
  2. completes the dbt Fundamentals course
  3. sets up local dev env according to instructions
  4. works through the jaffle_shop project and our own dbt-msft-specific dbt training, using our dbs and their dev env
  5. gets access to our actual DW and onboards onto our dbt project

dev env first

the new joiner:

  1. sets up local dev env according to instructions
  2. follows our own dbt-msft-specific dbt training
  3. continues training with the original dbt CLI tutorial, using a test db of ours and their dev env
  4. gets access to our actual DW and onboards onto our dbt project
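For either path, the "can I run dbt commands locally?" question could be answered with a small smoke test once the dev env is in place. Here's a minimal sketch, not what we actually run: the step order is my assumption, and `run_steps` and `main` are hypothetical helper names. It only uses standard dbt CLI subcommands and assumes `dbt` is on PATH (e.g. the right conda env is active).

```python
import shutil
import subprocess
import sys

# Assumed order for a first smoke test; adjust to your own project.
SMOKE_STEPS = [
    ["dbt", "debug"],  # checks the profile and the database connection
    ["dbt", "deps"],   # installs any packages the project declares
    ["dbt", "seed"],   # loads CSV seed files into the target db
    ["dbt", "run"],    # builds the models
    ["dbt", "test"],   # runs the project's tests
]


def run_steps(steps, runner=subprocess.run):
    """Run each command in order and stop at the first failure.

    Returns the list of commands that completed successfully.
    """
    completed = []
    for cmd in steps:
        result = runner(cmd)
        if result.returncode != 0:
            print(f"FAILED: {' '.join(cmd)}", file=sys.stderr)
            break
        completed.append(cmd)
    return completed


def main():
    """Return 0 if every smoke-test step passed, 1 otherwise."""
    if shutil.which("dbt") is None:
        print("dbt is not on PATH; is the right environment active?", file=sys.stderr)
        return 1
    done = run_steps(SMOKE_STEPS)
    return 0 if len(done) == len(SMOKE_STEPS) else 1
```

A new joiner would run this from the root of the dbt project, for example by ending the script with `sys.exit(main())`. Stopping at the first failure keeps the output short, so it's obvious whether the problem is the connection (`dbt debug`) or the project itself.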

comparison

| First Step | Pros | Cons |
| learn dbt | Focused on dbt, without distraction of set up and less than 100% aligned UI | Upfront work of dbt Cloud, GitHub and BigQuery set up |
| dev env set up | Never have to touch BigQuery or dbt Cloud or even GitHub | a lot of upfront work |

dev environment set-up

Here’s our current iteration of a developer environment set up. I’m sure paginating it would make it less overwhelming, but there’s A LOT of set up! VSCode, Git, Anaconda, MSFT ODBC Driver, Azure Data Studio, the Azure CLI, lions, tigers, bears, etc.
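With that many tools, a quick preflight check might catch a missing install before someone loses an afternoon to it. A rough sketch, with assumptions flagged: the command names (`code`, `git`, `conda`, `az`, `dbt`) are my guess at how each tool shows up on PATH, and `missing_tools` / `report` are hypothetical helpers. It deliberately does not check the ODBC driver or Azure Data Studio, since neither is reliably a PATH executable.

```python
import shutil

# Display name -> command name. The command names are assumptions about
# how each tool appears on PATH; they may differ on your machines.
REQUIRED_TOOLS = {
    "VSCode": "code",
    "Git": "git",
    "Anaconda": "conda",
    "Azure CLI": "az",
    "dbt": "dbt",
}


def missing_tools(tools, which=shutil.which):
    """Return the display names of tools whose command is not found on PATH."""
    return [name for name, cmd in tools.items() if which(cmd) is None]


def report(tools=REQUIRED_TOOLS):
    """Print one status line per tool and return the list of missing ones."""
    missing = missing_tools(tools)
    for name in tools:
        status = "MISSING" if name in missing else "ok"
        print(f"{name:<12} {status}")
    return missing
```

Calling `report()` at the end of the set-up instructions gives a new joiner a single checklist to paste back if something is missing. One caveat: on Windows, `conda` is often only available inside the Anaconda prompt, so a "MISSING" there may just mean the wrong shell.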

While @claire’s classic, How we set up our computers for working on dbt projects, was def the start of the conversation, I see an increasing amount of chatter in this space in the past few weeks, namely:

improving the workflow

As @aescay mentions, there is certainly opportunity for improvement w.r.t. automation. To me, I almost see two use cases:

  1. a simple environment (or deployment template) that is already created for the purpose of being able to focus on #1 and #2 that would consist of:
    1. a VM/container w/ VSCode and tools pre-installed (or use VSCode server to remote into said VM/container)
    2. a test db that is empty
    3. a script to create the necessary source tables
  2. the actual environment the dev would use day-to-day after learning the ropes
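To make the "script to create the necessary source tables" idea concrete, here's a minimal sketch. Big caveats: the table and column names follow the jaffle_shop seed files, but the column types are my assumption, `ddl_statements` / `create_source_tables` / `demo` are hypothetical names, and I'm using an in-memory `sqlite3` db purely as a stand-in for the empty test db so the example runs anywhere. Against our actual databases you'd presumably pass in a different DB-API connection and schema-qualify the table names.

```python
import sqlite3

# Table and column names follow the jaffle_shop seed files.
# The column types are assumptions and may need adjusting for your db.
SOURCE_TABLES = {
    "raw_customers": "id INTEGER, first_name VARCHAR(50), last_name VARCHAR(50)",
    "raw_orders": "id INTEGER, user_id INTEGER, order_date DATE, status VARCHAR(50)",
    "raw_payments": "id INTEGER, order_id INTEGER, payment_method VARCHAR(50), amount INTEGER",
}


def ddl_statements(tables):
    """Build one CREATE TABLE statement per source table."""
    return [f"CREATE TABLE {name} ({cols})" for name, cols in tables.items()]


def create_source_tables(conn, tables=SOURCE_TABLES):
    """Create the source tables through a DB-API connection."""
    cur = conn.cursor()
    for stmt in ddl_statements(tables):
        cur.execute(stmt)
    conn.commit()


def demo():
    """Create the tables in a throwaway in-memory db and list them."""
    conn = sqlite3.connect(":memory:")  # stand-in for the empty test db
    create_source_tables(conn)
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    conn.close()
    return [row[0] for row in rows]
```

Keeping the DDL generation separate from the connection means the same table definitions could be reused for the training VM/container and for a fresh dev db later.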

We’re going to work on this, but would love y’all’s input! Where can we make improvements?
