Intro
Central Question
What’s the order of operations for teaching someone dbt and having them set up their local development environment?
To me, it seems a bit of a catch-22.
Background
For all our new Analytics Engineers, there are three main things to learn:
1. dbt itself,
2. how to run dbt commands locally against the team’s databases, and
3. how to contribute to team dbt projects
For me, #3 is the easiest, but #1 and #2 land somewhere between 5 and 8 on the difficulty scale for a junior data analyst, depending on experience. Our team is especially challenged because:
- the databases we’re using aren’t supported in dbt Cloud (yet), and
- our dev tooling setup can be daunting and error-prone.
After months of work, our team has all the initial infrastructure in place for an analytics engineering team; our largest impediment now is onboarding new folks! Initially I trained three folks myself, and it was very hands-on and very touch-and-go, with bugs we’ve since ironed out. More recently we trained our data science team on how to contribute. The total time required went down, but the time spent on environment setup was still roughly equal to the time spent actually learning dbt.
Over the next few months, we’ll have up to six people joining. I’m looking for a happy-path onboarding experience that is well-documented and mostly self-guided.
Current Paths
dbt first (via dbt fundamentals)
the new joiner:
- makes a dbt Cloud trial account, a BigQuery project, and a GitHub repo,
- completes the dbt Fundamentals course
- sets up local dev env according to instructions
- plays with the jaffle_shop project and our own dbt-msft-specific dbt training, using our databases and their dev env
- gets access to our actual DW and onboards onto our dbt project
dev env first
the new joiner:
- sets up local dev env according to instructions
- follows our own dbt-msft-specific dbt training
- continues training with the original dbt CLI tutorial, using a test db of ours and their dev env
- gets access to our actual DW and onboards onto our dbt project
comparison
| First Step | Pros | Cons |
|---|---|---|
| learn dbt | Focused on dbt, without the distraction of setup or a less-than-100%-aligned UI | Upfront work of dbt Cloud, GitHub, and BigQuery setup |
| dev env setup | Never have to touch BigQuery or dbt Cloud, or even GitHub | A lot of upfront work |
dev environment set-up
Here’s our current iteration of a developer environment setup. I’m sure paginating it would make it less overwhelming, but there’s A LOT of setup: VSCode, Git, Anaconda, the MSFT ODBC Driver, Azure Data Studio, the Azure CLI, lions, tigers, bears, etc.
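One small idea for making that long list less error-prone: a preflight script a new joiner runs first, so they immediately see which tools are missing instead of discovering it mid-tutorial. A minimal sketch is below; the tool names in the list are illustrative (your required commands may differ), and it only checks the PATH, not versions.

```python
# preflight.py - check which onboarding tools are already on PATH.
# NOTE: the tool list is an illustrative assumption, not our actual
# checklist; adjust it to match your team's real stack.
import shutil

REQUIRED_TOOLS = ["git", "code", "conda", "az", "dbt"]


def missing_tools(tools):
    """Return the subset of `tools` not found on PATH."""
    return [t for t in tools if shutil.which(t) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools found; ready to set up dbt!")
```

A script like this doesn’t remove any setup work, but it turns “my dbt command doesn’t work” support pings into a self-service checklist.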
While @claire’s classic, How we set up our computers for working on dbt projects, was def the start of the conversation, I’ve seen increasing chatter in this space over the past few weeks, namely:
- @aescay’s Setting up your local dbt run environments, and
- @gnilrets’s A containerized dbt environment for your team
improving the workflow
As @aescay mentions, there is certainly opportunity for improvement w.r.t. automation. To me, there are almost two use cases:
- a simple environment (or deployment template), created ahead of time so a new joiner can focus on #1 and #2, that would consist of:
- a VM/container w/ VSCode and tools pre-installed (or use VSCode server to remote into said VM/container)
- a test db that is empty
- a script to create the necessary source tables
- the actual environment the dev would use day-to-day after learning the ropes
We’re going to work on this, but would love y’all’s input! Where can we make improvements?