A containerized dbt environment for your team

Getting a new member of your team set up with dbt can be a challenge. Did they install the same versions of Python and dbt as you? Does it work on your machine but not theirs? It can also be difficult to ensure everyone on your team keeps their dbt environment up to date. These are common challenges in most software development environments. Fortunately, there is a way to reduce a lot of this pain: run your dbt environment in a Docker container, which gives you an explicit, tested recipe for setting up dbt with all the right dependencies.

I’ve found myself setting up containerized Docker environments several times. To make this easier in the future, I set up a dbt container skeleton that can be used to bootstrap a manageable, secure, and containerized dbt development environment. Once it’s initially configured, updating your environment is as simple as

inv build

and running dbt code is just

inv dbt-shell
$ dbt run

See the main dbt container skeleton repo for details.
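For intuition, tasks like inv build and inv dbt-shell are typically thin wrappers around docker build and docker run. Here's a minimal, stdlib-only Python sketch of what they might do under the hood — the image name and mount path are illustrative assumptions, not the skeleton's actual values:

```python
# Hypothetical image tag -- the skeleton's actual tag may differ.
IMAGE = "dbt-container-skeleton"

def build_cmd(image=IMAGE, context="."):
    """Argv for building the dbt image from the repo's Dockerfile."""
    return ["docker", "build", "-t", image, context]

def shell_cmd(image=IMAGE, project_dir="."):
    """Argv for an interactive shell inside the container, with the
    dbt project mounted so edits on the host are visible inside."""
    return [
        "docker", "run", "--rm", "-it",
        "-v", f"{project_dir}:/dbt",
        "-w", "/dbt",
        image, "bash",
    ]

# `inv build` roughly runs:      subprocess.run(build_cmd(), check=True)
# `inv dbt-shell` roughly runs:  subprocess.run(shell_cmd(), check=True)
```

The payoff is that team members only ever type the short inv commands; the docker flags live in one tested place instead of in everyone's shell history.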


@gnilrets this is awesome. I’m glad to see others working in the same problem space. Our team is growing quickly and dev env setup is probably our largest impediment right now.

Follow up question:

Would you recommend this process for all new team members?

I ask because this solution seems to ensure a stable environment at the expense of initial setup and overall complexity. Is a stable environment what you see new users struggle with the most? Does this solution assume that new team members are already familiar with anaconda environments?

Even teaching anaconda to someone with only Data Viz and SQL experience is quite a heavy lift already. It was validating to see @aescay’s post last week rationalizing virtualenv over conda envs:

Please don’t read this as critical; I’m just struggling to find a happy path for helping new team members get set up right now. Perhaps we can set up a working group for figuring this out!

Hi @data_ders, glad you liked this! No criticism taken (although accepted if needed).

Unless your team uses dbt-cloud 100%, I would recommend this process (or one like it) for all team members and projects.

I would also argue that this process is actually supposed to make initial setup easier, rather than being an expense. Before I started using containerized development environments (on non-dbt projects), I would spend half a day with a new team member just getting their local environment set up with the right version of python, pip packages, homebrew recipes, etc. There were always a few things that worked on my machine a few months ago that no longer worked with more recent packages, and that only got worse as project complexity and dependencies grew. And later, making any sort of upgrade to our environment was just as difficult; we were rarely willing to go through the pain of upgrading our packages, so we would miss out on nice new features and important bug fixes, and our code would just end up rotting.

I personally don’t think setting up miniconda is too big of a lift. After the initial setup, there’s only one command you have to know: conda activate myenv. But I can see how virtualenv might be slightly simpler, especially if you’re only going to be managing a single dbt project.


Do you currently walk new users through the set up described in the repo? Or do you ask them to do it independently? Do we assume that new users know how to open up a shell into a Docker container?

Also, dumb question – is the idea that the container is hosted on the same host as the database itself, as a sort of remote development environment? Or is this just meant to build a layer of abstraction over end users’ local setup?

Again asking from the perspective of someone also struggling with helping new team members onboard. I wrote a thing about our pain points yesterday.

Last thing –

We don’t have any homebrew recipes, so conda handles all of this for us. The big challenge I find is ensuring that the conda env activates automatically for users when they open the project in VSCode. Fortunately, the Python extension now auto-activates if there’s a requirements.txt file in the repo. The remaining challenge is helping new users set the python.pythonPath setting in settings.json.
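For anyone else walking new users through this, the setting is a one-line entry in the workspace's .vscode/settings.json. The interpreter path below is an illustrative example (it depends on where miniconda is installed and what the env is named), and note that newer versions of the VS Code Python extension replaced this setting with python.defaultInterpreterPath:

```json
{
  "python.pythonPath": "/Users/me/miniconda3/envs/myenv/bin/python"
}
```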

Some new users are able to follow directions in our repo README.md without any help, which is pretty similar to what is in the dbt-container-skeleton. Others who are less familiar with working on the command line may need some handholding at first. Most of the complexity is meant to be wrapped up into simple invoke tasks (via tasks.py).

This setup is meant to be for local development, so the container is built and run on the developer’s laptop. (Since the image is built on the developer’s machine, there is a risk that the build process could differ slightly from dev to dev – if that becomes an issue, then it would be prudent to build a workflow that involved hosting a main image on dockerhub and share that with your team, but I haven’t found that to be really necessary).
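To make the "explicit, tested recipe" idea concrete: the image each developer builds comes from a Dockerfile checked into the repo. A minimal sketch might look like the following — the base image and dbt version here are illustrative assumptions, not the skeleton's actual contents:

```dockerfile
# Pin the Python version so every developer builds the same environment.
FROM python:3.8-slim

# Pin dbt (and any other pip dependencies) to an exact, tested version.
RUN pip install --no-cache-dir dbt==0.19.0

WORKDIR /dbt
```

Because every version is pinned in one file under version control, upgrading the team's environment is a reviewed pull request rather than N laptops drifting apart.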

@data_ders I think another important caveat to note with our current solution, which I failed to mention in the main post, is that our workflow centers mostly around data model development, and we rarely delve into custom Python workflows and scripts. For our team, a light environment handler did the trick because it was rare that we had Python (and other programming) dependencies floating around and in flux on our local machines.

However, prior to working at Fishtown I was on a broader data team, and we had a lot of Python and R projects that we were working on simultaneously. In that sort of environment, which is common for data teams whose resources are pulled in all different directions, I would highly recommend using something like conda (because it is more robust and worth the effort of having fully discrete Python environments), or even dockerizing environments (which gives you full flexibility and isolation, and is what we used in our organization). Hope this helps you figure out the best environment setup for your team!

Updates! The container skeleton now includes dtspec testing, SQLFluff linting, and a minimal GitHub Actions setup for CI!