First, allow me to acknowledge that this is overkill for most teams. I want my dbt docs to be accessible to everyone in my business, but not to everyone else. I could do this with Sinter, of course, but 300 employees at $20 per seat per month is a bit too rich for my blood.
@peter_hanssens wrote a great piece here about how to use Netlify to serve your docs. But password-protecting that content runs you $9 per user per month. Less spendy than the native Sinter option, but still more than I want to spend.
My team already uses Docker for a ton of stuff, and we already have infrastructure in AWS that runs Docker containers, so I decided to go that route.
Dockerfile
FROM python:3.6
ARG user=someUser
ARG organization=yourGitHubOrg
ARG repo=yourDBTRepo
ARG homedir=/home/${user}
COPY entrypoint.sh ${homedir}
# Non-root group & user creation
RUN groupadd -r ${user}
RUN useradd -r -m -g ${user} ${user}
RUN mkdir ${homedir}/.ssh
# Git
ENV REMOTE_REPO git@github.com:${organization}/${repo}.git
ENV REPO_DIR ${homedir}/${repo}
RUN apt-get update && apt-get install -y git
COPY id_rsa ${homedir}/.ssh/
RUN ssh-keyscan github.com >> ${homedir}/.ssh/known_hosts
# BigQuery
ENV GOOGLE_APPLICATION_CREDENTIALS ${homedir}/service_account.json
COPY service_account.json ${homedir}
# Permissions!
RUN chmod 0700 ${homedir}/.ssh
RUN chmod 0600 ${homedir}/.ssh/id_rsa
RUN chmod 0644 ${homedir}/.ssh/known_hosts
RUN chmod 0755 ${homedir}/entrypoint.sh
RUN chown -R ${user}:${user} ${homedir}/.ssh
# DBT!
RUN pip install dbt==0.11.1
# Prep for container execution
USER ${user}
WORKDIR ${homedir}
ENTRYPOINT ["/bin/bash", "entrypoint.sh"]
Note: You may need to tweak this to enable a connection to your data warehouse as specified in your profiles.yml file.
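For reference, a minimal profiles.yml for the BigQuery setup the Dockerfile assumes might look something like this. This is a sketch: the profile name, GCP project, and schema are placeholders, and the keyfile path assumes the Dockerfile's default user of someUser (so it matches where service_account.json gets copied).

```yaml
# Hypothetical profiles.yml — the top-level key must match the `profile:`
# entry in your dbt_project.yml
my_dbt_profile:
  target: prod
  outputs:
    prod:
      type: bigquery
      method: service-account
      # Matches COPY service_account.json ${homedir} in the Dockerfile,
      # assuming the default user of someUser
      keyfile: /home/someUser/service_account.json
      project: your-gcp-project    # placeholder
      schema: analytics            # placeholder
      threads: 4
```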
entrypoint.sh
#!/usr/bin/env bash
git clone "$REMOTE_REPO"
cd "$REPO_DIR"
dbt deps --profiles-dir .
dbt docs generate --target prod --profiles-dir .
dbt docs serve --profiles-dir . > /dev/null 2>&1 &
# Poll GitHub every 10 minutes and regenerate docs when master changes
while true
do
    sleep 600
    # Fetch first — otherwise origin/master never advances locally and the
    # comparison below will never detect a change
    git fetch origin
    if [ "$(git rev-parse HEAD)" != "$(git rev-parse origin/master)" ]; then
        git reset --hard origin/master
        dbt deps --profiles-dir .
        dbt docs generate --target prod --profiles-dir .
    fi
done
Note: I have my profiles.yml file in the root of my repo.
This will clone your dbt repo, install dependencies, generate the docs, and start the webserver in the background. It then enters a loop: every 10 minutes it checks whether the code on GitHub has changed, and if so pulls down the new code and regenerates the docs.
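If you want Docker itself to know whether the site is actually up, you could add a healthcheck to the Dockerfile above. This is an optional sketch — it assumes curl is available (it is in the full python:3.6 image), and note that a failing healthcheck only marks the container unhealthy; something (an orchestrator, or a cron job running docker restart) still has to act on that status.

```dockerfile
# Optional: mark the container unhealthy if dbt docs serve stops responding.
# curl ships with the full python:3.6 base image.
HEALTHCHECK --interval=60s --timeout=5s --retries=3 \
  CMD curl -f http://localhost:8080/ || exit 1
```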
Build the image from a location where your authentication files are present, so they can be copied into the image. (Keep in mind that those credentials are baked into the image layers, so treat the image itself as a secret.) I have a little EC2 instance for just this purpose, so I don't have service account credentials proliferating onto a bunch of engineer laptops. From the desired folder, I run docker build --pull -t dbtdocs .
I could deploy that image to ECR if I wanted, but I run it from the image-build machine (it's idle most of the time anyway; might as well give it something else to do!). To start serving the site, I run docker run -d -p 8080:8080 --restart unless-stopped dbtdocs
Assuming your network settings allow it, now you can hit port 8080 on that machine and your docs site should be visible.
I went a step further, however, and put the connection behind SSL. I use an AWS load balancer (because it makes it easy to deal with certificates and whatnot) listening on port 443 (as expected), and it routes all traffic to the Docker host's 8080, where the host then routes it to the container's 8080, where dbt docs serve is serving the content. My security group settings in AWS disallow anything except 443 from outside (and only allow 443 from office IP addresses), but allow 8080 on the internal network.
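As a rough sketch, those security group rules might look like this via the AWS CLI — the group IDs and office CIDR below are placeholders, not values from my setup:

```shell
# Allow HTTPS from office IPs only (placeholder CIDR and group IDs)
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol tcp --port 443 \
  --cidr 203.0.113.0/24

# Allow 8080 only from the load balancer's security group, not the internet
aws ec2 authorize-security-group-ingress \
  --group-id sg-0fedcba9876543210 \
  --protocol tcp --port 8080 \
  --source-group sg-0123456789abcdef0
```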