This post is heavily inspired by work done by Alan from tails.com — thanks for sharing your approach on a dbt office hours!
## What’s a blue/green deployment?
Blue-green deployment is a technique that reduces downtime and risk by running two identical production environments. Let’s take a website as an example — when you need to deploy a new version of your website, the new version is created in a separate environment. After the build is successful, traffic is routed to the new build. If anything goes wrong, you just don’t switch over the traffic!
In analytics, we can take a similar approach:
- Maintain two separate databases (or schemas!) — for our example we’re going to call them `stage` and `prod`.
- Whenever you perform the production run of your dbt project, run it in the `stage` database.
- Once all your tests have passed, switch out the `stage` database for the `prod` database.

Hey presto! No bad data creeping into `prod`, ever!
## Doing this on Snowflake
- Set up two new databases, and grant ownership of these databases to the role that runs your dbt project:

```sql
create database stage;
grant ownership on database stage to role transformer;
create database prod;
grant ownership on database prod to role transformer;
```
- Update your dbt project to write to the `stage` database (this will mean updating either your `profiles.yml` or your connection in dbt Cloud).
- Add an operation to your project to swap the databases (more on this below):

```sql
-- macros/swap_database.sql
{% macro swap_database() %}
  {% set sql = 'alter database prod swap with stage' %}
  {% do run_query(sql) %}
  {{ log("database swapped", info=True) }}
{% endmacro %}
```
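For the `profiles.yml` route, the target might look roughly like this. This is a sketch only: the profile name, account, user, and warehouse values are placeholders, while the `transformer` role and `stage` database come from the steps above.

```yaml
# profiles.yml (sketch only; account, user, and warehouse are placeholders)
my_dbt_project:
  target: default
  outputs:
    default:
      type: snowflake
      account: abc12345        # placeholder
      user: dbt_runner         # placeholder
      role: transformer        # the role that owns both databases
      database: stage          # build into stage; the swap promotes it to prod
      warehouse: transforming  # placeholder
      schema: core
      threads: 4
```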
- Add a macro to your project to override the `ref` function (more on this below):

```sql
-- Macro to override ref and to render identifiers without a database.
{% macro ref(model_name) %}
  {% do return(builtins.ref(model_name).include(database=false)) %}
{% endmacro %}
```
- Add the step to swap the databases to your deployment process — you might end up with steps like this:

```shell
$ dbt run
$ dbt test
$ dbt run-operation swap_database
```

That’s it! Each dbt command exits with a nonzero status code when it fails, so most schedulers will stop the job before the swap step runs. If your run fails at any step, bad data won’t be promoted to `prod`!
## What’s this `swap` syntax?

Snowflake has a handy tool to help us out here — the `swap with` syntax. It essentially performs two renames at once — you can find more info in the Snowflake docs for `alter database` (`cmd + f` for `swap with`).
So by executing:

```sql
alter database prod swap with stage;
```

We’re essentially executing these two things at the exact same time, with zero downtime:

```sql
alter database prod rename to stage;
alter database stage rename to prod;
```
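To see why one atomic swap is safer than two renames run one after the other, here’s a toy sketch in Python (purely illustrative; nothing here is dbt- or Snowflake-specific). Think of each database name as a key pointing at a set of tables: `swap` exchanges the two pointers in a single step, so there is never a moment where `prod` fails to resolve.

```python
# Toy model: database names are keys pointing at their contents.
databases = {"prod": "old tables", "stage": "new tables"}

def swap(dbs, a, b):
    """Exchange what two names point at, in one atomic step."""
    dbs[a], dbs[b] = dbs[b], dbs[a]

swap(databases, "prod", "stage")
assert databases["prod"] == "new tables"   # the freshly built project
assert databases["stage"] == "old tables"  # yesterday's prod, kept around
```

With two sequential renames instead, a query arriving between them would find no `prod` at all — the swap closes that gap.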
## Why do I need to override `ref`?

By default, dbt resolves any `ref` function to a fully qualified object reference. This means that this:

```sql
select * from {{ ref('dim_customers') }}
```

Gets resolved to:

```sql
select * from stage.core.dim_customers
```

In Snowflake (and other databases), views are saved as queries. If the query has a reference to `stage.core.dim_customers`, we could end up with errors like this after we’ve done the swap:

```
SQL compilation error: Schema 'STAGE.CORE' does not exist or not authorized.
```
Or worse, we could be selecting from an out-of-date view (i.e. the one that got “demoted” to the `stage` database).

To get around this, we need to tell dbt: “well ackshullllly, don’t put the database name in there”. So this:

```sql
select * from {{ ref('dim_customers') }}
```

Should get resolved to this:

```sql
select * from core.dim_customers
```

Since this is a relative reference, as long as `core.dim_customers` exists in the `prod` database (which it should!), the view will work as expected.
To change the behavior of dbt, we can add our own `ref` macro. Whenever you add a macro with the same name as a built-in dbt macro, dbt will honor your version (related: dbt itself ships with a project of macros). There’s a bit of magic to do with `builtins` here, but the important thing is this — use the code above to get dbt to leave the database name out when resolving `ref` functions.
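If it helps, dbt’s `builtins` namespace plays much the same role as Python’s `builtins` module: you shadow a name with your own version, but keep a handle on the original so you can delegate to it. A rough Python analogy (illustrative only, not dbt code):

```python
import builtins

# Shadow the built-in `print`, but delegate to the original via the
# `builtins` module -- just as the custom `ref` macro calls `builtins.ref`
# and then tweaks the result.
def print(*args, **kwargs):
    builtins.print("[dbt-style override]", *args, **kwargs)

print("hello")  # routed through our version, then the real print
```

The override wraps the original behavior rather than reimplementing it, which is exactly what `builtins.ref(model_name).include(database=false)` does.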
## Loom

Here’s a video of me walking through this code in a sample dbt project.

Happy `swap`-ing!