Performing a blue/green deploy of your dbt project on Snowflake

This post is heavily inspired by work done by Alan from tails.com — thanks for sharing your approach on a dbt office hours!

What’s a blue/green deployment?

Blue-green deployment is a technique that reduces downtime and risk by running two identical production environments. Let’s take a website as an example — when you need to deploy a new version of your website, the new version is created in a separate environment. After the build is successful, traffic is routed to the new build. If anything goes wrong, you just don’t switch over the traffic! :boom:

In analytics, we can take a similar approach:

  • Maintain two separate databases (or schemas!) — for our example we’re going to call them stage and prod.
  • Whenever you perform your production run of dbt project, run it in the stage database
  • Once all your tests have passed, switch out the stage database for the prod database.

Hey presto! No bad data creeping into prod ever :smiley:

Doing this on Snowflake

  1. Set up two new databases, and grant ownership of these databases to the role that runs your dbt project
create database stage;
grant ownership on database stage to role transformer;

create database prod;
grant ownership on database prod to role transformer;

  1. Update your dbt project to write to the stage database (this will mean updating either your profiles.yml or your connection in dbt Cloud).
  2. Add an operation to your project to swap the databases (more on this below).
# macros/swap_database.sql
{% macro swap_database() %}

    {% set sql='alter database prod swap with stage' %}
    {% do run_query(sql) %}
    {{ log("database swapped", info=True) }}

{% endmacro %}
  1. Add a macro to your project to override the ref function (more on this below)
-- Macro to override ref and to render identifiers without a database.

{% macro ref(model_name) %}

  {% do return(builtins.ref(model_name).include(database=false)) %}

{% endmacro %}

  1. Add the step to swap the databases to your deployment process — you might end up with steps like this:
$ dbt run
$ dbt test
$ dbt run-operation swap_database

That’s it! Now, if your run fails at any step, bad data won’t be promoted to prod!

What’s this swap syntax?

Snowflake has a handy tool to help us out here — the swap syntax. It essentially performs two renames at once — you can find more info out here (cmd + f for swap with).

So by executing:

alter database prod swap with stage;

We’re essentially executing these two things at the exact same time, with zero downtime:

alter database prod rename to stage;
alter database stage rename to prod;

Why do I need to override ref?

By default, dbt resolves any ref function to a fully qualified object reference. This means that this:

select * from {{ ref('dim_customers') }}

Gets resolved to:

select * from stage.core.dim_customers

In Snowflake (and other databases), views are saved as queries. If the query has a reference to stage.core.dim_customers, we could end up with errors like this after we’ve done the swap:

SQL compilation error: Schema 'STAGE.CORE' does not exist or not authorized.

Or worse, we could be selecting from an out of date view (i.e. the one that got “demoted” to the stage database).

To get around this, we need to tell dbt: “well ackshullllly, don’t put the database name in there”. So this:

select * from {{ ref('dim_customers') }}

Should get resolved to this:

select * from core.dim_customers

Since this is a relative reference, as long as core.dim_customers exists in this prod database (which it should!), the view will work as expected.

To change the behavior of dbt, we can add our own ref macro. Whenever you add your own macro with the same name as a dbt macro, dbt will honor your version (related: dbt ships with a project).

There’s a bit of magic to do with builtins here, but the important thing is this — use the code above to get dbt to not use the database name when resolving ref functions.

Loom

Here’s a video of me walking through this code in a sample dbt project :point_down:

Happy swap-ing!

5 Likes

This is amazing. Too bad BigQuery won’t let you rename a dataset :frowning:

2 Likes

100% agree. Fortunately @claus found a pretty reasonably workaround — check it out here!