Performing a blue/green deploy of your dbt project on Snowflake

This post is heavily inspired by work done by Alan from tails.com — thanks for sharing your approach at a dbt office hours!

What’s a blue/green deployment?

Blue/green deployment is a technique that reduces downtime and risk by running two identical production environments. Let’s take a website as an example — when you need to deploy a new version of your website, the new version is created in a separate environment. After the build is successful, traffic is routed to the new build. If anything goes wrong, you just don’t switch over the traffic! :boom:

In analytics, we can take a similar approach:

  • Maintain two separate databases (or schemas!) — for our example we’re going to call them stage and prod.
  • Whenever you perform a production run of your dbt project, run it in the stage database.
  • Once all your tests have passed, switch out the stage database for the prod database.

Hey presto! No bad data creeping into prod ever :smiley:

Doing this on Snowflake

  1. Set up two new databases, and grant ownership of these databases to the role that runs your dbt project
create database stage;
grant ownership on database stage to role transformer;

create database prod;
grant ownership on database prod to role transformer;
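
If you want to sanity-check the setup, you can inspect the grants on each database (the ownership granted above is what lets the transformer role perform the swap later):

-- the OWNERSHIP row should list the transformer role
show grants on database stage;
show grants on database prod;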

  2. Update your dbt project to write to the stage database (this will mean updating either your profiles.yml or your connection in dbt Cloud).
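For example, the relevant target in profiles.yml might look something like this (the profile name, account, credentials, and warehouse are placeholders; the important line is database: stage):

# profiles.yml (illustrative; only database, role, and schema come from this post)
my_project:
  target: default
  outputs:
    default:
      type: snowflake
      account: abc12345
      user: dbt_user
      password: "{{ env_var('SNOWFLAKE_PASSWORD') }}"
      role: transformer
      database: stage   # build everything in stage; the swap promotes it to prod
      warehouse: transforming
      schema: core
      threads: 4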
  3. Add an operation to your project to swap the databases (more on this below).
# macros/swap_database.sql
{% macro swap_database() %}

    {% set sql='alter database prod swap with stage' %}
    {% do run_query(sql) %}
    {{ log("database swapped", info=True) }}

{% endmacro %}
  4. Add a macro to your project to override the ref function (more on this below).
-- Macro to override ref and to render identifiers without a database.

{% macro ref(model_name) %}

  {% do return(builtins.ref(model_name).include(database=false)) %}

{% endmacro %}

  5. Add the step to swap the databases to your deployment process — you might end up with steps like this:
$ dbt run
$ dbt test
$ dbt run-operation swap_database
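
If you're orchestrating these steps yourself rather than in dbt Cloud, make sure the swap only happens when everything before it succeeds. Here's a minimal sketch of a deploy script:

#!/bin/bash
# deploy.sh (illustrative): set -e aborts on the first failing command,
# so the swap never runs after a failed build or test
set -e

dbt run
dbt test
dbt run-operation swap_database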

That’s it! Now, if your run fails at any step, bad data won’t be promoted to prod!

What’s this swap syntax?

Snowflake has a handy tool to help us out here — the swap syntax. It essentially performs two renames at once — you can find out more here (cmd + f for swap with).

So by executing:

alter database prod swap with stage;

We’re essentially executing these two things at the exact same time, with zero downtime:

alter database prod rename to stage;
alter database stage rename to prod;
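
To see this in action, here's a toy example (the table and column are made up, and both databases are assumed to already contain a core schema):

-- today's build lands in stage
create table stage.core.dim_customers as select 'fresh' as build;

-- promote it atomically
alter database prod swap with stage;

-- prod now serves today's build; the old build lives on in stage
select build from prod.core.dim_customers;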

Why do I need to override ref?

By default, dbt resolves any ref function to a fully qualified object reference. This means that this:

select * from {{ ref('dim_customers') }}

Gets resolved to:

select * from stage.core.dim_customers

In Snowflake (and other databases), views are saved as queries. If the query has a reference to stage.core.dim_customers, we could end up with errors like this after we’ve done the swap:

SQL compilation error: Schema 'STAGE.CORE' does not exist or not authorized.

Or worse, we could be selecting from an out-of-date view (i.e. the one that got “demoted” to the stage database).

To get around this, we need to tell dbt: “well ackshullllly, don’t put the database name in there”. So this:

select * from {{ ref('dim_customers') }}

Should get resolved to this:

select * from core.dim_customers

Since this is a relative reference, as long as core.dim_customers exists in the prod database (which it should!), the view will work as expected.

To change the behavior of dbt, we can add our own ref macro. Whenever you add your own macro with the same name as a dbt macro, dbt will honor your version (related: dbt ships with a project).

There’s a bit of magic to do with builtins here: builtins.ref gives you access to dbt’s original ref implementation, so this macro calls it and then strips the database from the relation it returns. The important thing is this — use the code above to get dbt to leave the database name out when resolving ref functions.
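
To make that concrete: with the override in place, a view model compiles to DDL like this (fct_orders is an illustrative model name):

create view stage.core.fct_orders as (
    select * from core.dim_customers
);

After the swap, this view lives in the prod database, and core.dim_customers resolves there too, so the view keeps pointing at the freshly built table.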

Loom

Here’s a video of me walking through this code in a sample dbt project :point_down:

Happy swap-ing!


This is amazing. Too bad BigQuery won’t let you rename a dataset :frowning:


100% agree. Fortunately @claus found a pretty reasonable workaround — check it out here!

Claire! We’ve implemented the blue/green method for our dbt runs but I’m noticing a peculiar situation…

When we run dbt into staging and then swap the schemas, the view definitions are still referencing the staging schema even after the swap macro. Is this to be expected?

I figure if I really need to, I can run something along the lines of:
$ dbt run -m config.materialized:table --target stage
$ dbt run-operation swap_schema
$ dbt run -m config.materialized:view --target prod

Ok, so it sounds like you’re swapping between schemas, right? Assuming the macro looks something like:

{% macro swap_schema() %}

    {% set sql='alter schema prod swap with stage' %}
    {% do run_query(sql) %}
    {{ log("schema swapped", info=True) }}

{% endmacro %}

If that’s the case, you might need to alter the ref function even further!

{% macro ref(model_name) %}
  {% do return(builtins.ref(model_name).include(database=false).include(schema=false)) %}
{% endmacro %}

Then a model like this customers model:

{{ config(materialized='view') }}
select * from {{ ref('stg_customers') }}

Ends up compiling to this:

create view my_database.stage.customers as (
  select * from stg_customers
)

(Note the lack of schema prefix in the view definition)

But then you get into all kinds of strange things:

  • How do you get Snowflake to use the right schema when there’s no schema reference in the view definition? Does Snowflake automatically look for stg_customers in the same schema the customers view was created in? (I’m not sure!) Or do you have to tell anything (or anyone) that selects from these views to run use schema prod first? (In which case, that’s not great.)
  • This would probably prevent you from using custom schemas in the future

You could implement an approach like the one you suggested, but I think it will look more like the below, since the initial command should run everything (otherwise, tables might be built on top of views that haven’t been updated).

$ dbt run --target stage
$ dbt run-operation swap_schema
$ dbt run -m config.materialized:view --target prod

Overall, I guess I’d push back on swapping schemas here, and encourage you to think about swapping databases instead. It just feels much tidier!
