Setting up Snowflake — the exact grant statements we run

:sparkles: 2021 update :sparkles:
I originally wrote this article in 2019, when I was an employee of dbt Labs. Now, I’m freelancing (among other things), and recently got the opportunity to set up some new databases. This advice no longer represents the official dbt Labs stance, as I’ve mixed in some of my own thoughts.
Further, the original article included instructions for both Redshift and Snowflake. This article now covers Snowflake only; the Redshift instructions have been moved to a separate article.

Before reading through these exact instructions, take a moment to read some prior art:

  1. @jerco’s article on the concepts behind setting up Snowflake for clients
  2. My article on database administration

While these articles go deep into the “what” and “why” of Snowflake admin respectively, they are pretty light on the very specific “how” of setting this up, i.e. the implementation steps.

In this guide, I’ll run through the exact privileges it takes to set up your Snowflake account the way these articles intended.

1. Set up databases

use role sysadmin;
create database raw;
create database analytics;

2. Set up warehouses

create warehouse loading
    warehouse_size = xsmall
    auto_suspend = 3600
    auto_resume = false
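    -- note: some loaders (e.g. Fivetran) require auto_resume = true; see the replies below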
    initially_suspended = true;

create warehouse transforming
    warehouse_size = xsmall
    auto_suspend = 60
    auto_resume = true
    initially_suspended = true;

create warehouse reporting
    warehouse_size = xsmall
    auto_suspend = 60
    auto_resume = true
    initially_suspended = true;

3. Set up roles and warehouse permissions

use role securityadmin;

create role loader;
grant all on warehouse loading to role loader; 

create role transformer;
grant all on warehouse transforming to role transformer;

create role reporter;
grant all on warehouse reporting to role reporter;

4. Create users, assigning them to their roles

Every human being and application gets a separate user, and is assigned to the correct role.

create user stitch_user -- or fivetran_user
    password = '_generate_this_'
    default_warehouse = loading
    default_role = loader; 

create user claire -- or amy, jeremy, etc.
    password = '_generate_this_'
    default_warehouse = transforming
    default_role = transformer
    must_change_password = true;

create user dbt_cloud_user
    password = '_generate_this_'
    default_warehouse = transforming
    default_role = transformer;

create user looker_user -- or mode_user etc.
    password = '_generate_this_'
    default_warehouse = reporting
    default_role = reporter;

-- then grant these roles to each user
grant role loader to user stitch_user; -- or fivetran_user
grant role transformer to user dbt_cloud_user;
grant role transformer to user claire; -- or amy, jeremy
grant role reporter to user looker_user; -- or mode_user, periscope_user

5. Let loader load data

Give the loader role unilateral permission to operate on the raw database:

use role sysadmin;
grant all on database raw to role loader;

6. Let transformer transform data

The transformer role needs to be able to read raw data.

If you do this before you have any data loaded, you can run:

-- future grants must be run as securityadmin (or accountadmin)
use role securityadmin;

grant usage on database raw to role transformer;
grant usage on future schemas in database raw to role transformer;
grant select on future tables in database raw to role transformer;
grant select on future views in database raw to role transformer;

If you already have data loaded in the raw database, make sure you also run the following to update the permissions:

grant usage on all schemas in database raw to role transformer;
grant select on all tables in database raw to role transformer;
grant select on all views in database raw to role transformer;

The transformer role also needs to be able to create objects in the analytics database:

grant all on database analytics to role transformer;

7. Let reporter read the transformed data

A previous version of this article recommended this be implemented through hooks in dbt, but this way lets you get away with a one-off statement.

grant usage on database analytics to role reporter;
grant usage on future schemas in database analytics to role reporter;
grant select on future tables in database analytics to role reporter;
grant select on future views in database analytics to role reporter;

Again, if you already have data in your analytics database, make sure you run:

grant usage on all schemas in database analytics to role reporter;
grant select on all tables in database analytics to role reporter;
grant select on all views in database analytics to role reporter;

8. Maintain!

When new users are added, make sure you add them to the right role! Everything else should be inherited automatically thanks to those future grants.
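
For example, onboarding a new analyst (the username below is made up) takes just two statements:

use role securityadmin;

create user sandra
    password = '_generate_this_'
    default_warehouse = transforming
    default_role = transformer
    must_change_password = true;

grant role transformer to user sandra;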

Extensions

The advice in this article has stayed pretty much unchanged for the last few years, except for small syntax changes after Snowflake introduced database-wide grants (previously, permissions had to be granted schema by schema).

As I used it at scale, though, I realized I wanted to extend the pattern a little more:

  • I’ve previously written about blue-green deploys, which rely on a database-level swap command. This architecture doesn’t support that.
  • I’ve also used snapshots in a project, where you really don’t want them to be dropped, but since they end up in the same database as transformed data, there are no measures preventing an errant drop command.
  • If both your dbt_cloud_user and you (both transformers) have the same permissions, you might accidentally drop a table that you shouldn’t, like a huge incremental model that you wish you didn’t have to rebuild.

Here are some ideas for extending this architecture:

  • Replace the standard analytics database with analytics_dev, analytics_stage and analytics_prod. Use roles transformer_dev and transformer_prod (the former can only create in analytics_dev). This helps prevent accidentally dropping a prod transform, and also lets you do blue-green deploys (see the sketch below).
  • Also add a new schema, artifacts, for dbt snapshots and other metadata produced by dbt runs (e.g. for measuring project performance). These aren’t exactly raw data, nor are they transformed data. Make it such that only transformer_prod can write to this schema (and it might be worth looking into ways to have even that role not be able to drop it!).
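
To make these ideas concrete, here’s a minimal sketch of the extended setup. The database names, role names, and grants below are illustrative, not statements from the setup above:

use role sysadmin;
create database analytics_dev;
create database analytics_stage;
create database analytics_prod;

use role securityadmin;
create role transformer_dev;
create role transformer_prod;

-- transformer_dev can only create objects in the dev database
grant all on database analytics_dev to role transformer_dev;

-- transformer_prod owns stage and prod; a deploy builds into
-- stage, then swaps it with prod in a single atomic statement
grant all on database analytics_stage to role transformer_prod;
grant all on database analytics_prod to role transformer_prod;

alter database analytics_stage swap with analytics_prod;

(Both transformer roles would still need the read grants on the raw database from step 6, and reporter would point at analytics_prod.)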

Hi, the statement grant create on database ANALYTICS to role TRANSFORMER; fails for me with the following error:

SQL compilation error: error line 1 at position 6 Invalid privilege CREATE.

Should it read grant create schema on database ANALYTICS...?


Thank you! Just updated!

Hi, I don’t have the ability to send a direct message yet, but I noticed the first grant should be to
...role LOADER rather than ...role TRANSFORMER.

Thanks so much for posting the use case article on snowflake and this walkthrough. It really helped me a lot.


Hi Claire,
is there anything about this architecture that needs mentioning specifically for testing updates to models?
For example, would I have a test version of the ANALYTICS database with a test warehouse running only during testing? Or is there something I’m missing? Testing doesn’t get a mention in the use case either, so I’m guessing it’s done in dbt some other way.

Hey there @dazlari. There’s no extra steps required for running the dbt test command. Have you added any tests to your project yet / run them? The workflow might make a little more sense if you do that :slight_smile:

Not that far yet, but I have prepared our environment in the last few days based on this approach. Up until now we’ve been running the lot (data+views) out of one DB and one Warehouse but this structure is very appealing. I’m still getting some initial ‘off the shelf’ dbt modules together and my head around prepping our own, but the thought of deploying some test views crept up and I couldn’t see where they fit in. I’ll keep reading - thanks!

Hi Claire, I’ve done more research on this and found the following in the docs, which I think is still relevant to this topic:

We recommend using different schemas within one data warehouse to separate your environments.

If you have multiple dbt users writing code, it often makes sense for each user to have their own development environment. A pattern we’ve found useful is to set your dev target schema to be dbt_<username> .

This is more like what I was expecting to find, but I didn’t use the term ‘environment’ in my original question. Not sure if you wanted to bridge this concept in this article.

I thought I should add the following in answering my own questions. As I’m relatively new here, I’ve struggled to find a few of the signposts (it’s a mix between docs and Discourse!). I started with docs, found the above, and eventually moved on to the following, which I would now consider required reading:

How we structure our dbt projects

This next post (based on another user’s question) gives further insight into how to operate this environment.

What are the dbt commands you run in your production deployment of dbt?

I’m sure there’s a lot more out there, but you’ve written a lot of great stuff Claire!!

Thanks for sharing, @claire! This is incredibly helpful.

Looks like there is one typo in step 6, line 6 of the Snowflake code block. Should it be “tables” instead of “table”?

Thanks, all fixed! :smiley:


Hmm, did your transformer role create the analytics database, or did a different role?
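
If it helps, you can check with:

show grants on database analytics;

The role holding the ownership privilege in the output is the one that owns the database.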

Thank you @claire ! As a novice to warehouse setup, I found this incredibly helpful (same goes for the other related/linked articles you and @jerco have written). Even on my one-person team, I now have users for both superuser leo and regular leo!

I have a question about steps 6 and 7 (I’ll only go through step 6).

In step 6, you write:

Because I completed the fundamentals course, my raw database already exists and contains data.

So, when I read:

If you do this before you have any data loaded, you can run:

I assumed that I would not run the subsequent four lines of code, because I have data loaded.

But then, when I read the next part:

If you already have data loaded in the raw database, make sure you also run the following to update the permissions

I thought … huh? What do you mean, “make sure you also run”? Was I supposed to have run some of the previous lines?

I had assumed that

grant usage on all schemas in database raw to role transformer;

would be the first line of code I’d run in step 6, because I skipped the previous bit (as instructed, since I already have data loaded).

Exactly which lines of code should I run if I already have data loaded in my raw database?

What I’ve done is the following, a mishmash of the instructions in step 6:

grant usage on database raw to role transformer;
grant usage on all schemas in database raw to role transformer;
grant select on all tables in database raw to role transformer;
grant select on all views in database raw to role transformer;

When I try to run any of the future statements in step 6 (or in step 7), I receive this error:

SQL access control error: Insufficient privileges to operate on database 'RAW'

I understand that the future statements are important:

Everything else should be inherited automatically thanks to those future grants.

And I’m assuming I should run them even if I already have data loaded into my raw database … right?

Apologies for how verbose this is, but I hope it’s clear what I’m confused about! Please redirect me to Slack or somewhere else if there’s a better place for such questions.

Thank you again for the excellent post.

Hi! The auto_resume = false line in the create warehouse loading command caused issues for me when trying to set up a Fivetran connection. The Fivetran error message specifically indicated that auto_resume had to be set to true:

"Default Warehouse Test: You must specify AUTO_RESUME = TRUE for LOADING"

Once I updated the auto_resume property (alter warehouse loading set auto_resume=true;), the connection was successful!

I’m curious if auto_resume always has to be true when using Fivetran, or if it has to do with my specific setup.

Thanks!

Leo


Has anyone implemented the model where you create object and functional roles and grant roles to other roles? Can you share pros/cons?

I know it’s been a while and you’ve probably found a workaround, but I just wanted to add my thoughts since I ran into this as well.

The “grant *** on all” statements give you privileges on existing objects, so if you have any, you need to run them.

The “grant *** on future” statements give you privileges on all future objects, so those also need to be run. The order shouldn’t matter.

What does matter, though, is that when running the future grants, you need to use the role securityadmin. This is because only securityadmin and accountadmin have that privilege.

Read more here:


Hey @claire! I just wanted to double-check the grants in section 7 for an existing analytics database. Should the last 2 lines grant to the role ‘reporter’ rather than ‘transformer’, in line with the rest of the section?

grant usage on all schemas in database analytics to role reporter;
grant select on all tables in database analytics to role reporter;
grant select on all views in database analytics to role reporter;

Thanks!


Thanks Eugene!! I don’t have time to dig into this right now, but I’ll return to your comment next time I’m setting up Snowflake. Much appreciated.