I originally wrote this article in 2019, when I was an employee of dbt Labs. Now, I’m freelancing (among other things), and actually got the opportunity to set up some new databases recently. This advice no longer represents the official dbt Labs stance as I’ve mixed in some of my own thoughts.
Further, the original article included instructions for both Redshift and Snowflake. This article now only covers Snowflake; the Redshift instructions have been moved to a separate article.
Before reading through these exact instructions, take a moment to read some prior art:
- @jerco’s article on the concepts behind setting up Snowflake for clients
- My article on database administration
While these articles go deep into the “what” and “why” of Snowflake admin respectively, they are pretty light on the very specific “how” of setting this up, i.e. the implementation steps.
In this guide, I’ll run through the exact privileges it takes to set up your Snowflake account the way these articles intended.
```sql
use role sysadmin;

create database raw;
create database analytics;
```
```sql
create warehouse loading
    warehouse_size = xsmall
    auto_suspend = 3600
    auto_resume = false
    initially_suspended = true;

create warehouse transforming
    warehouse_size = xsmall
    auto_suspend = 60
    auto_resume = true
    initially_suspended = true;

create warehouse reporting
    warehouse_size = xsmall
    auto_suspend = 60
    auto_resume = true
    initially_suspended = true;
```
```sql
use role securityadmin;

create role loader;
grant all on warehouse loading to role loader;

create role transformer;
grant all on warehouse transforming to role transformer;

create role reporter;
grant all on warehouse reporting to role reporter;
```
Every human being and application gets a separate user, and is assigned to the correct role.
```sql
create user stitch_user -- or fivetran_user
    password = '_generate_this_'
    default_warehouse = loading
    default_role = loader;

create user claire -- or amy, jeremy, etc.
    password = '_generate_this_'
    default_warehouse = transforming
    default_role = transformer
    must_change_password = true;

create user dbt_cloud_user
    password = '_generate_this_'
    default_warehouse = transforming
    default_role = transformer;

create user looker_user -- or mode_user etc.
    password = '_generate_this_'
    default_warehouse = reporting
    default_role = reporter;

-- then grant these roles to each user
grant role loader to user stitch_user;       -- or fivetran_user
grant role transformer to user dbt_cloud_user;
grant role transformer to user claire;       -- or amy, jeremy
grant role reporter to user looker_user;     -- or mode_user, periscope_user
```
Give the `loader` role unilateral permission to operate on the `raw` database:

```sql
use role sysadmin;
grant all on database raw to role loader;
```
The `transformer` role needs to be able to read raw data.
If you do this before you have any data loaded, you can run:
```sql
grant usage on database raw to role transformer;
grant usage on future schemas in database raw to role transformer;
grant select on future tables in database raw to role transformer;
grant select on future views in database raw to role transformer;
```
If you already have data loaded in the `raw` database, make sure you also run the following to update the permissions:
```sql
grant usage on all schemas in database raw to role transformer;
grant select on all tables in database raw to role transformer;
grant select on all views in database raw to role transformer;
```
The `transformer` role also needs to be able to create objects in the `analytics` database:

```sql
grant all on database analytics to role transformer;
```
A previous version of this article recommended this be implemented through hooks in dbt, but this way lets you get away with a one-off statement.
The `reporter` role only needs to be able to read from the `analytics` database:

```sql
grant usage on database analytics to role reporter;
grant usage on future schemas in database analytics to role reporter;
grant select on future tables in database analytics to role reporter;
grant select on future views in database analytics to role reporter;
```
Again, if you already have data in your `analytics` database, make sure you run:
```sql
grant usage on all schemas in database analytics to role reporter;
grant select on all tables in database analytics to role reporter;
grant select on all views in database analytics to role reporter;
```
When new users are added, make sure you add them to the right role! Everything else should be inherited automatically thanks to those future grants.
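For example, onboarding a new analyst only takes two statements (the username here is hypothetical; the pattern mirrors the user-creation block above):

```sql
use role securityadmin;

-- hypothetical new analyst
create user new_analyst
    password = '_generate_this_'
    default_warehouse = transforming
    default_role = transformer
    must_change_password = true;

-- the future grants on raw and analytics do the rest
grant role transformer to user new_analyst;
```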
The advice in this article has stayed pretty much unchanged for the last few years, except for small syntax changes when Snowflake added database-level future grants (previously these had to be granted schema by schema).
As I used it at scale though, I realized I wanted to extend the pattern a little more:
- I’ve previously written about blue-green deploys, which rely on a database-level swap command. This architecture doesn’t support that.
- I’ve also used dbt snapshots in a project, where you really want those to not be dropped — but since they end up in the same database as transformed data, there are no measures preventing an errant `drop`.
- If both your `dbt_cloud_user` and you (both `transformer`s) have the same permissions, you might accidentally drop a table that you shouldn’t, like a huge incremental model that you wish you didn’t have to rebuild.
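For reference, blue-green deploys in Snowflake hinge on the atomic database-level swap command. A minimal sketch (the database names here are illustrative, not from the setup above):

```sql
-- build and test everything in a staging database first, then
-- atomically exchange it with prod in a single statement
alter database analytics_staging swap with analytics_prod;
```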
Here are some ideas to extend this architecture:
- Replace the standard `analytics` database with `analytics_dev` and `analytics_prod`. Use roles `transformer_dev` and `transformer_prod` (the former can only create in `analytics_dev`). This helps prevent accidentally dropping a prod transform, and also lets you do blue-green deploys.
- Also add a new schema `artifacts` for dbt snapshots and other metadata produced by dbt runs (e.g. for measuring project performance). These aren’t exactly `raw` data, nor are they `transformed` data. Make it such that only `transformer_prod` can add data there (and it might be worth looking into ways to have even that role not be able to drop it!).
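A minimal sketch of this extended setup might look like the following. The role and database names just follow the dev/prod pattern described above — adjust them to your own conventions:

```sql
use role sysadmin;
create database analytics_dev;
create database analytics_prod;

use role securityadmin;
create role transformer_dev;
create role transformer_prod;

-- the dev role can only create in the dev database
grant all on database analytics_dev to role transformer_dev;

-- the prod role (used by automated production runs) owns prod
grant all on database analytics_prod to role transformer_prod;

-- humans get the dev role; only the automated user gets prod
grant role transformer_dev to user claire;
grant role transformer_prod to user dbt_cloud_user;
```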