In our recent guide on database administration, I talked conceptually about how we organize our data warehouses. For those that haven’t read it (you should!), here’s a recap of the parts of that article that are relevant to this guide.
In the data warehouses we maintain, we:
- Ensure grants are consistent across a schema
- Use shared roles (groups on Redshift, roles on Snowflake) named
- Grant privileges to the shared roles, that are then inherited by their members
- Use a simplified privilege structure:
read-schema: Ability to select from a schema and all relations within it.
create-schema: Ability to create a schema in a database, and therefore create relations within it and have all privileges on those relations.
- Apply these privileges to the shared roles as follows:
read-schemafor all schemas containing raw data
read-schemafor all schemas containing transformed data
While the article went really deep into the “why” of all these practices, it was pretty light on the very specific “how” of setting this up, i.e. the implementation steps.
In this guide, I’ll run through the exact privileges we grant outside of dbt to set up our warehouses in this way. I’m going to first talk through the Redshift specific statements we run to set up our warehouses – these are the statements we’d run if we were given a brand new cluster to play in, so if you’re cleaning up an existing cluster, thigns will look a little different!
0. Set up the database(s) (and warehouses)
On Redshift, you can’t run cross-database queries, so we’ll do everything in one single database.
-- Redshift create database analytics
On Snowflake we get a little luckier, and instead can use databases to group together our raw and transformed data:
-- Snowflake create database RAW; create database ANALYTICS;
For snowflake we also have to set up the warehouses here:
create warehouse LOADING warehouse_size = xsmall auto_suspend = 3600 auto_resume = false initially_suspended = true; create warehouse TRANSFORMING warehouse_size = xsmall auto_suspend = 60 auto_resume = true initially_suspended = true; create warehouse REPORTING warehouse_size = xsmall auto_suspend = 60 auto_resume = true initially_suspended = true;
1. Set up shared roles
shared roles == groups
-- Redshift create group loader; create group transformer; create group reporter;
shared roles == role. We also grant privileges to a warehouse at this stage too.
-- Snowflake create role LOADER; grant all on warehouse LOADING to role TRANSFORMER; create role TRANSFORMER; grant all on warehouse TRANSFORMING to role TRANSFORMER; create role REPORTER; grant all on warehouse REPORTING to role REPORTER;
2. Create users, assigning them to their shared roles
Every human being and application gets a separate user, and is assigned to the correct shared role.
-- Redshift create user stitch -- or fivetran password = '_generate_this_' in group loader; create user claire -- or drew, jeremy, etc password = '_generate_this_' in group transformer, reporter; -- dbt users get two roles! create user dbt_cloud password = '_generate_this_' in group transformer; create user looker -- or mode or periscope password = '_generate_this_' in group reporter;
-- Snowflake -- create the users create user STITCH_USER -- or FIVETRAN_USER password = '_generate_this_' default_warehouse = LOADING default_role = LOADER; create user CLAIRE -- or DREW, JEREMY, etc. password = '_generate_this_' default_warehouse = TRANSFORMING default_role = TRANSFORMER; create user DBT_CLOUD_USER password = '_generate_this_' default_warehouse = TRANSFORMING default_role = TRANSFORMER; create user LOOKER_USER -- or MODE_USER, PERISCOPE_USER etc. password = '_generate_this_' default_warehouse = REPORTING default_role = REPORTER; -- then grant these the share role to each user grant role LOADER to user STITCH_USER; -- or FIVETRAN_USER grant role TRANSFORMER to user DBT_CLOUD_USER; grant role TRANSFORMER to user CLAIRE; -- or DREW, JEREMY grant role REPORTER to user LOOKER_USER; -- or MODE_USER, PERISCOPE_USER
create-schema privileges to
By granting a user the privilege to create a schema within a database, they will also then be able to create relations within that schema. Since they own the schema and relations, they also have all privileges on the schema and relations.
loader group, and
transformer will need to be able to create in your database.
On Redshift this looks like:
-- Redshift -- create-schema to loader grant create on database analytics to group loader; grant select on all tables in schema information_schema to group transformer; grant select on all tables in schema pg_catalog to group transformer; -- create-schema to transformer grant create on database analytics to group transformer; grant select on all tables in schema information_schema to group transformer; grant select on all tables in schema pg_catalog to group transformer;
^ I’ve also snuck in some statements to give privileges to information schemas here – these are required by both your ETL tools (i.e. members of your
loader group) and by dbt users (i.e. members of your
On Snowflake we make things a little easier – just grant
all on the relevant database.
-- Snowflake grant all on database RAW to role LOADER; grant create on database ANALYTICS to role TRANSFORMER;
If you’ve just set up your database from scratch, you’ll have to wait a while here for data to start flowing before you can grant privileges for members of your
transformer group to read it.
read-schema privileges on raw data to
Once your raw data is in your warehouse, you’ll need to give privileges for your
loader group to read it!
There are three parts to granting the privilege to read all relations within a schema:
- Granting usage on a schema
- Granting select privileges on all existing relations within a schema
- Granting select privileges on all relations created in this schema in the future (default privileges on Postgres and Redshift, or future privileges in Snowflake), so that you don’t need to rerun grant statements for every new object.
On Redshift, this looks like:
-- Redshift -- for each schema: grant usage on schema raw_stripe to group transformer; grant select on all tables in schema raw_stripe to group transformer; -- note that the `for user` clause should be adjusted based on the user who will be creating -- objects in the schema (which should also be the schema owner). alter default privileges for user stitch in schema raw_stripe grant select on tables to group transformer;
And on Snowflake, we have to first grant usage on the correct database (we don’t need to in Redshift)
-- Snowflake grant usage on database RAW to role TRANSFORMER; grant usage on all schemas in database RAW to role TRANSFORMER; -- for each schema in RAW: grant select on all tables in schema RAW.STRIPE to role TRANSFORMER; grant select on future tables in schema RAW.STRIPE to role TRANSFORMER;
read-schema privileges on transformed data to
reporter role should be able to read anything that dbt creates. The privileges look very similar to the ones above, however we implement these via an on-run-end hook – check out this article about the exact hooks we run!
For Snowflake we have one extra step here, we have to grant usage on the
-- Snowflake grant usage on database ANALYTICS to REPORTER
When new users are added, make sure you add them to the right shared role! And as new schemas of raw data are added, ensure that privileges are granted to the
transformer role to be able to read them.