The exact privileges we grant outside of dbt

In our recent guide on database administration, I talked conceptually about how we organize our data warehouses. For those who haven’t read it (you should!), here’s a recap of the parts of that article that are relevant to this guide.

In the data warehouses we maintain, we:

  • Ensure grants are consistent across a schema
  • Use shared roles (groups on Redshift, roles on Snowflake) named loader, transformer, reporter
  • Grant privileges to the shared roles, that are then inherited by their members
  • Use a simplified privilege structure:
    • read-schema: Ability to select from a schema and all relations within it.
    • create-schema: Ability to create a schema in a database, and therefore create relations within it and have all privileges on those relations.
  • Apply these privileges to the shared roles as follows:
    • loader:
      • create-schema
    • transformer:
      • read-schema for all schemas containing raw data
      • create-schema
    • reporter:
      • read-schema for all schemas containing transformed data

While the article went really deep into the “why” of all these practices, it was pretty light on the very specific “how” of setting this up, i.e. the implementation steps.

In this guide, I’ll run through the exact privileges we grant outside of dbt to set up our warehouses in this way, walking through both the Redshift and Snowflake statements at each step. These are the statements we’d run if we were given a brand new cluster to play in, so if you’re cleaning up an existing cluster, things will look a little different!

0. Set up the database(s) (and warehouses)

On Redshift, you can’t run cross-database queries, so we’ll do everything in one single database.

-- Redshift
create database analytics;

On Snowflake we get a little luckier, and instead can use databases to group together our raw and transformed data:

-- Snowflake
create database RAW;
create database ANALYTICS;

For Snowflake, we also have to set up the warehouses here:

create warehouse LOADING
    warehouse_size = xsmall
    auto_suspend = 3600
    auto_resume = false
    initially_suspended = true;

create warehouse TRANSFORMING
    warehouse_size = xsmall
    auto_suspend = 60
    auto_resume = true
    initially_suspended = true;

create warehouse REPORTING
    warehouse_size = xsmall
    auto_suspend = 60
    auto_resume = true
    initially_suspended = true;

1. Set up shared roles

In Redshift, shared roles == groups

-- Redshift
create group loader;
create group transformer;
create group reporter;

In Snowflake, shared roles == roles. We also grant privileges on the relevant warehouse at this stage.

-- Snowflake
create role LOADER;
grant all on warehouse LOADING to role LOADER;

create role TRANSFORMER;
grant all on warehouse TRANSFORMING to role TRANSFORMER;

create role REPORTER;
grant all on warehouse REPORTING to role REPORTER;

2. Create users, assigning them to their shared roles

Every human being and application gets a separate user, and is assigned to the correct shared role.

-- Redshift
create user stitch -- or fivetran
    password = '_generate_this_'
    in group loader;

create user claire -- or drew, jeremy, etc
    password = '_generate_this_'
    in group transformer, reporter; -- dbt users get two roles!

create user dbt_cloud
    password = '_generate_this_'
    in group transformer;

create user looker -- or mode or periscope
    password = '_generate_this_'
    in group reporter;

-- Snowflake
-- create the users
create user STITCH_USER -- or FIVETRAN_USER
    password = '_generate_this_'
    default_warehouse = LOADING
    default_role = LOADER; 

create user CLAIRE -- or DREW, JEREMY, etc.
    password = '_generate_this_'
    default_warehouse = TRANSFORMING
    default_role = TRANSFORMER;

create user DBT_CLOUD_USER
    password = '_generate_this_'
    default_warehouse = TRANSFORMING
    default_role = TRANSFORMER;

create user LOOKER_USER -- or MODE_USER, PERISCOPE_USER etc.
    password = '_generate_this_'
    default_warehouse = REPORTING
    default_role = REPORTER;

-- then grant the shared roles to each user
grant role LOADER to user STITCH_USER; -- or FIVETRAN_USER
grant role TRANSFORMER to user DBT_CLOUD_USER;
grant role TRANSFORMER to user CLAIRE; -- or DREW, JEREMY
grant role REPORTER to user LOOKER_USER; -- or MODE_USER, PERISCOPE_USER


3. Grant create-schema privileges to loader and transformer

By granting a user the privilege to create a schema within a database, they will then be able to create relations within that schema. Since they own the schema and its relations, they also have all privileges on them.

Both your loader and transformer groups will need to be able to create schemas in your database.

On Redshift this looks like:

-- Redshift

-- create-schema to loader
grant create on database analytics to group loader;
grant select on all tables in schema information_schema to group loader;
grant select on all tables in schema pg_catalog to group loader;

-- create-schema to transformer
grant create on database analytics to group transformer;
grant select on all tables in schema information_schema to group transformer;
grant select on all tables in schema pg_catalog to group transformer;

^ I’ve also snuck in some statements here to grant privileges on the information schemas – these are required by both your ETL tools (i.e. members of your loader group) and by dbt users (i.e. members of your transformer group).

On Snowflake we make things a little easier – just grant all on the relevant database.

-- Snowflake
grant all on database RAW to role LOADER;
grant all on database ANALYTICS to role TRANSFORMER;

4. Wait…

If you’ve just set up your database from scratch, you’ll have to wait a while here for data to start flowing before you can grant privileges for members of your transformer group to read it.
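
If you want to check whether anything has landed yet, a quick sanity check (not part of the setup itself) is to list the schemas and tables that exist so far:

-- Redshift: list the schemas and tables your ETL tool has created so far
select table_schema, table_name
from information_schema.tables
order by 1, 2;

-- Snowflake: show the schemas your ETL tool has created in the RAW database
show schemas in database RAW;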

5. Grant read-schema privileges on raw data to transformer

Once your raw data is in your warehouse, you’ll need to grant your transformer group the privileges to read it!

There are three parts to granting the privilege to read all relations within a schema:

  • Granting usage on a schema
  • Granting select privileges on all existing relations within a schema
  • Granting select privileges on all relations created in this schema in the future (default privileges on Postgres and Redshift, or future privileges in Snowflake), so that you don’t need to rerun grant statements for every new object.

On Redshift, this looks like:

-- Redshift
-- for each schema:
grant usage on schema raw_stripe to group transformer;
grant select on all tables in schema raw_stripe to group transformer;
-- note that the `for user` clause should be adjusted based on the user who will be creating
-- objects in the schema (which should also be the schema owner).
alter default privileges for user stitch in schema raw_stripe
  grant select on tables to group transformer;

And on Snowflake, we first have to grant usage on the correct database (we don’t need to do this on Redshift):

-- Snowflake
grant usage on database RAW to role TRANSFORMER;
grant usage on all schemas in database RAW to role TRANSFORMER;

-- for each schema in RAW:
grant select on all tables in schema RAW.STRIPE to role TRANSFORMER;
grant select on future tables in schema RAW.STRIPE to role TRANSFORMER;

6. Grant read-schema privileges on transformed data to reporter

The reporter role should be able to read anything that dbt creates. The privileges look very similar to the ones above, however we implement these via an on-run-end hook – check out this article about the exact hooks we run!
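
To make that concrete, here’s a sketch of the kind of statements such a hook runs on Redshift – the schema name analytics is just an example of a schema that dbt builds into, so substitute your own:

-- Redshift (sketch – in practice these run from an on-run-end hook rather than by hand)
grant usage on schema analytics to group reporter;
grant select on all tables in schema analytics to group reporter;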

For Snowflake we have one extra step here, as we have to grant usage on the analytics database:

-- Snowflake
grant usage on database ANALYTICS to role REPORTER;

7. Maintain!

When new users are added, make sure you add them to the right shared role! And as new schemas of raw data are added, ensure that privileges are granted to the transformer role to be able to read them.
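
For example, if your ETL tool starts loading a new schema of raw data – say raw_shopify on Redshift, or RAW.SHOPIFY on Snowflake (both hypothetical names) – you’d repeat the step 5 grants for it:

-- Redshift
grant usage on schema raw_shopify to group transformer;
grant select on all tables in schema raw_shopify to group transformer;
alter default privileges for user stitch in schema raw_shopify
  grant select on tables to group transformer;

-- Snowflake
grant usage on schema RAW.SHOPIFY to role TRANSFORMER;
grant select on all tables in schema RAW.SHOPIFY to role TRANSFORMER;
grant select on future tables in schema RAW.SHOPIFY to role TRANSFORMER;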
