2021 update
This article originally appeared as part of an article that covered grant statements for both Redshift and Snowflake (Snowflake version here). The original was written in 2019, when I was an employee of dbt Labs. Now, I’m freelancing (among other things), and actually got the opportunity to set up some new Snowflake databases recently. I took the opportunity to update some of my advice over on the Snowflake version of this article, and figured it was more logical to have two separate articles. Note that I haven’t used Redshift lately, but I think this advice still stands (until they implement cross-database queries for everyone, not just those usingra3
nodes).
Before reading through these exact instructions, take a moment to read some prior art:
- @jerco’s article on setting up the concepts behind sending up Snowflake for clients — we’re trying to apply a lot of the same ideas on Redshift, but with less flexibility (namely, no databases or warehouses)
- My article on database administration
While these article go deep into the “what” and “why” of database admin respectively, it was pretty light on the very specific “how” of setting this up, i.e. the implementation steps.
In this guide, I’ll run through the exact privileges it takes to set up your Redshift cluster the way these articles intended.
1. Set up the database
On Redshift, you can’t run cross-database queries, so we’ll do everything in one single database. By default, Redshift creates a database named dev
, so let’s create our own one with a better name
create database analytics
2. Set up groups
We’ll grant privileges to these groups rather than to users directly
create group loader;
create group transformer;
create group reporter;
3. Create users, assigning them to their groups
Every human being and application gets a separate user, and is assigned to the correct shared role.
create user stitch -- or fivetran
password '_generate_this_'
in group loader;
create user claire -- or amy, jeremy, etc
password '_generate_this_'
in group transformer;
create user dbt_cloud
password '_generate_this_'
in group transformer;
create user looker -- or mode or periscope
password '_generate_this_'
in group reporter;
4. Let loader
load data
By granting a user the privilege to create a schema within a database, they will also then be able to create tables and views within that schema.
grant create on database analytics to group loader;
grant select on all tables in schema information_schema to group loader;
grant select on all tables in schema pg_catalog to group loader;
grant select on all tables in schema information_schema to group transformer;
grant select on all tables in schema pg_catalog to group transformer;
^ I’ve also snuck in some statements to give privileges to information schemas here – these are required by both your ETL tools (i.e. members of your loader
group) and by dbt users (i.e. members of your transformer
group).
5. Wait…
If you’ve just set up your database from scratch, you’ll have to wait a while here for data to start flowing before you can grant privileges for members of your transformer
group to read it.
6. Let transformer
transform data
Once your raw data is in your warehouse, you’ll need to give privileges for your transformer
group to read it!
There are three parts to granting the privilege to read all relations within a schema:
- Granting usage on a schema
- Granting select privileges on all existing relations within a schema
- Granting select privileges on all relations created in this schema in the future (default privileges), so that you don’t need to rerun grant statements for every new object in the schema.
On Redshift, this looks like:
-- for each schema:
grant usage on schema raw_stripe to group transformer;
grant select on all tables in schema raw_stripe to group transformer;
-- note that the `for user` clause should be adjusted based on the user who will be creating
-- objects in the schema (which should also be the schema owner).
alter default privileges for user stitch in schema raw_stripe
grant select on tables to group transformer;
You’ll need to rerun these whenever you add a new schema to your database via Fivetran or Stitch
This group also needs to be able to create in the database:
grant create on database analytics to group transformer;
And needs some permissions to generate dbt docs
grant select on all tables in schema information_schema to group transformer;
grant select on all tables in schema pg_catalog to group transformer;
7. Let reporter
read the transformed data
The reporter
role should be able to read anything that dbt creates.
In Redshift, we implement these via an on-run-end hook – check out this article about the exact hooks we run!
8. Maintain!
When new users are added, make sure you add them to the right shared role! And as new schemas of raw data are added, ensure that privileges are granted to the transformer
role to be able to read them.