[Obsolete] The exact grant statements we use in a dbt project

:warning: With dbt v1.2, much of this article is obsolete. Check out our updated permissioning guidelines on the dbt Developer Blog for our modern recommendations. - Joel

In a well maintained warehouse, your BI tools will need to be granted the privilege to read the tables and views dbt creates.

In this article, I’m going to run through the exact statements we run to grant privileges, and how we implement them in dbt, starting with a basic example and making it more flexible (but also more complex!) as we go.

In this article, let’s imagine we have a data warehouse that:

  • Is a Redshift warehouse (I’m using Redshift to start with as I think it’s the most common data warehouse used within our community, not because I think it’s the best warehouse – I’ll include the equivalent SQL for Snowflake at the end of the article)
  • Has a group named reporter, where every human (claire, drew, another_analyst_on_your_team) and app (looker, periscope, mode) that runs reporting queries is a member of the reporter group.
  • Follows the convention that every table and view within a schema is created by the same process, has the same owner, and has the same privileges applied to it.
  • Your dbt user creates, and therefore owns, the schema it is creating objects within (sometimes this isn’t true in larger organizations with different information security rules).

Using on-run-end hooks

For every schema that dbt creates, you’ll need to grant the following privileges:

  • usage on the schema that your relations are in
  • select on the tables and views within those schemas – you can either grant these individually to each table or view, or use the on all syntax

Let’s start by assuming that you are creating everything in one schema, named analytics. The grant statements you’d need to run would be:

grant usage on schema analytics to group reporter;
grant select on all tables in schema analytics to group reporter;

You really only need to run that first statement once for each schema – however running it multiple times won’t hurt! (For anyone following along at home, this is an example of idempotence – running it once, or running it six times, will have the same effect). You’ll need to make sure the schema exists before you run it!

That second statement needs to be run at the end of a run; so that it gets applied to all the tables and views dbt has created (many years ago, I assumed that the “grant select on all” would apply to new tables in my warehouse too – evidently not, but we’ll get to that soon).

So, let’s implement these statements as an on-run-end hook:

# dbt_project.yml
...

on-run-end:
  - 'grant usage on schema analytics to group reporter'
  - 'grant select on all tables in schema analytics to group reporter'

Great! Kind of…

Assuming that you’re using a different schema when running dbt locally (i.e. your dev environment) compared to your prod runs, these statements won’t make any sense for your dev runs; in fact, they’ll probably fail (your local user won’t own the analytics schema, and they aren’t a superuser (right?!) so they can’t grant privileges on a schema they don’t own).

Let’s fix this by using the target variable to grab our target schema:

# dbt_project.yml
...

on-run-end:
  - 'grant usage on schema {{ target.schema }} to group reporter'
  - 'grant select on all tables in schema {{ target.schema }} to group reporter'

Better! Again, kind of…

The problem with using on-run-end hooks here is that when dbt runs, behind the scenes it drops and recreates the table or view – this means that your model loses any exsiting privileges you’ve granted on it. As a result, for the period of time between when a model runs, and the end of the run, no one will be able to query that model, instead they’ll get a “permission denied” error. This creates downtime in your BI tool.

Using post-hooks to grant straight away

One way around this is to grant privileges on every model as soon as it is created, using a post-hook, combined with the this variable. {{ This }} looks like:

-- dbt_project.yml
...
models:
  post-hook: 'grant select on {{ this }} to group reporter'

seeds:
  post-hook: 'grant select on {{ this }} to group reporter'

on-run-end:
  - 'grant usage on schema {{ target.schema }} to group reporter'

It’s worth nothing here that the first time dbt runs with a new target.schema, your BI tools won’t have access to the relations until the end of the run. This is because the reporter group doesn’t yet have usage privileges on the schema. On any subsequent runs your BI tool will get privileges straight away, and even though the usage statement is being run, it’s just reapplying a privilege that already exists. I feel very OK with this – it’s pretty rare that I want to query something from my BI tool the very first time I’m running dbt with a new target.

Leveraging default grants

Personally, I prefer to leverage default grants (or future grants in Snowflake – more on that below), this means that as soon as a relation is created in a schema, the warehouse applies the right privileges to it. If I were writing this DDL from scratch, I would run these statements right after I create a schema:

create schema analytics;
grant usage on schema analytics to group reporter;
alter default privileges in schema analytics
    grant select on tables to group reporter;

But I very rarely run DDL from scratch (dbt does it for me!). So instead in dbt I use an on-run-end hook, like so:

# dbt_project.yml
...

on-run-end:
  - 'grant usage on schema {{ target.schema }} to group reporter'
  - 'grant select on all tables in schema {{ target.schema }} to group reporter'
  - 'alter default privileges in schema {{ target.schema }} grant select on tables to group reporter'

Since we ran this at the end of the run, there will be table and views already in your schema by the time you apply the default privileges. As such, we still have to run the grant select on all tables statement, because default privileges only get applied to new objects being created going forward.

In reality, these statements only need to be run once for each schema (ever), but running them lots of times won’t hurt! And this way they are version controlled, and dbt will take care of running them whenever a new dbt user wants to use a different target schema.

Catering for multiple schemas

In dbt, you can use custom schemas to split the models and views dbt builds across multiple schemas. So far, we’ve only been granting privileges on the target schema, so your BI tool won’t be able to query any of the other things you’ve created. You might choose to leave things like this by design. We prefer to instead grant privileges to every schema dbt creates, by leveraging the schemas variable.

Now our on-run-end hooks look like:


on-run-end:
 - '{% for schema in schemas %}grant usage on schema {{ schema }} to group reporter;{% endfor %}'
 - '{% for schema in schemas %}grant select on all tables in schema {{ schema }} to group reporter;{% endfor %}'
 - '{% for schema in schemas %}alter default privileges in schema {{ schema }} grant select on tables to group reporter{% endfor %}'

^ This is perfectly good code, but it’s not particularly readable! Especially since those for loops are repeated. So let’s abstract it into a macro.

Packaging it up in a macro

We like to create a macro, grant_select_on_schemas that packages up this logic for us!

-- macros/grant_select_on_schemas.sql

{% macro grant_select_on_schemas(schemas, group) %}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to group {{ group }};
    grant select on all tables in schema {{ schema }} to group {{ group }};
    alter default privileges in schema {{ schema }}
        grant select on tables to group {{ group }};
  {% endfor %}
{% endmacro %}

Then in we can call this macro in our post-hook, passing it our schema variable and the name of our group:

# dbt_project.yml

on-run-end:
- "{{ grant_select_on_schemas(schemas, 'reporter') }}"

Implementing this on Snowflake

The exact same concepts apply on Snowflake, just with slightly different syntax. Notably,:

  • you grant privileges to roles in Snowflake (if you’re interested, I’ve written more about the differences in roles across warehouses here)
  • you have to grant privileges to tables and views separately
  • default grants are named future grants, with a slightly different syntax, and requiring special privileges (see below)

We use the same macro name, calling it the same way in our dbt_project.yml file, but the contents of the macro are different:

-- macros/grant_select_on_schemas.sql

{% macro grant_select_on_schemas(schemas, role) %}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to role {{ role }};
    grant select on all tables in schema {{ schema }} to role {{ role }};
    grant select on all views in schema {{ schema }} to role {{ role }};
    grant select on future tables in schema {{ schema }} to role {{ role }};
    grant select on future views in schema {{ schema }} to role {{ role }};
  {% endfor %}
{% endmacro %}

To be able to grant future privileges on Snowflake, your role needs to have the manage grants privilege (see here) – you can apply this from the Snowflake UI as the SECURITYADMIN.

use role securityadmin;
grant manage grants on account to role transformer;

It is worth noting that this privilege is a global privilege – now anyone using the transformer role can change grants on any object as though they are the owner of the object. Up to you if you’re comfortable with this! If not, you may want to use a combination of post-hooks and on-run-end hooks instead :slight_smile:

8 Likes

I am interested in the rationale for this preference - does that mean that the BI tool would then see all schemas created by dbt?

Yeah! We understand it’s not right for every use case, so we sometimes change that logic based on the organization’s needs!

Ok, so I think it’s going to be easiest to have a few example cases here. Let’s say:

  • You have two environments, dev and prod
  • Some of your models have a custom schema configured (intermediate, core, marketing)
  • You use the generate_schema_name_for_env macro to determine the name of your (docs here).

The question then becomes, “Which schemas should my BI user be able to see?”. Here’s a couple of possibilities:

  1. All dev and prod schemas
  2. All the prod schemas (no dev)
  3. Just the prod marketing and core schemas
  4. The dev and prod versions of marketing and core.

A table is gonna help here!

target target.schema schema config schema Opt 1 Opt 2 Opt 3 Opt 4
dev dbt_adam NULL dbt_adam :white_check_mark:
dev dbt_adam intermediate dbt_adam :white_check_mark:
dev dbt_adam core dbt_adam :white_check_mark: :white_check_mark:
dev dbt_adam marketing dbt_adam :white_check_mark: :white_check_mark:
prod analytics NULL analytics :white_check_mark: :white_check_mark:
prod analytics intermediate intermediate :white_check_mark: :white_check_mark:
prod analytics core core :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:
prod analytics marketing marketing :white_check_mark: :white_check_mark: :white_check_mark: :white_check_mark:

Like everything in writing code, there’s tradeoffs here::

  • Fewer schemas means and improved user interface (i.e. the database explorer) for your end users, by effectively hiding unnecessary schemas
  • More schemas enables you to develop dashboards in your BI tool at the same time as you’re developing your dbt models.
  • Fewer schemas requires custom logic

1. All the dev and prod schemas

Pros:

  • No need to write custom logic

Cons:

  • The number of schemas may become confusing to an end-user

How-to:
You can use the example code in the original writeup.

2. All the prod schemas (no dev)

Pros:

  • Dev relations are hidden away for no-one to see; reducing database clutter.

Cons:

  • You won’t be able to develop dashboards in your BI tool at the same time as your dbt models since you won’t have access to them

Notes:

  • Our dev relations are always grouped into a single schema per developer, which usually helps with the clutter, so this is less of an issue for us

How-to:

# dbt_project.yml

on-run-end:
- "{{ grant_select_on_schemas(schemas, 'reporter') }}"
-- macros/grant_select_on_schemas.sql

{% macro grant_select_on_schemas(schemas, role) %}
{% if target.name == 'prod' %}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to role {{ role }};
    grant select on all tables in schema {{ schema }} to role {{ role }};
    grant select on all views in schema {{ schema }} to role {{ role }};
    grant select on future tables in schema {{ schema }} to role {{ role }};
    grant select on future views in schema {{ schema }} to role {{ role }};
  {% endfor %}
{% else %}
select 1; -- hooks will error if they don't have valid SQL in them, this handles that!
{% endif %}
{% endmacro %}

3. Just the prod marketing and core schemas

Pros:

  • Even less clutter – all your intermediate data transformations are hidden away

Cons:

  • As before, you won’t be able develop in your BI time at the same time

How-to:

# dbt_project.yml

on-run-end:
- "{{ grant_select_on_schemas(['core', 'marketing'], 'reporter') }}"
-- macros/grant_select_on_schemas.sql

{% macro grant_select_on_schemas(schemas, role) %}
{% if target.name == 'prod' %}
  {% for schema in schemas %}
    grant usage on schema {{ schema }} to role {{ role }};
    grant select on all tables in schema {{ schema }} to role {{ role }};
    grant select on all views in schema {{ schema }} to role {{ role }};
    grant select on future tables in schema {{ schema }} to role {{ role }};
    grant select on future views in schema {{ schema }} to role {{ role }};
  {% endfor %}
{% else %}
select 1; -- hooks will error if they don't have valid SQL in them, this handles that!
{% endif %}
{% endmacro %}

4. The dev and prod versions of marketing and core.

Pros:

  • You can develop your dashboards at the same time as your models
  • AND you still hide some of the intermediate schemas

Cons:

  • This is pretty complex, and honestly, as I wrote it out, I was like “nope, this is a bad idea”

Notes
The reason that this is super complex is because the schemas you need to grant privileges on changes based on the environment you’re in! Remember that the schema name is determined by the generate_schema_name macro, so we can leverage that here.

Also, you’ll end up granting privileges to all the relations in your dev schema, which isn’t necessarily a bad thing, since that dev schema is named dbt_adam so it’s very clear that “this should not be queried”.

How-to:

# dbt_project.yml

on-run-end:
- "{{ grant_select_on_schemas(['core', 'marketing'], 'reporter') }}"
-- macros/grant_select_on_schemas.sql

{% macro grant_select_on_schemas(custom_schemas, role) %}
  {% for custom_schema in custom_schemas %}

    {# Use the generate_schema_name macro to figure out the schema name for the current env #}
    {% set schema = generate_schema_name(custom_schema) %}

    grant usage on schema {{ schema }} to role {{ role }};
    grant select on all tables in schema {{ schema }} to role {{ role }};
    grant select on all views in schema {{ schema }} to role {{ role }};
    grant select on future tables in schema {{ schema }} to role {{ role }};
    grant select on future views in schema {{ schema }} to role {{ role }};
  {% endfor %}
{% endmacro %}

3 Likes

I appreciate the example on adding logic to the macro to run different hooks based on prod or dev using {% if target.name == 'prod' %}. That’s perfect and what I was looking for. I can extend that to have different sets of grant statements for each target name. And I see your comment over at my post has the same code too.

I did get lost near the end. :sweat_smile: I am not sure how the very last example:

On the other hand, it can also be a good idea to grant privileges on your dev schema and relations to your BI tool

is different from the last example in your original post where the grants do run on all schema?

@adamstone – I rewrote my answer above, hopefully it makes more sense :slight_smile: