Permissions on Sensitive Data

I am in a situation where some of the data coming into the warehouse is considered sensitive and not for consumption by all members of the organization (i.e. part costs, unit price, general ledger). One solution would be to put the necessary tables in schemas based on who needs access and grant permissions on the schema level. But that breaks up grouping tables by topic and feels like the ‘thin end of the wedge’ on a schema proliferation problem. An other option is manage the permissions on the table level but I have not found a clean way to automate this from within dbt. I imagen it will be something like getting a custom property in to a macro that I can use to deterring what users have access but I am not sure.

So before I start a redesign, is one of these methods (schema vs. Table) better than the other?

Hi @pempey,
I think you could save yourself some future complications and grant some future flexibility by giving access to roles (with users assigned to roles) through views.

You can have one base (“restricted”) view where the problematic fields are always null and are accessible by a role all users have. This will be the default view of those data.
Then where necessary you create another “unrestricted” view with all the fields, but accessible only by a role assigned only to the people with the right to see the full data.

In this way all users have the base views with all the fields, reports can be built with the common interface and normal users never get the problematic data, but a user with the advanced role could just swap the view and see all the data. You could even leverage some super-power from your BI tool to do that swap transparently depending on the user.

We used a similar approach to remove personal identifiable information (PIIs).

The quickest idea of how to do that in DBT is to have the two set of views created in two / N “parallel” folders with a post-hook per folder to assign the correct role.

The idea of N folders instead of just two is because you might be in a situation where you have N different roles to control access, with one being the generic user and the other granting access to a specific subset of reserved data like purchases, finance, hr and so on…

Hi @Peter Empey, Did you already read this paper from gitlab on the topic ? sensitive data

hoping it helps,

@fabrice.etanchaud Thanks for the link it was a good read and it did provide some insight on possible paths for implementation.

@RobMcZag In the current project I am working on we are using roles, attached to Active Directory, to determine access to the data. Admittedly, I am not keen on the idea of parallels models but the post-hook is similar to what I have been thinking.

Both of the approaches mentioned are a per-column approach. I was thinking on the schema or table, is column going to be better?

Hi @pempey, if you can go with per schema or per table without sacrificing ability to work “normally” both in development and in use of the data platform, then I suppose it is going to be easier to maintain as you have less nodes to assign roles and they are well separated.

The approach that I suggested can help you achieve both the interface compatibility (if it makes sense and it is of any value for your case) and the schema level role assignment, if you put the full views in a “full” schema and the reserved views in a “reserved” schema that is the default for all users.

As for the duplication you can use the dbt-utils star macro that has an exclude field to build a macro that builds the restricted view by removing the restricted fields and returning null on them.

If you want the model name and list of excluded fields can be in a variable coming from a YAML… so you have a restricted view that can contain always the same call and all the config is in a YAML file.
And If you go for different schema the full and restricted view can have the same name.

You could even have the YAML at least partially generated by SQL looking at the existing tables/views in the full schema (and maybe excluding the similarly named views in the restricted schema).

Example of the star macro:
select {{'my_model'), except=["exclude_field_1", "exclude_field_2"]) }} from {{ref('my_model')}}