Problem Context:
We have various fields that we do/don’t want to surface to end users in Looker for any given transformation. However, the fields are necessary for transformation logic. In a gross sense, our query relationships goes staging
→ transformation
→ surface only selected fields to visualization tools ('Clean')
. However, manually maintaining our Clean
queries is growing into a bit of a pain, and we’ve run into some frustrating situations doing so. A key part of this is that we don’t want to actually maintain the SQL in the Clean
queries.
We also have limited resources to implement this from a BigQuery or Infra side of things, so we had to do this only with what DBT has to offer.
Solution
We have added a property to every column entry, that we can read via Jinja. This greatly simplifies things, as we already document each column, both as a matter of good practice, and to centralize documentation in DBT–all of our descriptions are written to BQ, and Looker now has the capability to import the descriptions into LookML when creating view files. Part of the design of this is to operate on an allow-list basis–the column must have the property, and the property must = true
. At some point, a column must be defined as being allowed/not allowed, and we decided that dbt is right place for us, given our constraints.
I had avoided using the tags
hierarchy, as the .yml files were getting pretty big.
An example of the model.yml
:
version: 2
models:
- name: upstream_query
description: "blablabla"
columns:
- name: field_1
description: "blablabla"
is_allowed: true
- name: field_2
description: "blablabla"
is_allowed: false
- name: field_3
description: "blablabla"
is_allowed: true
The macro:
{% macro get_allowed_columns_macro(tablename) -%}
{% if execute %}
{%- set column_list = [] -%}
{%- for node in graph.nodes.values() -%}
{%- if node.name == tablename -%}
{%- for column, properties in node.columns.items() -%}
{%- do column_list.append(column) if properties.get('is_allowed') == true -%}
{%- endfor -%}
{%- endif -%}
{%- endfor %}
{{ return(column_list) }}
{% endif %}
{%- endmacro %}
An example query:
/* Define the table you are referring to */
{% set tableref = 'upstream_query' -%}
/* Create a [list] of columns that are flagged as field_name == true */
{% set allowed_columns = get_allowed_columns_macro(tableref) -%}
SELECT
{%- for columns in allowed_columns -%}
{%- if not loop.first %}
, {{ columns }}
{%- else %}
{{ columns }}
{%- endif -%}
{%- endfor %}
FROM {{ ref(tableref) }}
Which will compile to:
SELECT
field_1
, field_3
FROM `database.dataset.upstream_query`