Old thread, I know, but I ran into this recently. We have a simple data dictionary that lists the source column name, target column name, and data type. At the same time, we are pivoting key/value pairs to a wide format.
To do that, I created a macro that queries the column metadata and then uses case statements to pivot, rename, and set the data type (the source data is read in as all varchars). It works well.
Here's the macro (record_set identifies the source data, record_pipe identifies the target):
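For reference, the rows in the data dictionary model look roughly like this (the field names and types here are made up for illustration):

field_key   field_type      target_name
cust_nm     varchar(100)    customer_name
ord_amt     numeric(18,2)   order_amount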
{% macro get_field_definitions(record_set, record_pipe) %}

    {% set query %}
        select
            field_key,
            field_type,
            target_name
        from {{ ref('stg_field_definitions') }}
        where record_set = '{{ record_set }}'
          and record_pipe = '{{ record_pipe }}'
    {% endset %}

    {% if execute %}
        {% set query_results = run_query(query) %}
        {{ return(query_results.rows) }}
    {% endif %}

{% endmacro %}
And here it is in use in the model SQL:
{{
    config(
        materialized = 'view'
    )
}}

{#- Get the field definitions that exist -#}
{% set field_definitions = get_field_definitions(
    record_set = 'SomeRecordset',
    record_pipe = 'SomeTargetEntity'
) %}

select
    identifier,
    {% for row in field_definitions %}
    cast(max(case when field_key = '{{ row[0] }}' then field_value end) as {{ row[1] }}) as "{{ row[2] }}",
    {% endfor %}
    source_file,
    sum(is_changed) as num_changed
from {{ ref('stg_some_long_key_value_table') }}
group by
    identifier,
    source_file -- We have daily files coming in with changes
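Using the illustrative dictionary rows from above, the loop compiles to something like this (the schema name is made up, and the ref() is shown already resolved):

select
    identifier,
    cast(max(case when field_key = 'cust_nm' then field_value end) as varchar(100)) as "customer_name",
    cast(max(case when field_key = 'ord_amt' then field_value end) as numeric(18,2)) as "order_amount",
    source_file,
    sum(is_changed) as num_changed
from my_schema.stg_some_long_key_value_table  -- hypothetical compiled relation
group by
    identifier,
    source_file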
A few notes:
- I tried dbt_utils.pivot, but it doesn't support using a second column as the value (as far as I could find)
- This is my first attempt at creating a macro, so I'd love any feedback
- I was surprised how easy this was. It really sold me on dbt as a great solution for dynamic SQL