Casting column datatypes based on a seed

Old thread, I know, but I ran into this recently. We have a simple data dictionary that lists source column name, target column name and data type. At the same time, we are pivoting key/value pairs to a wide format.

To do that, I created a macro that queries the column metadata then uses cases statements to pivot, rename and set the data type (source data is read in as all varchars). It works well.

Here’s the macro (recordset identifies source data, record_pip identifies the target):

{% macro get_field_definitions(record_set, record_pipe) %}
    {% set query %}
       select
           field_key,
           field_type,
           target_name
       from {{ ref('stg_field_definitions') }}
       where record_set = '{{record_set}}'
       and record_pipe = '{{record_pipe}}'
    {% endset %} 

    {% set query_results = run_query(query) %}

    {% if execute %}
        {% set results = query_results.rows %}
        {{ return(results) }}
    {% endif %}
{% endmacro %}

In SQL:

{{
    config(
        materialized='view'
    )
}}

{#- Get the field definitions that exist  -#}
{% set field_definitions = get_field_definitions(
                                                             record_set = 'SomeRecordset',
                                                             record_pipe = 'SomeTargetEntity')
 %}


select
    identifier,
        {% for row in field_definitions %}
            cast( max(case when field_key = {{ row[0] }} then field_value end) as {{row[1]}} ) as "{{ row[2] }}",
        {% endfor %}
    source_file,
    sum(is_changed) as num_changed
from {{ ref('stg_some_long_key_value_table') }}
group by
    dentifier,
    source_file -- We have daily files coming in with changes

A few notes:

  • Tried dbt_utils.pivot, but it doesn’t support using a second column as the value (that I could find)
  • This is my first attempt at creating a macro, I’d love any feedback
  • I was surprised how easy this was. It really sold me on DBT as a great solution to dynamic SQL

Interested if you could expand on this. I am trying to do the same, if I understand correctly:

I have columns record_id, field_names, and field_values and am trying to pivot and make the values of the field_names columns with the associcated field_values as the data populating the pivot.