Old thread, I know, but I ran into this recently. We have a simple data dictionary that lists the source column name, target column name, and data type. At the same time, we are pivoting key/value pairs to a wide format.
To do that, I created a macro that queries the column metadata and then uses case statements to pivot, rename, and set the data type (the source data is read in as all varchars). It works well.
Here's the macro (record_set identifies the source data, record_pipe identifies the target):
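For reference, the rows in the data dictionary model look roughly like this (the field names and types here are made up for illustration):

field_key   field_type      target_name
cust_nm     varchar(100)    customer_name
ord_amt     numeric(18,2)   order_amount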
{% macro get_field_definitions(record_set, record_pipe) %}

    {% set query %}
        select
            field_key,
            field_type,
            target_name
        from {{ ref('stg_field_definitions') }}
        where record_set = '{{ record_set }}'
          and record_pipe = '{{ record_pipe }}'
    {% endset %}

    {% if execute %}
        {% set query_results = run_query(query) %}
        {{ return(query_results.rows) }}
    {% endif %}

{% endmacro %}
And here it is in use in the model SQL:
{{
    config(
        materialized = 'view'
    )
}}

{#- Get the field definitions that exist -#}
{% set field_definitions = get_field_definitions(
    record_set = 'SomeRecordset',
    record_pipe = 'SomeTargetEntity'
) %}

select
    identifier,
    {% for row in field_definitions %}
    cast(max(case when field_key = '{{ row[0] }}' then field_value end) as {{ row[1] }}) as "{{ row[2] }}",
    {% endfor %}
    source_file,
    sum(is_changed) as num_changed
from {{ ref('stg_some_long_key_value_table') }}
group by
    identifier,
    source_file -- We have daily files coming in with changes
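Using the illustrative dictionary rows from above, the loop compiles to something like this (the schema name is made up, and the ref() is shown already resolved):

select
    identifier,
    cast(max(case when field_key = 'cust_nm' then field_value end) as varchar(100)) as "customer_name",
    cast(max(case when field_key = 'ord_amt' then field_value end) as numeric(18,2)) as "order_amount",
    source_file,
    sum(is_changed) as num_changed
from my_schema.stg_some_long_key_value_table  -- hypothetical compiled relation
group by
    identifier,
    source_file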
A few notes:
- I tried dbt_utils.pivot, but it doesn't support using a second column as the value (as far as I could find)
- This is my first attempt at creating a macro, so I'd love any feedback
- I was surprised how easy this was. It really sold me on dbt as a great solution for dynamic SQL