Using get_column_values with CTEs

Hi there! Unfortunately get_column_values is not compatible with CTEs, as it relies on the object you pass it being a table or view in the database!

Here’s some workarounds, in order of preference

  1. If you’re using Snowflake, use the built-in pivot function instead
  2. Your first option works! You can keep this as a separate model! By the way, have you considered converting this to a seed instead? That way you can store these values in a CSV.
  3. You can write your own macro, get_column_values_from_query, which does the same thing as get_column_values but doesn’t depend on being passed a table or view.

I’ve included some sample macro code, but this is a huge tradeoff — is it really worth introducing all this code when you could just have a separate model (or seed!) for the CTE? Anyway…

{% set data_query %}
SELECT 'S' as size, 'red' as color
UNION ALL SELECT 'S' ,'blue'
UNION ALL SELECT 'S', 'red'
UNION ALL SELECT 'M', 'red'
{% endset %}

with data as (
    {{ data_query }}
)

SELECT
    size,
    {{ dbt_utils.pivot('color', get_column_values_from_query(data_query, 'color')) }}
FROM data
GROUP BY size

Note that I had to set a value for data_query — earlier you were using data as an argument, but there wasn’t a variable named data in the Jinja context, so it would have been evaluating to None.

From here, you’ll need to write your own macro – here’s some starter code. It works, but it isn’t pretty

-- macros/get_column_values_from_query.sql
{% macro get_column_values_from_query(query, column) -%}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
    {%- if not execute -%}
        {{ return('') }}
    {% endif %}

    {% set column_values_sql %}
    with cte as (
        {{ query }}
    )
    select
        {{ column }} as value

    from cte
    group by 1
    order by count(*) desc

    {% endset %}

    {%- set results = run_query(column_values_sql) %}
    {{ log(results, info=True) }}
    {% set results_list = results.columns[0].values() %}

    {{ log(results_list, info=True) }}
    {{ return(results_list) }}

{%- endmacro %}

2 Likes