Hi there! Unfortunately get_column_values
is not compatible with CTEs, as it relies on the object you pass it being a table or view in the database!
Here’s some workarounds, in order of preference
- If you’re using Snowflake, use the built-in pivot function instead
- Your first option works! You can keep this as a separate model! By the way, have you considered converting this to a seed instead? That way you can store these values in a CSV.
- You can write your own macro,
get_column_values_from_query
, which does the same thing asget_column_values
but doesn’t depend on being passed a table or view.
I’ve included some sample macro code, but this is a huge tradeoff — is it really worth introducing all this code when you could just have a separate model (or seed!) for the CTE? Anyway…
{% set data_query %}
SELECT 'S' as size, 'red' as color
UNION ALL SELECT 'S' ,'blue'
UNION ALL SELECT 'S', 'red'
UNION ALL SELECT 'M', 'red'
{% endset %}
with data as (
{{ data_query }}
)
SELECT
size,
{{ dbt_utils.pivot('color', get_column_values_from_query(data_query, 'color')) }}
FROM data
GROUP BY size
Note that I had to set
a value for data_query
— earlier you were using data
as an argument, but there wasn’t a variable named data
in the Jinja context, so it would have been evaluating to None
.
From here, you’ll need to write your own macro – here’s some starter code. It works, but it isn’t pretty
-- macros/get_column_values_from_query.sql
{% macro get_column_values_from_query(query, column) -%}
{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
{%- if not execute -%}
{{ return('') }}
{% endif %}
{% set column_values_sql %}
with cte as (
{{ query }}
)
select
{{ column }} as value
from cte
group by 1
order by count(*) desc
{% endset %}
{%- set results = run_query(column_values_sql) %}
{{ log(results, info=True) }}
{% set results_list = results.columns[0].values() %}
{{ log(results_list, info=True) }}
{{ return(results_list) }}
{%- endmacro %}