Using get_column_values with CTEs

Using: dbt 0.17.0 with BigQuery

I’m new to dbt and looking to how to better incorporate macros from dbt_utils into my workflow. In particular, I’m interested in using pivot and get_column_values but I have been unsuccessful so far in incorporating it in my model code with CTEs defined in the same model. Here’s what I have done so far.

I have taken the example from github and created a new model like so:

– example_model.sql

with data as (
    SELECT 'S' as size, 'red' as color
    UNION ALL SELECT 'S' ,'blue'
    UNION ALL SELECT 'S', 'red'
    UNION ALL SELECT 'M', 'red'
),
SELECT * FROM data

Then in another file I have the following:

SELECT
    size,
    {{ dbt_utils.get.pivot('color', dbt_utils.get_column_values(ref('example_model'), 'color')) }}
FROM ref('example_model')
GROUP BY size

So far so good and this works as expected.

However, my problem is that I don’t actually want to create a new model for the first table – it’s much too small and not used for anything else (as are a lot of other temporary/CTE tables that I create). Thus, what I really want to be able to do is something like this (all in the same model file):

with data as (
    SELECT 'S' as size, 'red' as color
    UNION ALL SELECT 'S' ,'blue'
    UNION ALL SELECT 'S', 'red'
    UNION ALL SELECT 'M', 'red'
),
SELECT
    size,
    {{ dbt_utils.get.pivot('color', dbt_utils.get_column_values(data, 'color')) }}
FROM data
GROUP BY size

However, I can’t seem to get this work. If I use data (no quotes), I get a 'data' is undefined error; if I use 'data' I get an 'str object' has no attribute 'database' error.

Thus my question is: what is the right way to refer to data such that I can use it in dbt_utils.get_column_values and without creating a new model in the process?

Thanks for your help

1 Like

Hi there! Unfortunately get_column_values is not compatible with CTEs, as it relies on the object you pass it being a table or view in the database!

Here’s some workarounds, in order of preference

  1. If you’re using Snowflake, use the built-in pivot function instead
  2. Your first option works! You can keep this as a separate model! By the way, have you considered converting this to a seed instead? That way you can store these values in a CSV.
  3. You can write your own macro, get_column_values_from_query, which does the same thing as get_column_values but doesn’t depend on being passed a table or view.

I’ve included some sample macro code, but this is a huge tradeoff — is it really worth introducing all this code when you could just have a separate model (or seed!) for the CTE? Anyway…

{% set data_query %}
SELECT 'S' as size, 'red' as color
UNION ALL SELECT 'S' ,'blue'
UNION ALL SELECT 'S', 'red'
UNION ALL SELECT 'M', 'red'
{% endset %}

with data as (
    {{ data_query }}
)

SELECT
    size,
    {{ dbt_utils.pivot('color', get_column_values_from_query(data_query, 'color')) }}
FROM data
GROUP BY size

Note that I had to set a value for data_query — earlier you were using data as an argument, but there wasn’t a variable named data in the Jinja context, so it would have been evaluating to None.

From here, you’ll need to write your own macro – here’s some starter code. It works, but it isn’t pretty

-- macros/get_column_values_from_query.sql
{% macro get_column_values_from_query(query, column) -%}

{#-- Prevent querying of db in parsing mode. This works because this macro does not create any new refs. #}
    {%- if not execute -%}
        {{ return('') }}
    {% endif %}

    {% set column_values_sql %}
    with cte as (
        {{ query }}
    )
    select
        {{ column }} as value

    from cte
    group by 1
    order by count(*) desc

    {% endset %}

    {%- set results = run_query(column_values_sql) %}
    {{ log(results, info=True) }}
    {% set results_list = results.columns[0].values() %}

    {{ log(results_list, info=True) }}
    {{ return(results_list) }}

{%- endmacro %}

2 Likes

Hi Claire,

Thats what I thought, but thank you for the confirmation!

You raise a good point about the tradeoffs – I’ll definitely go back and take a look and see if some of them should just be a separate model and seed. Having said that, I do appreciate you taking the time to write some starter code! In another attempt I also tried to use run_query, but I can now see how I used it incorrectly.

One question about the code. Could you explain why you included this block?

{%- if not execute -%}
        {{ return('') }}
{% endif %}

Is this related to the comment you included above about “Prevent querying of db in parsing mode”? If so, what exactly does that mean?

Thanks again!