A partial workaround for dbt_utils.star not working with CTEs

Since dbt_utils.star doesn’t work on CTEs, I’ve made a wee macro which solves my most common need: pulling all but a couple of columns out of a CTE in the same model file.

To use it, you need to set the columns in your select statement to a variable. Then you can pass it into the CTE as normal, but also pass it into the star_cte macro, which will split on commas (except in brackets) and then take the last element after splitting on spaces to get the column’s name/alias.

A super-contrived example:

with organisations as (select * from {{ ref('organisations') }}),

transformed as (
  select 
  {%- set select_columns = "
    org_id,
    'a' b,
    dateadd(month, 1, getdate()) as some_date,
    region_name
  "-%}

  {{select_columns}}
  from organisations
),

final as (
  select {{ star_cte(select_columns, 'transformed', except=['org_id', 'b']) }}, count(*)
  from transformed
  group by 1, 2
)

select * 
from final

renders to:

with organisations as (select * from "reporting"."dev_jlabes"."organisations"),

transformed as (
  select
    org_id,
    'a' b,
    dateadd(month, 1, getdate()) as some_date,
    region_name
  
  from organisations
),

final as (
  select   
        transformed.some_date, 
        transformed.region_name, count(*)
  from transformed
  group by 1, 2
)

select * 
from final

The macro itself is pretty brief:

{%- macro star_cte(select_columns, cte_name = '', except=[]) -%}
    
    {%- set include_cols = [] -%}
    {%- set nsp = namespace(is_in_brackets = False) %} {# "Move the variable's scope to be global" #}

    {%- for col in select_columns.split(',') -%}
        {%- if ')' in col %} {# "Argments to functions shouldn't be treated as column names" #}
            {%- set nsp.is_in_brackets = False %}
        {%- elif '(' in col %}
            {%- set nsp.is_in_brackets = True %}
        {%- endif %}

        {%- set col_tidied = col.strip().split(' ') | last -%}

        {%- if col_tidied not in except and not nsp.is_in_brackets -%}
            {%- set _ = include_cols.append(col_tidied) -%}

        {%- endif -%}
    {%- endfor -%}

    {% for col in include_cols %}
        {% if cte_name | length > 0 %}
            {{- cte_name -}} .
        {%- endif -%}
        {{- col -}}
        {{- ", " if not loop.last }}
    {%- endfor -%}

{%- endmacro -%}

It doesn’t help if you’re using ephemeral models upstream, but that use case doesn’t apply to me. This just helps DRY up an otherwise-messy piece of our models.

4 Likes