get_column_values not unique?

The problem

I want to use the values of a column to produce an SQL query, via UNIONs - I have a list of relevant values from dbt_utils.get_column_values() for this, but this function only returns unique values and therefore it won’t work for my use case.

The context of why I’m trying to do this

I’m creating an audit table for all models in a schema, and the values in each row refer to column names for the most part. These column names can vary between tables. Here are some examples:

table_name monitor_flag created_col modified_col ingested_col dbt_runtime_col expected_max_delay_minutes
lead true created_at modified_at loaded_at dbt_utc_runtime 1440
owner true created_at modified_at loaded_at dbt_utc_runtime 1440
customer true src_created_at src_modified_at loaded_at dbt_utc_runtime 1440

Ultimately this will query each table for the latest freshness stats on a daily basis, something like this:

select
      '{{ table }}' as table_name,
      current_date as report_date,
      count(*) as total_row_count,
      sum(case when date({{ created_col }}) = current_date then 1 else 0 end) as created_today,
      {% if modified_col != '' %}
        sum(case when date({{ modified_col }}) = current_date and {{ modified_col }} != {{ created_col }} then 1 else 0 end) as modified_only_today,
      {% else %}
        0 as modified_only_today,
      {% endif %}
      max({{ ingested_col }}) as max_ingested_at,
      max({{ created_col }}) as max_created_at,
      {% if modified_col != '' %}
        max({{ modified_col }}) as max_modified_at,
      {% else %}
        null as max_modified_at,
      {% endif %}
    from {{ table }}

However for now I would be happy to see all values in a column.

What I’ve already tried

with config as (
    select * from {{ ref('monitoring__audit_config') }}
)

{% set tables = dbt_utils.get_column_values(table=ref('monitoring__audit_config'), column='table_name')  %}
{% set created_cols = dbt_utils.get_column_values(table=ref('monitoring__audit_config'), column='created_col')  %}
{% set modified_cols = dbt_utils.get_column_values(table=ref('monitoring__audit_config'), column='modified_col')  %}

{% for item in modified_cols %}
    {% do(log(item, info=True))%}
{% endfor %}


select * from config

Currently, do(log…) outputs modified_at , src_modified_at. If I can get it to output modified_at , modified_at , src_modified_at then I’m confident the whole thing will work.

Can anyone help?

Hi Ricky,
Thanks for sharing your use case.
You are right dbt_utils.get_column_values() uses select disctinct,so it will always return unique values without preserving duplicates.
Therefore,if you have to retrieve non-distinct values from a certain column,you have to use a custom macro,you can use this example and adapt it:

{%macro get_all_column_values(model, column)%}
      {% set query %}
                    select {{ column }} from {{ model }}
       {% endset %}
       {% set results = run_query(query) %}
       {% if execute %}
               {% set values = results.columns[0].values() %}
       {% else %}
               {% set values = [] %}
       {% endif %}
        
       {{ return(values) }}
{% endmacro %}

Then call the macro as follow :
{% set modified_cols = get_all_column_values(ref('monitoring_audit_config'), 'modified_col') %}
this will give you a full list,including duplicates.
Let me know if you’d like help adapting it further.I will be happy to assist you!

1 Like

Many thanks @mehdiatty21 that works perfectly!

I’d started working on much the same lines (didn’t think someone would reply so quickly), but your solution is better than mine so much obliged.

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.