The problem
I want to use the values of a column to produce an SQL query, via UNIONs - I have a list of relevant values from dbt_utils.get_column_values()
for this, but this function only returns unique values and therefore it won’t work for my use case.
The context of why I’m trying to do this
I’m creating an audit table for all models in a schema, and the values in each row refer to column names for the most part. These column names can vary between tables. Here are some examples:
table_name | monitor_flag | created_col | modified_col | ingested_col | dbt_runtime_col | expected_max_delay_minutes |
---|---|---|---|---|---|---|
lead | true | created_at | modified_at | loaded_at | dbt_utc_runtime | 1440 |
owner | true | created_at | modified_at | loaded_at | dbt_utc_runtime | 1440 |
customer | true | src_created_at | src_modified_at | loaded_at | dbt_utc_runtime | 1440 |
Ultimately this will query each table for the latest freshness stats on a daily basis, something like this:
select
'{{ table }}' as table_name,
current_date as report_date,
count(*) as total_row_count,
sum(case when date({{ created_col }}) = current_date then 1 else 0 end) as created_today,
{% if modified_col != '' %}
sum(case when date({{ modified_col }}) = current_date and {{ modified_col }} != {{ created_col }} then 1 else 0 end) as modified_only_today,
{% else %}
0 as modified_only_today,
{% endif %}
max({{ ingested_col }}) as max_ingested_at,
max({{ created_col }}) as max_created_at,
{% if modified_col != '' %}
max({{ modified_col }}) as max_modified_at,
{% else %}
null as max_modified_at,
{% endif %}
from {{ table }}
However for now I would be happy to see all values in a column.
What I’ve already tried
with config as (
select * from {{ ref('monitoring__audit_config') }}
)
{% set tables = dbt_utils.get_column_values(table=ref('monitoring__audit_config'), column='table_name') %}
{% set created_cols = dbt_utils.get_column_values(table=ref('monitoring__audit_config'), column='created_col') %}
{% set modified_cols = dbt_utils.get_column_values(table=ref('monitoring__audit_config'), column='modified_col') %}
{% for item in modified_cols %}
{% do(log(item, info=True))%}
{% endfor %}
select * from config
Currently, do(log…) outputs modified_at
, src_modified_at
. If I can get it to output modified_at
, modified_at
, src_modified_at
then I’m confident the whole thing will work.
Can anyone help?