The problem I’m having
I am dealing with a scenario where I need to union multiple source tables which have similar structures, but slight variations across different versions and identifiers. The names of these tables follow a specific pattern: they start with a fixed prefix, followed by a version number, and then a unique identifier. The version number is crucial as the schema of the tables varies slightly from version to version.
For example, the tables might be named tablePrefix_1_0_0_identifierA
, tablePrefix_2_0_0_identifierA
, tablePrefix_1_0_0_identifierB
, etc. Note that some versions might not be present for all identifiers.
The context of why I’m trying to do this
These source tables are versioned and tagged with unique identifiers. My goal is to generate a view for each unique identifier, unioning all the different versions pertaining to that specific identifier.
What I’ve already tried
I’ve created a dbt macro that generates a single model which unions all the source tables based on the naming pattern. Here is an example:
Some example code or error messages
This is my code for unioning versions (it may not be perfect and cover all scenarios yet).
{% macro generate_struct_clause(column) -%}
STRUCT<{%- for field in column.fields -%}{{ field.name }} {{ field.type }}{%- if not loop.last %}, {% endif %}{%- endfor -%}>
{%- endmacro %}
{% macro generate_select_clause(column, table_info) -%}
{%- set table_column = table_info | selectattr("name", "equalto", column.name) | first -%}
{%- if table_column -%}
{%- if table_column.dtype != column.type -%}
{{ "CAST(" ~ column.name ~ " AS " ~ column.type ~ ") AS " ~ column.name }}
{%- else -%}
{{ column.name }}
{%- endif -%}
{%- else -%}
{%- if column.type == "RECORD" -%}
{{ "CAST(NULL AS " ~ generate_struct_clause(column) ~ ") AS " ~ column.name }}
{%- else -%}
{{ "NULL AS " ~ column.name }}
{%- endif -%}
{%- endif -%}
{%- endmacro -%}
{% macro generate_union_query(target_schema, table_prefix, final_schema) -%}
{%- set union_parts = [] -%}
{%- set tables = dbt_utils.get_relations_by_prefix(database='my-project', schema=target_schema, prefix=table_prefix) -%}
{%- for table in tables -%}
{%- set table_info = adapter.get_columns_in_relation(table) -%}
{%- set column_parts = [] -%}
{%- for column in final_schema -%}
{%- do column_parts.append(generate_select_clause(column, table_info)) -%}
{%- endfor -%}
{%- set table_path = table.database ~ "." ~ table.schema ~ "." ~ table.identifier -%}
{%- set select_query = "SELECT\n" ~ column_parts|join(',\n') ~ "\nFROM `" ~ table_path ~ "`\n" -%}
{%- do union_parts.append(select_query) -%}
{%- endfor -%}
{{- union_parts|join('UNION ALL\n') -}}
{%- endmacro -%}
However, creating a separate model for each unique identifier dynamically is a challenge to me.
Questions
- What’s the best way to generate multiple models or views based on dynamic parameters (unique identifiers in my case)?
- If the traditional dbt workflow can’t support this, are there any workarounds or alternative methods you would suggest?
- Is my current approach to handle different table versions appropriate? If not, could you provide suggestions to improve this?