Dynamic dbt model generation based on different versions of source tables

The problem I’m having

I am dealing with a scenario where I need to union multiple source tables that have similar structures but slight variations across versions and identifiers. The names of these tables follow a specific pattern: a fixed prefix, followed by a version number, and then a unique identifier. The version number is crucial because the schema of the tables varies slightly from version to version.

For example, the tables might be named tablePrefix_1_0_0_identifierA, tablePrefix_2_0_0_identifierA, tablePrefix_1_0_0_identifierB, etc. Note that some versions might not be present for all identifiers.

The context of why I’m trying to do this

These source tables are versioned and tagged with unique identifiers. My goal is to generate a view for each unique identifier, unioning all the different versions pertaining to that specific identifier.

What I’ve already tried

I’ve created a dbt macro that generates a single model which unions all the source tables matching the naming pattern. Here is an example:

Some example code or error messages

This is my code for unioning versions (it may not be perfect or cover every scenario yet).

{# Builds a BigQuery STRUCT<...> type literal from a RECORD column's nested fields,
   so a typed NULL can be emitted when a struct column is missing from a source table. #}
{% macro generate_struct_clause(column) -%}
  STRUCT<{%- for field in column.fields -%}{{ field.name }} {{ field.type }}{%- if not loop.last %}, {% endif %}{%- endfor -%}>
{%- endmacro %}

{# Emits the SELECT expression for one column of the final schema:
   cast when the source column exists with a different type, pass it through when
   the types match, and fall back to a typed NULL when the column is missing. #}
{% macro generate_select_clause(column, table_info) -%}
  {%- set table_column = table_info | selectattr("name", "equalto", column.name) | first -%}
  {%- if table_column -%}
    {%- if table_column.dtype != column.type -%}
      {{ "CAST(" ~ column.name ~ " AS " ~ column.type ~ ") AS " ~ column.name }}
    {%- else -%}
      {{ column.name }}
    {%- endif -%}
  {%- else -%}
    {%- if column.type == "RECORD" -%}
      {{ "CAST(NULL AS " ~ generate_struct_clause(column) ~ ") AS " ~ column.name }}
    {%- else -%}
      {{ "NULL AS " ~ column.name }}
    {%- endif -%}
  {%- endif -%}
{%- endmacro -%}

{# Finds every relation in target_schema whose name starts with table_prefix,
   projects each one onto the columns described by final_schema, and unions the results. #}
{% macro generate_union_query(target_schema, table_prefix, final_schema) -%}
  {%- set union_parts = [] -%}
  {%- set tables = dbt_utils.get_relations_by_prefix(database='my-project', schema=target_schema, prefix=table_prefix) -%}
  {%- for table in tables -%}
    {%- set table_info = adapter.get_columns_in_relation(table) -%}
    {%- set column_parts = [] -%}
    {%- for column in final_schema -%}
      {%- do column_parts.append(generate_select_clause(column, table_info)) -%}
    {%- endfor -%}
    {%- set table_path = table.database ~ "." ~ table.schema ~ "." ~ table.identifier -%}
    {%- set select_query = "SELECT\n" ~ column_parts|join(',\n') ~ "\nFROM `" ~ table_path ~ "`\n" -%}
    {%- do union_parts.append(select_query) -%}
  {%- endfor -%}
  {{- union_parts|join('UNION ALL\n') -}}
{%- endmacro -%}
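
For reference, this is roughly how I invoke the macro from a single model. The model name, the dataset, and the final_schema below are simplified placeholders rather than my real schema:

-- models/all_versions_unioned.sql (illustrative model name)
{%- set final_schema = [
    {"name": "id",         "type": "STRING"},
    {"name": "created_at", "type": "TIMESTAMP"},
    {"name": "payload",    "type": "RECORD",
     "fields": [{"name": "key", "type": "STRING"}, {"name": "value", "type": "STRING"}]}
] -%}

{{ generate_union_query(target_schema='my_dataset', table_prefix='tablePrefix', final_schema=final_schema) }}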

However, dynamically creating a separate model for each unique identifier is the part I’m struggling with.

Questions

  • What’s the best way to generate multiple models or views based on dynamic parameters (unique identifiers in my case)?
  • If the traditional dbt workflow can’t support this, are there any workarounds or alternative methods you would suggest?
  • Is my current approach to handling different table versions appropriate? If not, could you suggest improvements?

You may be able to use a combination of dbt_utils’ macros to achieve this, namely get_relations_by_pattern and union_relations.
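
As a rough sketch (the dataset name and table_pattern below are illustrative, and you would still need one model per identifier or some templating around it):

{% set relations = dbt_utils.get_relations_by_pattern(
    schema_pattern='my_dataset',
    table_pattern='tablePrefix_%_identifierA'
) %}

{{ dbt_utils.union_relations(relations=relations) }}

By default, union_relations also adds a _dbt_source_relation column, which tells you which versioned table each row came from.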

Thank you for your suggestion. I have indeed been using the get_relations_by_pattern and union_relations macros from dbt_utils to address this (please take a look at the code above). I ended up writing a custom macro because I’ve found that union_relations can run into issues when dealing with nested structs (a struct within a struct).

My end goal is to have a separate view for each site ID (the unique identifier), and I’m aiming to avoid creating an individual model file for each one. Given the potential number of identifiers, and considering that each tenant has multiple environments (DEV, SIT, PROD, etc.), this could result in an overwhelming number of files. Unfortunately, I’ve not found a solution to that yet.