Dynamic dbt models generation based on different versions of source tables

lancos · June 5, 2023, 2:35pm

The problem I’m having

I am dealing with a scenario where I need to union multiple source tables which have similar structures, but slight variations across different versions and identifiers. The names of these tables follow a specific pattern: they start with a fixed prefix, followed by a version number, and then a unique identifier. The version number is crucial as the schema of the tables varies slightly from version to version.

For example, the tables might be named tablePrefix_1_0_0_identifierA, tablePrefix_2_0_0_identifierA, tablePrefix_1_0_0_identifierB, etc. Note that some versions might not be present for all identifiers.

The context of why I’m trying to do this

These source tables are versioned and tagged with unique identifiers. My goal is to generate a view for each unique identifier, unioning all the different versions pertaining to that specific identifier.

What I’ve already tried

I’ve created a dbt macro that generates a single model which unions all the source tables based on the naming pattern. Here is an example:

Some example code or error messages

This is my code for unioning versions (it may not be perfect and cover all scenarios yet).

{% macro generate_struct_clause(column) -%}
  STRUCT<{%- for field in column.fields -%}{{ field.name }} {{ field.type }}{%- if not loop.last %}, {% endif %}{%- endfor -%}>
{%- endmacro %}

{% macro generate_select_clause(column, table_info) -%}
  {%- set table_column = table_info | selectattr("name", "equalto", column.name) | first -%}
  {%- if table_column -%}
    {%- if table_column.dtype != column.type -%}
      {{ "CAST(" ~ column.name ~ " AS " ~ column.type ~ ") AS " ~ column.name }}
    {%- else -%}
      {{ column.name }}
    {%- endif -%}
  {%- else -%}
    {%- if column.type == "RECORD" -%}
      {{ "CAST(NULL AS " ~ generate_struct_clause(column) ~ ") AS " ~ column.name }}
    {%- else -%}
      {{ "NULL AS " ~ column.name }}
    {%- endif -%}
  {%- endif -%}
{%- endmacro -%}

{% macro generate_union_query(target_schema, table_prefix, final_schema) -%}
  {%- set union_parts = [] -%}
  {%- set tables = dbt_utils.get_relations_by_prefix(database='my-project', schema=target_schema, prefix=table_prefix) -%}
  {%- for table in tables -%}
    {%- set table_info = adapter.get_columns_in_relation(table) -%}
    {%- set column_parts = [] -%}
    {%- for column in final_schema -%}
      {%- do column_parts.append(generate_select_clause(column, table_info)) -%}
    {%- endfor -%}
    {%- set table_path = table.database ~ "." ~ table.schema ~ "." ~ table.identifier -%}
    {%- set select_query = "SELECT\n" ~ column_parts|join(',\n') ~ "\nFROM `" ~ table_path ~ "`\n" -%}
    {%- do union_parts.append(select_query) -%}
  {%- endfor -%}
  {{- union_parts|join('UNION ALL\n') -}}
{%- endmacro -%}

However, creating a separate model for each unique identifier dynamically is a challenge to me.

Questions

What’s the best way to generate multiple models or views based on dynamic parameters (unique identifiers in my case)?
If the traditional dbt workflow can’t support this, are there any workarounds or alternative methods you would suggest?
Is my current approach to handle different table versions appropriate? If not, could you provide suggestions to improve this?

bakerbryce · June 5, 2023, 5:57pm

You may be able to use a combination of dbt_utils’ macros to achieve this, namely get_relations_by_pattern and union_relations.

lancos · June 5, 2023, 6:09pm

Thank you for your suggestion. I have indeed been using the get_relations_by_pattern and union_relations macros from dbt_utils to address this (please take a look). I used a custom macro as
I’ve found that union_relations can encounter issues when dealing with a struct in a struct.

My end goal is to have a separate view for each site id. I am aiming to avoid the creation of individual files for each identifier. Given the potential volume of identifiers and considering each tenant has multiple environments (DEV, SIT, PROD, etc.), this could result in an overwhelming number of files. Unfortunately, I’ve not found a solution to that.

Topic		Replies	Views
Generating multiple views from one model Help bigquery	0	1701	June 7, 2023
Dynamic Column Generation and Incremental Updates Help	2	237	December 2, 2024
Using bigquery wildcard tables with source macro Help bigquery	1	254	September 3, 2024
Regarding dbt model versioning Help	0	973	October 13, 2023
dbt support for schema-versioned database objects? Help snapshots , best-practice , custom-schema	1	1625	January 17, 2023