Dynamic models specific in config tables

Hi all!

Any help you can give on the below would be gratefully received.

We have the following challenge:

  • We have many customers for a platform we’re building that ingests data using dbt from various sources
  • The bulk of the logic in the dbt flow that process this incoming data is the same for all customers
  • For some models however, a different approach needs to be configured per customer. Eg. Both customers use a table called “Accounts” to hold customer data, but customer A uses a standard field “Industry” to hold the customer segment, whilst customer B uses a custom field “Segment_C”.
  • We break the logic for how to capture “customer_segment” into two separate models (one for customer A and one for customer B) and then ask dbt to use the correct model when it gets to that part of the flow.
  • The “which is the correct model” config lives in a table in our data warehouse. We want to have dbt read the configuration for which model to correctly include for customers A and B whilst retaining the rest of the flow identical for both customers

Challenge:
– dbt won’t let us dynamically insert “–depends on comments” in models, which breaks our approach

Question:

  • Is there a way to add --depends on comments for models in some kind of pre-hook to get around this?
  • Is there a better way to deal with dynamic model requirements?

Many thanks!

Interesting question! Is the (very cut down) DAG that you’re working with something like this?

If not, can you let me know what it actually looks like? You can use https://mermaid.live to generate a diagram, this is the template I used:

graph LR
    acc1[source_accounts_1] --> ta1(transformed_accounts_1)
    acc2[source_accounts_2] --> ta2(transformed_accounts_2)
    ta1 --> seg(customer_segment)
    ta2 --> seg

It’s not possible to embed arbitrary depends_ons earlier in the flow, as the DAG is generated before any of those things are rendered.

One idea I just had (pretty sure it’ll technically work, but might not be practical for you) would be to put both customer A and B’s depends_on into each other’s files, i.e.

--depends_on: {{ ref('transformed_accounts_1') }}
--depends_on: {{ ref('transformed_accounts_2') }}

select * 
from {{ determine_correct_table() }}
where ... 

As I’ve been writing this, I’m realising my example DAG above must be completely wrong, but I’m not sure what it should look like. Hopefully this is a jumping off point for you though; if you can give some more real-world context I can try to be more helpful