Preventing name conflicts


#1

I looking for a good way to prevent naming conflicts. I have a 2 features that trigger the same event request_created, and I want to create a single unified view with the union of both. I’ve created separated schemas for each of the features and another for the unified output, but I keep getting asked to rename files to solve the conflict.

My project has the following:

models:
  business_events:
    events:
      materialized: table

    feat_a:
      schema: feat_a

    feat_b:
      schema: feat_b

My folder looks something like this:

...
├── models
│   ├── events
│   │   └── request_created.sql
│   ├── feat_a
│   │   └── a_request_created.sql
│   └── feat_b
│       └── b_request_created.sql
...

I had to prepend the name of the schema to the SQL file to avoid the conflict, and use alias to generate the table name I’ve wanted, i.e. {{ config(alias='request_created') }}.

The solution works, but seems a bit hacky. Is there a better way to avoid this conflict?


#2

Hi @rafael! Interesting question! I think you’re doing everything correctly here, and I can give some additional context for why dbt works the way it does:

  1. Model filename names must be globally unique so that models can be ref'd. If you have two models named request_created.sql (in different directories), then ref('request_created') would be ambiguous! As such, every model needs to have a distinct name, and using a prefix is a pretty good approach to ensuring uniqueness.
  2. Every relation name in the database needs to be unique. If you have two models that are configured to be written to the same schema/identifier (like feat.request_created), then whichever model runs second will “win” and overwrite the first one! Instead, dbt raises an error about duplicated relation names (which it sounds like you saw).

Between schemas and aliases, you should be able to name these models however you’d like! I’d agree with you that the model filename prefix isn’t ideal, but it works pretty well for now.

Happy to answer any questions if you have them, but again, sounds to me like you’re doing everything the right way :slight_smile:


#3

I understand that the request_created is not unique in the dbt context, therefore some type of resolution is needed. But by using alias I do get unique relation names within the schemas, so that’s not the problem.

I think I was expecting ref to take a the schema into consideration. Something like, ref('feat_a', 'request_created') or ref('feat_a.request_created)`.


#4

Yeah, using the schema doesn’t work if you have multiple people collaborating on a dbt project! In my profile, I have my schema set to dbt_drew, my colleague Claire has it set to dbt_claire, and in production, it’s set to analytics.

If the ref function accepted a schema, then dbt projects would no longer work in an environment aware.

If I had code like:

select * from ref('dbt_dbanin.request_created)`

Then the production deployment of dbt would point to my request_created model, which definitely isn’t right! Similarly, if you did:

select * from ref('analytics.request_created)`

then when I ran models locally, I’d actually be using the production version of the request_created table.

All said, dbt works the way it does to facilitate a collaborative workflow in which every team member can build a full version of the dbt project. This is one of the core design decisions in dbt, and it manifests in possibly unexpected ways like this.

To be sure, I think there’s more work that we can do on “namespacing” models. If you have suggestions about how we could do this better, I’d encourage you to create an issue :slight_smile:


#5

I think it would be okay to allow schema referencing. I was trying to figure out a way to pull from models in one schema and write into another schema so I could more neatly organize my data warehouse. I didn’t know you couldn’t, and so I wrote a quick macro yesterday that would set the schema to be different depending on which environment you were in only to realize that it wouldn’t work. I don’t think it’s very different than the concept of custom schemas dbt already kind of supports.

This was the macro I had, I guess I’d technically be able to replace the actually ref function with this… but don’t want to play with fire…

{% macro _ref(table_name, schema_name=none) %}
  
  {% if schema_name is none %}
    {{ ref(table_name) }}
  {% elif target.name == 'prod' %}
    {{ ref(schema_name+'.'+table_name) }}
  {% else %}
    {{ ref(target.schema+'_'+schema_name+'.'+table_name) }}
  {% endif %}
  
{% endmacro %}