I have a multi-tenant database that is organized where each tenant’s data is stored in separate schemas. Each tenant has a type associated with it and there are discreet number of tenant types. Within each schema, raw data is landed inside the database and the structure of that raw data is dependent on the tenant type.
For example, tenant 1 may have tables tenant_1_table_1
and tenant_1_table_2
, and tenant 2 may have tables tenant_2_table1
and tenant_2_table_2
. While tenant_1_table_1
and tenant_2_table_1
may be similar they have different field names but can be remapped to represent similar data. Likewise tenant_1_table_2
and tenant_2_table_2
are similar but with different fields and can be mapped to become similar entities.
Ideally I’d like to transform the tables of tenant 1 and tenant 2 into more generic table_1
and table_2
for both tenant 1 and tenant 2. Extend this example to many tens of tenant types and hundreds of different tenants. To achieve this I’d like to apply a different set of transforms to tenants in order to reach table_1
and table_2
.
How would one go about structuring such a dbt project while minimizing duplication of models? Once all the data is in the generic tables the downstream structure is easy but I don’t see a good strategy for conditionally applying certain models to certain tenants to get a unified data layout.
Some ideas that I’ve thought of:
- Create separate dbt projects per tenant type and symlink in the common models into each project
- Write macros that will dynamically inject source models into the common models
- Use variables somehow to redirect
ref
based off of tenant type
Are there best practices on how to approach such a problem?