Best Practices for Managing Staging Models in Large-Scale dbt Projects

Hey everyone,

I have been working with dbt for the past year and absolutely love how it streamlines analytics engineering. As our team scales, our dbt project is growing rapidly, especially the number of staging models.

Right now, our staging layer feels a bit chaotic. We follow some naming conventions, but managing dependencies and keeping models organized is becoming a challenge. I am curious how others are structuring their staging models in larger projects.

Some specific questions:

Do you recommend splitting staging models across multiple folders by source or domain?

How do you handle shared transformations (e.g., cleaning date fields) across multiple staging models?

Any advice on keeping ref() dependencies clean and manageable? I have also searched the web for answers and found this blog https://www.getdbt.com/blog/staging-models-best-practices-and-limiting-view-runs-sql-training-in-bangalore, but it didn't help much.

Would love to hear what’s worked (or not worked) for others. Open to any tips, tools, or docs that helped you keep things sane as your dbt project grew.

Thanks in advance!

With Regards,
Marcelo

Hi Marcelo! (Hola tocayo, hello namesake!)

Do you recommend splitting staging models across multiple folders by source or domain?

Absolutely! In many projects I have worked on, we always opt for splitting by source.
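To make that concrete, the layout below is the kind of source-based split I mean (the source and model names are just illustrative, not from a real project):

```text
models/
└── staging/
    ├── stripe/
    │   ├── _stripe__sources.yml
    │   ├── stg_stripe__customers.sql
    │   └── stg_stripe__payments.sql
    └── salesforce/
        ├── _salesforce__sources.yml
        └── stg_salesforce__accounts.sql
```

One subfolder per source system, with a `stg_<source>__<entity>.sql` naming pattern, makes it obvious where any new staging model should live.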

How do you handle shared transformations (e.g., cleaning date fields) across multiple staging models?

We handle it model by model. Nothing great to suggest on my side on this point.

Any advice on keeping ref() dependencies clean and manageable?

I’m not sure I’m getting you here. Sometimes you have complex and/or many business rules, so you will have many references, and you can’t avoid that complexity. Just keep one ref() per upstream model, declared at the beginning of the file in the “importing” area. Let me know about any specific issues so I can try to help you more.
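To illustrate what I mean by the “importing” area (the model and column names here are invented for the example), each downstream model starts with one import CTE per ref() and the business logic only touches those CTEs:

```sql
-- models/marts/finance/fct_orders.sql (hypothetical model)
with

-- "importing" area: one ref() per upstream model, nothing else
orders as (
    select * from {{ ref('stg_shop__orders') }}
),

payments as (
    select * from {{ ref('stg_shop__payments') }}
),

-- business logic lives below, referencing only the CTEs above
final as (
    select
        orders.order_id,
        orders.ordered_at,
        sum(payments.amount) as total_amount
    from orders
    left join payments on orders.order_id = payments.order_id
    group by 1, 2
)

select * from final
```

With that convention, a quick scan of the top of any file tells you its full dependency list.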

Regarding articles, the ones below can probably lead you to interesting insights. Don’t take them as THE rules; my colleagues and I have found counterexamples to many of the recommendations you will find there. Pick them up when they work for you, or use them as inspiration for better, customized practices:

Hi Marcelo -
On this:

We use macros for consistently cleaning/transforming certain types of data.
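For example, a date-cleaning macro might look roughly like this (the macro name and logic are a sketch, not our exact code):

```sql
-- macros/clean_date.sql (illustrative example)
{% macro clean_date(column_name) %}
    -- treat empty strings as null, then cast to date; adapt to your warehouse
    case
        when nullif(trim({{ column_name }}), '') is null then null
        else cast({{ column_name }} as date)
    end
{% endmacro %}
```

A staging model can then call `{{ clean_date('order_date') }} as order_date`, so every model applies the same cleaning rule.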

Our dbt project groups things into folders by project layer (stage, intermediate, marts) and subfolders by functional area. In some cases where we wanted to differentiate within a subfolder at runtime, we’ve added tags. Example: 99% of our stage models in functional area X need to run daily, but 3 models require an ad hoc data step that occurs once a year, so we’ve tagged them to allow them to be excluded or run exclusively via a selector.
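To sketch what that looks like in practice (the tag, folder, and model names are made up for the example):

```sql
-- models/staging/area_x/stg_area_x__annual_snapshot.sql (illustrative)
{{ config(tags=['annual']) }}

select * from {{ source('area_x', 'annual_snapshot') }}
```

The daily job can then run `dbt run --select staging.area_x --exclude tag:annual`, while the once-a-year job runs `dbt run --select tag:annual` (or the equivalent entry in selectors.yml).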

I’ve found the dbt roadmap documentation helpful for understanding new/upcoming features.

That custom selector skip_views_but_test_views you linked is also a cool idea. Thanks for sharing.