How to balance the need for controlled development of core models with rapid development of peripheral models

michael.dunn · May 31, 2018, 11:06pm

I’m struggling to get two different development ideologies to coexist within dbt. On one hand is my team (BI), which is responsible for developing the core data model for the business. Because those core components have broad impact across the business, their development needs to be fairly regimented.

On the other hand, there is a broad periphery of data models that need to be built and maintained by the business analyst group. The analysts are assigned to specific business units and, given the pace of development in the business overall, they need (and want) to iterate more quickly and are more tolerant of flaws in their models.

So how should we structure our dbt models to support both ideals? One thought I had was to use multiple repos (one for the core, and one or more for the data marts the BAs will create). But it’s not clear to me how execution would work in that sort of structure: would I include the data mart package(s) in the core project and run dbt from the core project?

Anyone else dealing with similar challenges?

michael.dunn · June 1, 2018, 6:07pm

And what about development in the data mart package(s)? They’d need to include the core package so that the analysts could actually run their models to test them. That creates a circular dependency, and in that case dbt deps just spins around and around forever.

It would seem, therefore, that the data mart models should refer to the actual database objects generated by my team’s dbt repo (rather than using {{ref()}}), and that the data mart package(s) should be executed independently. But this seems antithetical to “the dbt way” of doing things.
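The circular dependency described above can be illustrated with plain cycle detection. This is a minimal Python sketch, not dbt’s actual deps resolver: the package names and the DEPENDENCIES mapping are hypothetical. A resolver that naively follows core -> marts -> core never reaches a package whose dependencies are all resolved, so the loop has to be broken somewhere.

```python
"""Why a circular package dependency cannot resolve: a minimal sketch.

Hypothetical package names; plain depth-first cycle detection, not dbt's resolver.
"""
from typing import Dict, List, Optional

# Hypothetical setup from the question: core installs marts, and marts
# installs core so analysts can run marts models on their own.
DEPENDENCIES: Dict[str, List[str]] = {
    "core": ["marts"],
    "marts": ["core"],
}


def find_cycle(graph: Dict[str, List[str]]) -> Optional[List[str]]:
    """Return one dependency cycle as a list of names, or None if acyclic."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {node: WHITE for node in graph}
    path: List[str] = []

    def visit(node: str) -> Optional[List[str]]:
        color[node] = GREY
        path.append(node)
        for dep in graph.get(node, []):
            state = color.get(dep, WHITE)
            if state == GREY:
                # dep is already on the current path: we have looped back.
                return path[path.index(dep):] + [dep]
            if state == WHITE:
                cycle = visit(dep)
                if cycle:
                    return cycle
        path.pop()
        color[node] = BLACK
        return None

    for node in graph:
        if color[node] == WHITE:
            cycle = visit(node)
            if cycle:
                return cycle
    return None


if __name__ == "__main__":
    print(find_cycle(DEPENDENCIES))  # ['core', 'marts', 'core']
    # Dropping the marts -> core edge breaks the loop.
    print(find_cycle({"core": ["marts"], "marts": []}))  # None
```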

tristan · June 8, 2018, 12:46am

Ooh! I’ve done similar things to what you’re suggesting before. I like where your head’s at.

When I develop open source packages, I build the “library” (the thing I want to share with other people) and then a “testing project” (the thing I actually run the code from). This is necessary because in the library I can’t specify things like the profile and many variable names; those need to be configured by the core project that includes the library. So I build the shared code in the library, then write the project-specific code that lets me test and run it (profiles, variable names, etc.) in the “testing project,” and run everything from there. I never actually push the testing project to any git repo; it’s just for testing. But I come away with library code that I can run pretty seamlessly.

This translates to your situation pretty directly, even if it sounds a bit roundabout. Your analysts would be able to clone your “core” project but not push to it. They would have editor permissions on your “marts” project. They would edit the code in marts, which would get pulled in as a dependency of core, and they’d always run from core (which isn’t a problem; they can read the code, they just can’t edit it).

Again, I recognize that this seems roundabout, but it works quite nicely in practice. Please let me know if there’s any part of this that I can make clearer.
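A rough way to see why running everything from core works: once marts is installed as a dependency of core, the models from both projects form one graph, and any valid build order puts each mart after the core models it refs. This sketch is illustrative only. It assumes, as the reply implies, that refs in marts resolve against core’s models once both are loaded together. The model names are hypothetical, and Python’s standard-library graphlib.TopologicalSorter (3.9+) stands in for dbt’s own scheduler.

```python
"""How models from an installed "marts" package slot into the "core" DAG.

Hypothetical model names; an illustration, not dbt's scheduler.
"""
from graphlib import TopologicalSorter
from typing import Dict, List, Set

# Each model maps to the set of models it ref()s.
CORE_MODELS: Dict[str, Set[str]] = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
}
MARTS_MODELS: Dict[str, Set[str]] = {
    "finance_revenue": {"fct_orders"},  # owned by a business analyst
    "marketing_cohorts": {"fct_orders", "stg_customers"},
}


def run_order(*projects: Dict[str, Set[str]]) -> List[str]:
    """Merge the model graphs of all projects and return a valid build order."""
    merged: Dict[str, Set[str]] = {}
    for project in projects:
        for model, refs in project.items():
            if model in merged:
                raise ValueError(f"duplicate model name: {model}")
            merged[model] = set(refs)
    # Fail loudly if a ref points at a model no loaded project defines,
    # e.g. running marts without core installed alongside it.
    missing = {r for refs in merged.values() for r in refs} - set(merged)
    if missing:
        raise ValueError(f"unresolved refs: {sorted(missing)}")
    # Keys are nodes, values are their predecessors (the models they ref).
    return list(TopologicalSorter(merged).static_order())


if __name__ == "__main__":
    print(run_order(CORE_MODELS, MARTS_MODELS))
```

The sorter may choose any order consistent with the refs, but every mart model lands after the core models it depends on. The trade-off is that a full run rebuilds the core models as well.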

mehdi.elamine · June 13, 2018, 1:54pm

Thanks @tristan, I think I’m getting the gist of what you’re saying. Is there, by any chance, a (dummy) GitHub repo where one can see this kind of structure?

tristan · June 13, 2018, 2:06pm

This isn’t really something that gets checked into a package; it’s more of a local workflow for developing packages. You can see packages that were developed with this workflow in our git org: check out Stripe, Snowplow, Quickbooks, or any of the others.

mehdi.elamine · June 13, 2018, 2:19pm

Copy that, thanks.

So in your approach, wouldn’t core get re-run each time?

tristan · June 13, 2018, 2:30pm

Assuming you do a full dbt run, yep! Of course, you can also choose to run just whatever subgraph you like.
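Running a subgraph can be sketched as a simple downstream traversal: select one model plus everything that depends on it. This is a hypothetical illustration with made-up model names. It mirrors the spirit of dbt’s graph selection but is neither its implementation nor its selector syntax.

```python
"""Selecting a subgraph instead of running everything: a minimal sketch.

Hypothetical model names; not dbt's implementation or selector syntax.
"""
from collections import defaultdict, deque
from typing import Dict, List, Set

# Each model maps to the models it ref()s (its parents).
MODELS: Dict[str, Set[str]] = {
    "stg_orders": set(),
    "stg_customers": set(),
    "fct_orders": {"stg_orders", "stg_customers"},
    "finance_revenue": {"fct_orders"},
    "marketing_cohorts": {"fct_orders", "stg_customers"},
}


def descendants(graph: Dict[str, Set[str]], start: str) -> Set[str]:
    """Return `start` plus every model downstream of it, directly or not."""
    # Invert the ref edges so we can walk from a model to its children.
    children: Dict[str, List[str]] = defaultdict(list)
    for model, parents in graph.items():
        for parent in parents:
            children[parent].append(model)
    # Breadth-first walk over the child edges.
    selected = {start}
    queue = deque([start])
    while queue:
        for child in children[queue.popleft()]:
            if child not in selected:
                selected.add(child)
                queue.append(child)
    return selected


if __name__ == "__main__":
    # An analyst iterating on a mart rebuilds only that mart...
    print(sorted(descendants(MODELS, "finance_revenue")))  # ['finance_revenue']
    # ...while a change to a core fact table pulls in the marts built on it.
    print(sorted(descendants(MODELS, "fct_orders")))
    # ['fct_orders', 'finance_revenue', 'marketing_cohorts']
```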
