Interpretation of the intermediate layer

We are migrating from a SQL Server database project to dbt, and are in the process of establishing naming conventions and folder structure for our new dbt project.

We have read the best practice guidelines for project structure, and we have some doubts about how to think of the intermediate layer:

  • Is the intermediate layer meant for reusable building blocks across data pipelines?
  • Or just for technical steps which are not useful outside of the specific context where they were implemented?

Or is it aiming to be both?

We have a very clear use case for the second interpretation, as we are migrating from stored procedures which sometimes contain many temp tables. These have been implemented as temp tables and not CTEs for performance reasons, so intermediate models materialized as tables is perfect: The “int_” prefix to the name will place them somewhere else in the database, so they do not end up cluttering the area reserved for marts. (We are aiming to use only one database schema.)

More generally, it seems useful to define certain models as intermediate in order to communicate to end users that these have a different “contract” than marts:

  • It is not meant for reuse
  • It might not be stable and can change without notice

And, of course, to keep the database tidy.

However, this (saying that intermediate models are not meant for reuse) seems to be in conflict with the first interpretation above.


So, we know that we want some form of intermediate layer. The question is how much we should put in intermediate, versus in marts. One way of reading the best practice guidelines, is to say that all models that represent intermediate steps, i.e. which are not the end result of a data pipeline, should be in the intermediate layer. This is in line with the first interpretation above.

I have some problems with this way of using the intermediate layer. Here is some context before my reasoning:

  • The only end users who have read access to our database, are developers from other teams.
  • We often build data pipelines where the end result is an aggregated dataset to be used in a PBI report, but the unaggregated datasets we build along the way may be useful sources for these developers.
  • We do not always know which datasets will be useful for others, but it is a business strategy that our team should build more data products for consumption by other teams in the future.

Defining a model as intermediate must serve some purpose. For the temp table use case I described above, the advantage is clear; we do not even want the other teams to have to look at them.

However, if we define all of our middle steps as intermediate models, we would end up including datasets which might be useful for consumption by other teams. I can’t see how it would benefit them that the models they might want to use are defined as intermediates. It seems more appropriate to treat such models as explicit data products (i.e. place them in marts).

Furthermore, if we want to reuse a model in more than one of the data pipelines in our dbt project, this suggests that the model may also be useful for other teams. This argues that models that are reused across our data pipelines should always be defined as marts.


Based on this, we are currently considering the following definition:

  • Intermediate: models that exist only to make a specific transformation work
  • Marts: models that represent a meaningful dataset with potential value for reuse

We would still allow intermediate models to be reused within the same pipeline, but as soon as we want to reuse it somewhere else, we would promote it to marts. Our rule of thumb would be: “When in doubt, put it in marts.”

Does this interpretation make sense, or are we missing an important aspect of how intermediate models are intended to be used?

In particular:

  • Is it a mistake to avoid using the intermediate layer for reusable building blocks?
  • How do others draw the line between “internal step” and “dataset”?

Would especially appreciate input from teams where the main consumers are other developers rather than business users.