Help with design pattern for code/type/lookup data

ConnorHz · September 8, 2025, 2:23pm

Hey dbt Community,

We are working to stage some data where a lot of the columns contain codes that lookup to values either in other tables (primary key/foreign key) or to values that exist in documentation (for these codes, we copy all the code value pairs into a CSV and load them as a seed). Our initial strategy was to bring in the code values via base models in order to translate them directly in the staging models. The problem with this is, if we every wanted to left join to all possible values, we’d have to use the base model outside of a staging model which rubs me the wrong way.

Have any of you developed any best practices or design patterns around code/lookup tables? Right now we’re leaning towards creating an exception to the “no joins in staging models” rule just for these types of situations but I’m curious what you all have done.

Thanks!

Connor

edit: Found this community post after submitting (guess I wasn’t using the correct terms during my initial search). Looks like the suggested method is using base models but that still doesn’t account for the “left joining to all possible values” problem.

ConnorHz · September 8, 2025, 6:11pm

@joellabes You still active in these streets?

a_slack_user · September 8, 2025, 9:07pm

My advice here is to not follow convention for conventions sake. If your business/data models require (in your opinion) a specific layout that may not be the general best practice, doesn’t mean it isn’t the best practice for you.

We have joins in our src layers (analogous with the staging layer in your scenario) for making sure that all of our primary keys are aligned and correct to unblock redundant joins etc down the line.

_{Note: @Cooper White originally posted this reply in Slack. It might not have transferred perfectly.}

ConnorHz · September 8, 2025, 9:24pm

Thanks Cooper. That’s what we’re leaning towards. Have you experienced any downstream inefficiencies with having these joins so early in the DAG? Do you materialize your src layer as Views? Tables? We don’t mind straying from convention but we have less appetite to do so at the cost of performance

a_slack_user · September 8, 2025, 10:59pm

Really depends on your volume and the kind of join to be honest.

We have some source models that are just views doing casting and renaming, we have some that are full load tables doing a join or two, and we have some set up as incremental.

Best way to explore is just to give it a go in my opinion

_{Note: @Cooper White originally posted this reply in Slack. It might not have transferred perfectly.}

Topic		Replies	Views
When in the DAG should denormalization of codes into string occur? In-Depth Discussions best-practice , snowflake , dbt-cloud	2	1319	June 16, 2023
Best practice splitting SQL into staging, intermediate and marts layers and naming conventions Help	1	7022	January 3, 2024
Strategy for dbt modeling when using Looker Archive	2	5319	June 3, 2021
Unioning identically-structured data sources Show and Tell jinja , best-practice	12	37859	December 5, 2023
How we (used to) structure our dbt projects Archive	22	199729	September 7, 2022

Help with design pattern for code/type/lookup data

Related topics