Your dbt Project Checklist

dave.connors · February 5, 2021, 10:01pm

Since this post was published, the Fishtown Analytics Professional Services team has now completed over a dozen audits (!) and grown to six (and counting!) analytics engineers working with clients to optimize their dbt projects. In the spirit of keeping our knowledge up to date and available to the community, we wanted to add a few more bullet points here to give some additional color on the things we look for in a dbt project. Thanks for the big head start @amy!

dbt_project.yml

Are you utilizing tags in your project?
- The majority of your project’s models should be untagged. Reserve tags for models and tests that you need to select or run differently from the rest of the project. For example, tagging ‘nightly’ models makes sense, but also tagging all your non-nightly models as ‘hourly’ is unnecessary: you can simply exclude the nightly models!
- Check to see if a node selector is a good option here instead of tags.
- Are you tagging individual models in config blocks?
  - You can use folder selectors in many cases to eliminate over tagging of every model in a folder.
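As a rough sketch of folder-level tagging, the fragment below applies a tag once in dbt_project.yml instead of in each model's config block. The project name `my_project` and the folder `marts/nightly_rollups` are hypothetical, used only for illustration.

```yaml
# dbt_project.yml (fragment)
# "my_project" and the folder names are hypothetical placeholders.
models:
  my_project:
    marts:
      nightly_rollups:
        # Every model under models/marts/nightly_rollups/ inherits this tag,
        # so no per-model config block is needed.
        +tags: ["nightly"]

# Everything else stays untagged. Example invocations
# (older dbt versions use --models in place of --select):
#   dbt run --select tag:nightly     # the nightly job
#   dbt run --exclude tag:nightly    # every other job
```

With this layout, adding a new model to the folder tags it automatically, and the non-nightly jobs need no tag of their own.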
Are you using YAML selectors?
- These enable intricate, layered model selection, can eliminate complicated tagging mechanisms, and improve the legibility of the project configuration.
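A minimal sketch of a YAML selector, assuming a project where only the nightly models carry a tag. The selector name `hourly` and the tag `nightly` are illustrative choices, not names taken from any particular project.

```yaml
# selectors.yml (placed alongside dbt_project.yml)
# The selector name and tag value below are illustrative assumptions.
selectors:
  - name: hourly
    description: "Everything in the project except models tagged nightly"
    definition:
      union:
        # Start from every node in the project...
        - method: fqn
          value: "*"
        # ...then drop anything tagged nightly.
        - exclude:
            - method: tag
              value: nightly

# Example invocation:
#   dbt run --selector hourly
```

Keeping the selection logic in one named definition means job commands stay short and the rules for what runs when live in version control next to the project.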

Useful Links

YAML selectors

DAG Auditing

Note: diagrams in this section show what NOT to do!

Does your DAG have any common modeling pitfalls?
- Are there any direct joins from sources into an intermediate model?
  - All sources should have a corresponding staging model to clean and standardize the data structure. They should not look like the image below.
    
    [Image: DAG diagram of a source joined directly into an intermediate model]
- Do sources join directly together?
  - All sources should have a corresponding staging model to clean and standardize the data structure. They should not look like the image below.
    
    [Image: DAG diagram of sources joined directly together]
- Is there any rejoining of upstream concepts?
  - This may indicate:
    - a model may need to be expanded so all the necessary data is available downstream
    - a new intermediate model is necessary to join the concepts for use in both places
      
      [Image: DAG diagram of upstream concepts being rejoined downstream]
- Are there any “bending connections”?
  - Are models in the same layer dependent on each other?
  - This may indicate a change in naming is necessary, or the model should reference further upstream models
    
    [Image: DAG diagram of a bending connection between models in the same layer]
- Are there model fan outs of intermediate/dimension/fact models?
  - This might indicate some transformations should move to the BI layer, or transformations should be moved upstream
  - Your dbt project needs a defined end point!
    
    [Image: DAG diagram of an intermediate/dimension/fact model fanning out]
- Is there repeated logic found in multiple models?
  - This indicates an opportunity to move logic into upstream models or create specific intermediate models to make that logic reusable
  - One common place to look for this is complex join logic. For example, if you’re checking multiple fields for certain specific values in a join, these can likely be condensed into a single field in an upstream model to create a clean, simple join.

Thanks to @ChristineBerger for her DAG diagrams!

We’ll keep this post updated as we continue to refine our best practices! Happy modeling!
