Your dbt Project Checklist

Since this post was published, the Fishtown Analytics Professional Services team has completed more than a dozen audits (!) and grown to six (and counting!) analytics engineers who work with clients to optimize their dbt projects. In the spirit of keeping our knowledge up to date and available to the community, we wanted to add a few more bullet points here to give some additional color on the things we look for in a dbt project. Thanks for the big head start @amy!

:white_check_mark: dbt_project.yml

  • Are you utilizing tags in your project?
    • The majority of your project’s models should be untagged. Use tags for models and tests that fall outside the norm in how you want to interact with them. For example, tagging ‘nightly’ models makes sense, but also tagging all your non-nightly models as ‘hourly’ is unnecessary: you can simply exclude the nightly models!
    • Check to see if a node selector is a good option here instead of tags.
    • Are you tagging individual models in config blocks?
      • You can use folder selectors in many cases to eliminate over tagging of every model in a folder.
  • Are you using YAML selectors?
    • These enable intricate, layered model selection, can eliminate complicated tagging mechanisms, and improve the legibility of the project configuration.
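
As a rough sketch of the tagging advice above: assuming a hypothetical project called `my_project` with a `models/nightly/` folder (neither name comes from this post), a single folder-level tag in dbt_project.yml can replace a config block in every model in that folder.

```yaml
# dbt_project.yml (fragment)
# `my_project` and the `nightly` folder are hypothetical names for illustration.
models:
  my_project:
    nightly:              # applies to every model under models/nightly/
      +tags: ["nightly"]  # one tag here instead of a config block per model
```

With that in place, something like `dbt run --exclude tag:nightly` covers everything else, so no ‘hourly’ tag is needed. The same idea can be expressed as a YAML selector. This is only a sketch, assuming a `selectors.yml` file in the project root and a made-up selector name, `hourly_run`:

```yaml
# selectors.yml
# `hourly_run` is a hypothetical selector name.
selectors:
  - name: hourly_run
    description: "Everything in the project except models tagged nightly"
    definition:
      union:
        - method: fqn
          value: "*"          # start from every node in the project
        - exclude:
            - method: tag
              value: nightly  # then subtract the nightly-tagged nodes
```

You would then invoke it with `dbt run --selector hourly_run`, which keeps the selection logic in one readable place instead of spreading it across tags.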


:white_check_mark: DAG Auditing

Note: diagrams in this section show what NOT to do!

  • Does your DAG have any common modeling pitfalls?
    • Are there any direct joins from sources into an intermediate model?

      • All sources should have a corresponding staging model to clean and standardize the data structure. They should not look like the image below.
    • Do sources join directly together?

      • Each source should first flow into its own staging model that cleans and standardizes the data structure; join the staged models rather than the sources themselves. They should not look like the image below.
    • Is there any rejoining of upstream concepts?

      • This may indicate:
        • that a model needs to be expanded so all the necessary data is available downstream
        • that a new intermediate model is needed to join the concepts for use in both places
    • Are there any “bending connections”?

      • Are models in the same layer dependent on each other?
      • This may indicate a change in naming is necessary, or the model should reference further upstream models
    • Are there model fan outs of intermediate/dimension/fact models?

      • This might indicate some transformations should move to the BI layer, or transformations should be moved upstream
      • Your dbt project needs a defined end point!
    • Is there repeated logic found in multiple models?

      • This indicates an opportunity to move logic into upstream models or create specific intermediate models to make that logic reusable
      • One common place to look for this is complex join logic. For example, if you’re checking multiple fields for certain specific values in a join, these can likely be condensed into a single field in an upstream model to create a clean, simple join.

Thanks to @ChristineBerger for her DAG diagrams!


We’ll keep this post updated as we continue to refine our best practices! Happy modeling!
