Your Essential dbt Project Checklist

If you’ve been using dbt for over a year, your project is out-of-date. This is natural.
New functionalities have been released. Warehouses change. Best practices are updated. Over the last year, I and others on the Fishtown Analytics team have conducted seven audits for clients who have been using dbt for a minimum of 2 months. In every single audit, we found opportunities to:

  1. Improve performance
  2. Improve maintainability
  3. Make it easier for new people to get up-to-speed on the project

This post is the checklist I created to guide our internal work, and I’m sharing it here so you can use it to clean up your own dbt project. Think of this checklist like a Where's Waldo? book: you’ll still have to go out and find him, but with this in hand, you’ll at least know what you’re looking for.

:white_check_mark: dbt_project.yml

  • Project naming conventions
    • What is the name of your project?
      • Did you keep it as ‘my_new_project’ per the init project or renamed it to make sense?
      • Our recommendation is to name it after your company such as ‘fishtown_analytics’.
      • If you have multiple dbt projects, something like ‘fishtown_analytics_marketing’ might make more sense.
  • Do you have unnecessary configurations like materialized: view?
    • By default, dbt models are materialized as “views”. This removes the need to declare any models as views.
    • If all of your models in a folder are tables, define the materialization on the dbt_project.yml file rather than on the model file. This removes clutter from the model file.
  • Do you have a ton of placeholder comments from the init command?
    • This creates unnecessary clutter.
  • Do you use post-hooks to grant permissions to other transformers and BI users?
    • If no, you should! This will ensure that any changes made will be accessible to your collaborators and be utilized on the BI layer.

Useful links:

:white_check_mark: Package Management

  • How up to date are the versions of your dbt Packages?
    • You can check this by looking at your packages.yml file and comparing it to the packages hub page.
  • Do you have the dbt_utils package installed?
    • This is by far our most popular and essential package. The package contains clever macros to improve your dbt Project. Once implemented, you have access to the macros (no need to copy them over to your project).

Useful links

:white_check_mark: Code style

  • Do you have a clearly defined code style?
  • Are you following it strictly?
  • Are you optimizing your SQL?
    • Are you using window functions and aggregations?

Useful Links

:white_check_mark: Project structure

  • If you are using dimensional modeling techniques, do you have staging and marts models?
    • Do they use table prefixes like ‘fct_’ and ‘dim_’?
  • Is the code modular? Is it one transformation per one model?
  • Are you filtering as early as possible?
    • One of the most common mistakes we have found is not filtering or transforming early enough. This causes multiple models downstream to have the same repeated logic (i.e., wet code) and makes updating business logic more cumbersome.
  • Are the CTEs modular with one transformation per CTE?
  • If you have macro files, are you naming them in a way that clearly represent the macro(s) contained in the file?

Useful Links

:white_check_mark: dbt

  • What version of dbt are you on?
    • The further you get away from the latest release, the more likely you are to keep around old bugs and make updating that much harder.
  • What happens when you dbt run?
    • What are your longest-running models?
      • Is it time to reevaluate your modeling strategy?
      • Should the model be incremental?
        • If it’s already incremental, should you adjust your incremental strategy?
    • How long does it take to run the entire dbt project?
    • Does every model run? (This is not a joke.)
      • If not, why?
    • Do you have circular model references?
  • Do you use sources?
    • If so, do you use source freshness tests?
  • Do you use refs and sources for everything?
  • Do you regularly run dbt test as part of your workflow and production jobs?
  • Do you use Jinja & Macros for repeated code?
    • If you do, is the balance met where it’s not being overused to the point code is not readable?
    • Is your Jinja easy to read?
      • Did you place all of your set statements at the top of the model files?
      • Did you format the code for Jinja-readability or just for the compiled SQL?
      • Do you alter your whitespace?
        • Example: {{ this }} and not {{this}}
    • Did you make complex macros as approachable as possible?
      • Way to do this are providing argument names and in-line documentation using {# <insert text> #}
  • If you have incremental models, are they using unique keys and is_incremental() macro?
  • If you have tags, do they make sense? Do they get utilized?

Useful Links

:white_check_mark: Testing & Continuous Integration

  • Do your models have tests?
    • The ideal project has 100% test coverage on all of its models. While there are cases where this doesn’t make sense, our rule of thumb is models should have at least a not_null/unique test on the primary key.
  • What are you testing for? Does it make sense?
  • What are the assumptions you should be testing for?
    • Think about your core business logic as well as your understanding of your sources.
  • Are you using pull requests/other forms of version control?
    • How easy is it to understand what the code change and intention behind the code change do?
  • Do you have mandatory PR reviews before merging code to your dbt project or BI layer?
    • Do you use a PR template?

Useful Links

:white_check_mark: Documentation

  • Do you use documentation?
  • Are there descriptions for each model?
  • Are complex transformations and business logic explained in an easily accessible place?
  • Are your stakeholders using your documentation?
    • If not, why?
  • Do you have a readme and regularly update it?
  • How easy would it be to onboard someone to your project?
  • If you have column-level descriptions, are you using doc blocks?

Useful Links

:white_check_mark: dbt Cloud specifics

  • What dbt version are the jobs?
    • Are the majority of them inheriting from the environment to make upgrading easier?
  • What do your jobs look like? Do they make sense?
  • How are your dbt cloud projects organized?
    • Do you have any unused projects?
  • Have you chosen the most appropriate job for your account level documentation?
  • Are the number of runs syncing up with how often your raw data updates and are viewed?
    • If your data isn’t updating as often as the runs are happening, this is just not doing anything.
  • Do you have a full refresh of the production data?
  • Do you run tests on a periodic basis?
  • What are the longest-running jobs?
  • Do you have a Continuous Integration job? (Github only)

Are you using the IDE and if so, how well?

  • We found that the IDE has assisted in alleviating issues of maintaining the upgraded dbt version.
  • Does dbt cloud have its own user in their warehouse? What is the default warehouse/role?
  • Are you getting notifications for failed jobs? Have you set up the slack notifications?

Useful Links

This is a quick overview of things to think about in your project.
Please comment if there is anything missed or that you have found has led you to success. :slight_smile:

11 Likes