Creating an ERD for your dbt project

While the auto-generated documentation for your dbt project ships with a DAG view, sometimes, you really want to see an Entity Relation Diagram (ERD) of your project instead.

DAG of project:
The DAG shows the lineage of data.

ERD of project
An ERD shows how the tables relate to each other

At present, there’s no built-in way to do this in dbt, but we have an open issue here for the feature request.

So, how can you create an ERD outside of dbt?

Code libraries:

  • SchemaSpy: “Tangentially related for transactional DBs — highly recommend this project for Postgres, MySQL, and other relational DBs to auto generate navigable ERDs to a static website. I run this once a quarter on our Postgres DB and publish to an S3 bucket.” — Sean McIntyre
  • ERAlchemy (thanks @abelsonlive)

SaaS tools:

  • dbdiagram: Write relationships in code, that then get turned into a diagram.
  • SqlDBM: “SqlDBM is excellent and has reverse engineering from numerous databases, including Snowflake” — @gordonwong)

UI SaaS tools (i.e. manually drawing these diagrams)

A few words of warning though:

  1. ERD generators may not work with your analytics database. As @josh explains:

I think a lot of these tools work fairly well if the PK/FK relationships are stated in the database. And in a well-engineered OLTP system this is normally done. It makes a lot less sense to put the relationships in a Snowflake or a Redshift, because for the most part they are not enforced and it is not actually possible to enforce them. So it’s not nearly as common to do so. But the downside of that is that the tools like ERD drawers and BI tools have a lot harder time than deducing the table relationships.

  1. Separating your documentation from your codebase means one will always lag behind the other. As @Andrea writes:

I did make ERDs with Lucidchart for the data mart - didn’t add it to dbt but to the self-service how-to documentation I created in the BI tool we were using. I only did it for 5 rock solid essential time tested star schema tables that I knew I wouldn’t have to worry about editing a lot. Everything people are saying about maintaining documentation divorced from code absolutely applies here - check out this post on Locally Optimistic about “Why you don’t need a data dictionary” - it delves into some of the problems of documenting data.

If you have a tool you use for ERDs, feel free to add it in the comments! I also suspect there’s a way to parse the manifest.json file and use a language like dbdiagram’s database markup language to turn the relationships in your project into entity-relationships!

10 Likes

I’ve been working with more folks who use Collibra and Alation. How they are creating ERDs is exactly what Claire referred to, by using the manifest.json file. The field you want is parent_map which provides a list of the nodes (the models) and the nodes that they depend on.

Snowflake users should be defining PKs on tables where possible, even if these aren’t enforced it’s a good practice as helps other users (and, as mentioned, various data tools) interpret your data model.

We use Oracel SQL Data Modeler with our snowflake star-schema designs. I actually do go through the painful process of documenting all columns, tables and even relationships (constraints) even though they are “documentation” as far as Snowflake is concerned. My main motivation is to retain our metadata. Modeling tools (and even tools like dbt) come and go but metadata is forever (sort of). I would say that a good 50% of the comments on our columns/tables today was originally part of our early LAMP installation from MySQL. Lots of reverse-engineering/forward-engineering has kept our documentation intact even as we migrated databases, modeling tools and integration platforms. The Information Schema is at least a (sort of) standard that can preserver your work.

:wave: I (quickly) put together a ruby script to generate a diagram on dbdiagram.io. It pulls data from:

  • schema.yml to get models and associations (via tests.relationships.to)
  • catalog.json to get columns.

It’s available as a gist: Generate an ERD via dbdiagram.io from a dbt project. · GitHub.

5 Likes

I haven’t tested this out, but conceptually this is very cool!

I’ve similarly created a PowerShell script that pulls the data from manifest.json and outputs a PlantUML diagram: Converts dbt manifest.json to PlantUML ERD diagram. It's not pretty but it's readable. · GitHub

How would you do the “pull data from schema.yml to get models and associations” part in the case where there are multiple levels of model property .yml files? Looks like your script assumes it’s stored in a single schema.yml file

Below quote from: Model properties | dbt Developer Hub

You can name these files whatever_you_want.yml , and nest them arbitrarily deeply in subfolders within the models/ directory.

Hey folks, I just published CLI for generating ERD-as-code DBML from manifest json:

Check it out!
Feel free to let me know for any questions or requests.
Cheers