Curious how analytics teams actually manage column-level documentation in practice.
Where do descriptions and business definitions usually live?
dbt docs, a catalog, spreadsheets, somewhere else?
And if someone had to document a few hundred columns, what workflow would they realistically use?
The teams I’ve seen stay sane usually make the dbt YAML the source of truth, then get disciplined about which columns actually deserve column-level docs.
A few patterns that seem to hold up:
- keep business definitions close to the model in schema.yml, not in a separate spreadsheet,
- only require detailed docs for exposed or high-traffic columns, not every intermediate field,
- add a PR check so new columns do not land undocumented by accident,
- use templated wording for common audit fields so people are not rewriting the same definition 40 times.
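To make the last two bullets concrete, here is a rough sketch of what such a check could look like. It assumes the schema.yml has already been parsed into a plain dict (for example with a YAML library) in the usual `models` / `columns` / `description` shape. The function names, the `AUDIT_DEFAULTS` wording, and the idea of passing in a set of "required" models are all made up for illustration, not part of dbt.

```python
# Hypothetical stock wording for common audit columns; adjust to your own conventions.
AUDIT_DEFAULTS = {
    "created_at": "Timestamp when the row was first inserted.",
    "updated_at": "Timestamp when the row was last modified.",
}


def fill_audit_defaults(schema):
    """Fill empty descriptions on audit columns with the templated wording (in place)."""
    for model in schema.get("models", []):
        for col in model.get("columns", []):
            has_desc = bool((col.get("description") or "").strip())
            if not has_desc and col["name"] in AUDIT_DEFAULTS:
                col["description"] = AUDIT_DEFAULTS[col["name"]]
    return schema


def find_undocumented(schema, required_models):
    """Return (model, column) pairs with no description, for required models only."""
    missing = []
    for model in schema.get("models", []):
        if model["name"] not in required_models:
            continue  # intermediate models are not held to the same bar
        for col in model.get("columns", []):
            if not (col.get("description") or "").strip():
                missing.append((model["name"], col["name"]))
    return missing


# Example input mirroring a parsed schema.yml (illustrative model and column names).
schema = {
    "models": [
        {
            "name": "fct_orders",
            "columns": [
                {"name": "order_id", "description": "Primary key."},
                {"name": "net_revenue"},
                {"name": "created_at", "description": ""},
            ],
        },
        {"name": "int_orders_joined", "columns": [{"name": "tmp_flag"}]},
    ]
}

fill_audit_defaults(schema)
missing = find_undocumented(schema, required_models={"fct_orders"})
```

In CI you would fail the PR whenever `missing` is non-empty. Note that this only sees columns declared in the YAML; catching columns that exist in the warehouse but were never declared needs a separate comparison against the actual table schema.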
For a few hundred columns, I would not try to document them all in one pass. I’d do it by domain and start with the tables people actually query. Otherwise you end up with a lot of stale prose nobody trusts.
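If it helps, the "by domain, most-queried first" ordering is easy to script once you have some usage numbers. This is only a sketch: the `domain` and `query_count` fields are assumptions about metadata you would have to pull from your own warehouse query logs or catalog, and the table names are invented.

```python
from collections import defaultdict


def plan_batches(tables):
    """Group tables by domain and order both domains and tables by query volume.

    `tables` is a list of dicts with hypothetical keys: name, domain, query_count.
    Returns a list of (domain, [table names]) with the busiest domain first.
    """
    by_domain = defaultdict(list)
    for table in tables:
        by_domain[table["domain"]].append(table)

    # Busiest domains first, measured by total queries across their tables.
    ordered_domains = sorted(
        by_domain.items(),
        key=lambda item: sum(t["query_count"] for t in item[1]),
        reverse=True,
    )

    # Within each domain, document the most-queried tables first.
    return [
        (
            domain,
            [t["name"] for t in sorted(group, key=lambda t: t["query_count"], reverse=True)],
        )
        for domain, group in ordered_domains
    ]


# Illustrative usage numbers, not real data.
tables = [
    {"name": "dim_customers", "domain": "sales", "query_count": 120},
    {"name": "fct_orders", "domain": "sales", "query_count": 900},
    {"name": "fct_tickets", "domain": "support", "query_count": 300},
]

batches = plan_batches(tables)
```

Each batch then becomes one documentation PR, which keeps the review small enough that someone who knows the domain can actually check the definitions.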
If a catalog tool exists, I’d treat it as the display layer, not the authoring layer.
Honestly, depending on the type of organization you’re talking about, data contracts can be an option that addresses the problem up front, before something goes wrong.
As I said, though, it depends on the size of the organization, because contracts require real commitment to documentation, especially from the people who produce and work with the data. In your case, with hundreds of columns, I think a contract is a better investment than plain documentation, since it is useful for more than just explaining what a column means.