I would like to be able to tag specific columns and have access to that metadata. At first this might only be surfaced in dbt docs, but I can see it being useful for automated features as well, much as tags on sources and models are already useful.
My main use case is tagging PII data. It would be nice to be able to, e.g., mark certain columns to be hashed automatically, or to generate a warning, or something along those lines, or even just to surface them in documentation. I could see a PII tag acting something like Perl’s taint mode: any column derived from PII data is tainted with PII unless specifically scrubbed (perhaps via a macro call). Then I could get a list of columns that have been tagged as PII, plus warnings about PII-tainted columns that are not tagged.
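To make the taint idea a bit more concrete, here is a minimal sketch in plain Python. It is not dbt code and does not use any dbt API: the `propagate_taint` function, the `lineage` mapping, and all the column names are hypothetical, and it assumes column-level lineage is available as a simple parent-to-children dictionary.

```python
from collections import deque


def propagate_taint(lineage, tagged, scrubbed):
    """Return every column that is PII-tagged or derived from a PII-tagged
    column, stopping at columns that were explicitly scrubbed."""
    tainted = set(tagged) - set(scrubbed)
    queue = deque(tainted)
    while queue:
        col = queue.popleft()
        for child in lineage.get(col, ()):
            # A scrubbed column is clean, so taint does not flow through it.
            if child in scrubbed or child in tainted:
                continue
            tainted.add(child)
            queue.append(child)
    return tainted


# Hypothetical lineage: parent column -> columns derived from it.
lineage = {
    "raw_users.email": ["stg_users.email", "stg_users.email_hash"],
    "stg_users.email": ["dim_users.email_domain"],
    "stg_users.email_hash": ["dim_users.user_key"],
}
tagged = {"raw_users.email", "stg_users.email"}  # explicitly tagged as PII
scrubbed = {"stg_users.email_hash"}              # e.g. hashed via a macro

tainted = propagate_taint(lineage, tagged, scrubbed)

# Columns that inherit PII but carry no explicit tag: candidates for a warning.
untagged_tainted = sorted(tainted - tagged)
print(untagged_tainted)
```

In this toy example, `dim_users.email_domain` would be flagged because it derives from a tagged column but is not tagged itself, while `dim_users.user_key` stays clean because it derives only from the scrubbed hash.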
Other potential use cases: team ownership of columns in large orgs, and hints and metadata for other data governance requirements (e.g. finance, SOX compliance).
I’m making this a pre-proposal because my thoughts are very preliminary and I am vague on the implementation requirements, and I would love to solicit the community’s feedback.