I think dbt projects collect valuable metadata in manifest.json, for example tags, owners, etc.
Is it worth thinking about ingesting this into the data warehouse alongside the output of the transformations?
With the new meta blocks in the documentation.yml files - absolutely, the manifest.json file becomes a rich source of data (particularly for metadata management as you imply)!
I asked on Slack about any future plans for this, and there are none so far. There is some historical discussion of the v0.16 meta block feature here: https://github.com/fishtown-analytics/dbt/issues/1362
So it looks like I’ll do it myself (which is fun too!). My initial thought so far is that it’s a JSON file that can be generated by the build process, so we can copy it to the data warehouse’s stage area (S3 for Snowflake) as part of the build pipeline (we use containers, but it could be a Jenkins job).
Snowflake works natively with JSON, so dbt models can be built directly on the loaded file using Snowflake’s select syntax for the VARIANT data type.
Another thought…the JSON is pretty complex and deeply nested (I guess that’s relative!), so pre-processing it first might be smarter. I guess the question is where this is easier to achieve: with Snowflake’s JSON tools, or with something like Python.
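To make the Python option concrete, here is a rough sketch of what the pre-processing could look like. It assumes the manifest has a top-level "nodes" object keyed by unique id, and that each node carries "resource_type", "name", "tags" and "meta" keys; the exact layout varies between dbt versions, so check it against your own manifest.json. The function name flatten_manifest and the "owner" key inside meta are my own illustrative choices, not anything dbt defines.

```python
import json


def flatten_manifest(manifest):
    """Turn the nested manifest dict into one flat row per node.

    Assumes a top-level "nodes" mapping; missing keys become None or "".
    """
    rows = []
    for unique_id, node in manifest.get("nodes", {}).items():
        meta = node.get("meta") or {}
        rows.append({
            "unique_id": unique_id,
            "resource_type": node.get("resource_type"),
            "name": node.get("name"),
            # Join tags into one string so the row stays flat
            "tags": ",".join(node.get("tags") or []),
            # "owner" is a hypothetical meta key used for illustration
            "owner": meta.get("owner"),
        })
    return rows


# Tiny hand-written sample standing in for a real manifest.json
sample = {
    "nodes": {
        "model.my_project.orders": {
            "resource_type": "model",
            "name": "orders",
            "tags": ["finance", "daily"],
            "meta": {"owner": "data-team"},
        },
        "model.my_project.customers": {
            "resource_type": "model",
            "name": "customers",
            "tags": [],
            "meta": {},
        },
    }
}

rows = flatten_manifest(sample)
print(json.dumps(rows, indent=2))
```

For a real run you would load the file with json.load (dbt normally writes it under the target/ directory) and write the rows out as newline-delimited JSON or CSV before copying them to the stage. The flat rows are then much simpler to model than the raw nested document.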
This is a great idea. I was hoping that metadata such as owner would also be displayed on the web documentation and be searchable (similar to tags), but having it in a table would be very useful too.