Good afternoon. We are a team of data engineers working with a fairly large number of dbt models housed in Snowflake. We recently revamped our schema.yml files to include every model in our repo and replaced a monolithic schema.yml with per-directory schema.yml files, as recommended here. We used the generate_model_yaml codegen macro heavily to build these files. Now that it's time to maintain these schema.yml files, we're curious what the best practice is for keeping them updated, considering:
- Multiple engineers are regularly adding / removing models
- Multiple engineers are regularly adding / removing columns or changing their type
- Multiple engineers are adding descriptions or tags to models (which would be overwritten by rerunning the macro)
None of this is handled automatically, of course, so it opens up the potential for human error where an engineer might forget to update these schema.yml files. Additionally, when the macro is run, it pulls metadata from Snowflake to generate the yml, so we would need to run the model first if we want to use the macro for this in some way. That should be fine, since we generally want to test the model first anyway, but it's worth mentioning.
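For reference, this is roughly how we've been invoking the macro (the model name below is just a placeholder):

```bash
# Regenerate the yml scaffold for one model; it must already be built in
# Snowflake so the macro can read its columns. Model name is a placeholder.
dbt run-operation generate_model_yaml --args '{"model_names": ["fct_orders"]}'
```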
Is there some kind of PR or pre-commit check we can add to ensure that any changes made in the code are reflected in the schema.yml files?
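For concreteness, the kind of check we've been imagining is something like the sketch below: a script run in CI after `dbt docs generate` that diffs the columns documented in our schema.yml files (via manifest.json) against the columns that actually exist in Snowflake (via catalog.json). This is only a rough sketch, not something we have working; the artifact location and the models-only scope are assumptions about our setup.

```python
"""
Sketch of a CI check: compare columns documented in schema.yml (as recorded
in dbt's manifest.json) against columns that actually exist in Snowflake
(as recorded in catalog.json, produced by `dbt docs generate`).

Assumptions: artifacts live in ./target, comparison is case-insensitive,
and only models (not sources or seeds) are checked.
"""
import json
import sys
from pathlib import Path

TARGET_DIR = Path("target")  # default dbt artifact directory (assumption)


def load_artifact(name: str) -> dict:
    return json.loads((TARGET_DIR / name).read_text())


def main() -> int:
    manifest = load_artifact("manifest.json")
    catalog = load_artifact("catalog.json")
    failures = []

    for unique_id, node in manifest["nodes"].items():
        if node["resource_type"] != "model":
            continue
        cat_node = catalog["nodes"].get(unique_id)
        if cat_node is None:
            # Model hasn't been built/cataloged yet; skip rather than fail.
            continue

        documented = {c.lower() for c in node["columns"]}   # from schema.yml
        actual = {c.lower() for c in cat_node["columns"]}   # from Snowflake

        missing_from_yml = actual - documented
        stale_in_yml = documented - actual
        if missing_from_yml or stale_in_yml:
            failures.append((node["name"], missing_from_yml, stale_in_yml))

    for name, missing, stale in failures:
        print(f"{name}: undocumented columns {sorted(missing)}, "
              f"documented-but-missing columns {sorted(stale)}")

    return 1 if failures else 0


if __name__ == "__main__":
    sys.exit(main())
```

The obvious downside is that catalog.json is only as fresh as the last `dbt docs generate`, so the models would need to be built in CI before the check runs, which circles back to the "run the model first" issue above.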
How do you ensure your schema.yml files are accurate and up to date?