Migrating to schema.yml v2


#1

dbt v0.11.0 introduces a new syntax for schema.yml files. This syntax is more flexible and expressive than the v1 spec, making it possible to add documentation strings for columns and models. Documentation for this new syntax can be found here.

Migrating from the v1 syntax to the v2 syntax manually can be tedious, so we’ve created a helpful script to automatically migrate your schema.yml files. Follow along for instructions on using the script.

Pre-Requisites

This guide assumes that you’re using macOS or Linux.
The script below requires that the PyYAML module is installed in your Python environment, e.g. with:

pip install PyYAML

Migration guide

  1. First, make sure your dbt project is checked into a version control system (like git). Also make sure that there are no uncommitted changes in your repo, as the script below will overwrite your schema.yml files with their v2 representations.

  2. cd to the root of your dbt project (where dbt_project.yml lives)

  3. Download the script:

curl -o upgrade_dbt_schema_tests_v1_to_v2.py https://raw.githubusercontent.com/fishtown-analytics/dbt/development/scripts/upgrade_dbt_schema_tests_v1_to_v2.py

  4. Run the script in dry-run mode. Changes will not be applied, but the script will validate your schema.yml files and report on the files that it would upgrade.

python upgrade_dbt_schema_tests_v1_to_v2.py .

  5. If everything looks good, run with --apply to apply the changes:

python upgrade_dbt_schema_tests_v1_to_v2.py . --apply

After running the last command, a git status should show you that your schema.yml files have been upgraded and new .backup files were created with the previous versions of your schema.yml files.

If everything looks good, then we can clean up the .backup files:

# Be careful with this - it will delete files called schema.yml.backup in your `models` directory
find models -name schema.yml.backup -exec rm {} \;
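
If you would rather preview the deletions first, or want something that behaves the same on every platform, the cleanup can be sketched in Python. This is only an illustration and not part of the migration script: the function name `remove_schema_backups` and its `apply` flag are made up here, and it assumes the backups are named schema.yml.backup and live under `models`, as in the command above.

```python
from pathlib import Path
from typing import List


def remove_schema_backups(models_dir: str = "models", apply: bool = False) -> List[Path]:
    """Find schema.yml.backup files under models_dir.

    Nothing is deleted unless apply=True; the matching paths are returned
    either way so they can be reviewed first.
    """
    backups = sorted(Path(models_dir).rglob("schema.yml.backup"))
    if apply:
        for path in backups:
            path.unlink()  # delete the backup file
    return backups
```

Calling `remove_schema_backups()` from the project root only lists the matching files; `remove_schema_backups(apply=True)` deletes them.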

And that’s it! dbt test should work exactly as it did before, but you should now have v2 schema.yml files. Happy documenting!


#2

Don’t panic if you get an error like this when you try to run the script:

ERROR: 2018-09-06 14:36:18,445: Fatal error during conversion attempt
Traceback (most recent call last):
  File "upgrade_dbt_schema_tests_v1_to_v2.py", line 214, in main
    handle(parsed)
  File "upgrade_dbt_schema_tests_v1_to_v2.py", line 148, in handle
    parsed.extra_complex_tests)
  File "upgrade_dbt_schema_tests_v1_to_v2.py", line 164, in convert_project
    convert_file(filepath, backup, write, extra_complex_tests)
  File "upgrade_dbt_schema_tests_v1_to_v2.py", line 176, in convert_file
    version = initial.get('version', 1)
AttributeError: 'NoneType' object has no attribute 'get'

This is the result of having an empty schema.yml file somewhere in your project. Make it non-empty, or delete it, and the error will go away.
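
To track down the offending file, something like the following sketch may help. It is not part of the migration script, and the helper names are hypothetical. It treats a file as empty if it contains nothing but blank lines and `#` comments, which is an approximation of "PyYAML parses it to nothing" (a comments-only file also loads as `None`), and it assumes the files are named schema.yml.

```python
from pathlib import Path
from typing import List


def is_effectively_empty(path: Path) -> bool:
    """Return True if the file holds only blank lines and comments."""
    for line in path.read_text(encoding="utf-8").splitlines():
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            return False
    return True


def find_empty_schema_files(project_root: str = ".") -> List[Path]:
    """List schema.yml files under project_root that have no real content."""
    return sorted(
        p for p in Path(project_root).rglob("schema.yml") if is_effectively_empty(p)
    )
```

Running `find_empty_schema_files(".")` from the project root returns the paths to fix or delete before re-running the script.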


#3

For those of us on Windows, the last step can be performed in PowerShell with:

get-childitem models -include schema.yml.backup -recurse | foreach ($_) {remove-item $_.fullname}

You can use the “whatif” argument to check the list of files to be deleted without actually deleting them. That would look like:

get-childitem models -include schema.yml.backup -recurse | foreach ($_) {remove-item $_.fullname -whatif}

#4

Also don’t forget to delete the script so that you don’t accidentally check it in to version control. It was downloaded to the root of your dbt project, not to the models directory, so remove it from there:

rm upgrade_dbt_schema_tests_v1_to_v2.py