Snapshot testing with fixed data source

Hello all!

I wanted to take some time to describe the small testing framework we’ve put together for our dbt project.

The main components are:

  1. source schemas are loaded from sql files (test/schemas/SCHEMA.sql)
  2. source data are loaded from yaml files (test/fixtures/SCHEMA/TABLE.yml)
  3. the target data warehouse is built into postgres
  4. we run a handful of unit tests in ruby
  5. we capture snapshots of the resulting target data to yaml files that we check in git (test/snapshots/SCHEMA/TABLE.yml).

Capturing snapshots of the resulting data warehouse has sped up the development and code review process quite a lot and made us much more confident when making changes. We can verify that the code change is having the expected effect on the data warehouse and does not have any unintended side effects.

Here is for example a diff when changing the datatype of a column and adding new columns:

I’m happy to go into more detail if you have any questions or comments!