Examples of custom schema tests


#1

I’m really diving into data testing at the moment (thanks to @joshtemple’s Medium post!). By default, dbt comes with four schema tests: not_null, unique, relationships, and accepted_values. dbt also supports custom schema tests, which let you define additional checks.

I’m interested in how others have defined custom schema tests for their data models. It’d be great to have several examples here to inspire the rest of us in brainstorming data tests that fit our data needs.

For example, I work with North American phone numbers a lot, so I have a custom schema test to check that phone number fields always contain ten-digit number (TDN) strings:

{# Verifies all elements in field are TDNs. If validation_errors returns >0 rows, test fails. #}

{% macro test_is_tdn(model, column_name) %}

with validation as (
  select
    {{ column_name }} as tdn_field
  from {{ model }}
),

validation_errors as (
  select
    tdn_field
  from validation
  where tdn_field not rlike '^\\d{10}$'
)

select count(*)
from validation_errors

{% endmacro %}
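For reference, wiring a test like this up in v2 schema.yml might look like the following sketch (the `customers` model and `phone_number` column are placeholder names, not from my project):

```yaml
version: 2

models:
  - name: customers          # placeholder model name
    columns:
      - name: phone_number   # placeholder column name
        tests:
          - is_tdn           # dbt resolves this to the test_is_tdn macro
```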

What are some custom data schema tests you’ve developed?


#2

I have two simple modifications of the standard schema tests: not_null_where and unique_together:

{% macro test_not_null_where(model, field, condition) %}

  with exceptions as (
    select
      count(*)

    from
      {{ model }}

    where
      {{ field }} is null
      and {{ condition }}

  )

  select * from exceptions

{% endmacro %}
{% macro test_unique_together(model, fields) %}

  with exceptions as (
    select
      {% for field in fields %}
        {{ field }}{% if not loop.last %},{% endif %}
      {% endfor %}

    from
      {{ model }}

    {{ dbt_utils.group_by(n=fields|length) }}

    having
      count(*) > 1
  )

  select count(*) from exceptions

{% endmacro %}

You use the following syntax to call these tests (v1 schema.yml):

not_null_where:
  - {field: 'field_1', condition: "another_field = 'something'"}
  - {field: 'field_1', condition: "another_field > 0"}
  - {field: 'field_1', condition: "not(field_2 = 'a' and field_3 = 'b')"}
unique_together:
  - {fields: ['field_1', 'field_2']}
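In later dbt versions, the same tests can be declared with v2 schema.yml syntax; a sketch is below (the model and field names are placeholders):

```yaml
version: 2

models:
  - name: my_model           # placeholder model name
    tests:
      - unique_together:
          fields: ['field_1', 'field_2']
      - not_null_where:
          field: 'field_1'
          condition: "another_field > 0"
```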

#3

We have built a few custom schema tests for ourselves at PageUp, which can be viewed here. Some of these are specific to our use cases, but they are interesting nonetheless.

  • Equality for range tests that two tables have the same data in them, for rows in a given id range. This is useful for testing against a controlled data set, as it lets you test specific use cases on a subset of the data while keeping other data in the model for use by other tests.
  • Has timestamp and Has version are tests that enforce certain design patterns of ours. For example, we have (had) a rule that every denormalised table or foreign key in a model should have its own timestamp column, which is later used for incremental updates. This test ensures that a given column has a matching timestamp column, and that the timestamp column is populated whenever it needs to be.
  • Null when parent column null is a junk-data/model-design test. It ensures that when you denormalise a table, its columns are either all populated or, in the event that the foreign key was null, all unpopulated.
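A test of the last kind could be sketched as a macro like this (a minimal illustration under assumed names, not PageUp's actual implementation; it fails when a row's parent key is null but any denormalised child column is populated):

```sql
{# Hypothetical sketch: counts rows where a denormalised column is
   populated even though the parent foreign key is null. #}
{% macro test_null_when_parent_column_null(model, parent_column, child_columns) %}

select count(*)
from {{ model }}
where {{ parent_column }} is null
  and (
    {% for column in child_columns %}
    {{ column }} is not null{% if not loop.last %} or{% endif %}
    {% endfor %}
  )

{% endmacro %}
```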

#4

We put all of the custom schema tests we find useful in dbt-utils.

One pattern which is super simple but really useful is the expression_is_true test, which lets you define your own logic without having to write the SQL for a test yourself. I used it a lot in financial models where I had to validate that subtotal columns were equal to a total, that financial values were positive, and so on. My schema.yml would look like:

version: 2

models:
  - name: payments
    tests:
      - dbt_utils.expression_is_true:
          expression: "inbound_amount = payout_amount + refund_amount"
      - dbt_utils.expression_is_true:
          expression: "inbound_amount = inbound_cash_amount + inbound_credit_amount"
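Conceptually, expression_is_true counts the rows where the expression does not hold, so the first test above checks something along these lines (a sketch, not the exact compiled SQL):

```sql
-- The test fails if this count is greater than zero.
select count(*)
from payments
where not (inbound_amount = payout_amount + refund_amount)
```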