Creating an error threshold for schema tests

A pretty common wish is to add an error_threshold to an existing test. For example, if less than 10 records fail, this test should still pass, otherwise it should fail. Here’s what we’d want to write in our schema.yml file:

version: 2

models:
  - name: dim_customers
    columns:
      - name: customer_id
        tests:
          - unique:
              error_threshold: 10

And here’s how I would go about solving it!

  1. View the source code for an existing test (related reading: dbt ships with its own project)
  2. Copy the code for an existing test into your project! — Now dbt will use your version instead of dbt’s flobal version
  3. Adjust the test as required.

For the above problem, here’s some code that will do the trick :wink:

-- macros/test_unique.sql
{% macro test_unique(model) %}
{% set column_name = kwargs.get('column_name', kwargs.get('arg')) %}
{% set error_threshold = kwargs.get('error_threshold', 0) %}
with validation_errors as (
    select
        {{ column_name }}
    from {{ model }}
    where {{ column_name }} is not null
    group by {{ column_name }}
    having count(*) > 1
),
aggregated as (
    select
        count(*) as n_errors
    from validation_errors
    group by 1
)
select
    case
      when n_errors <= {{ error_threshold }} then 0
      else n_errors
    end as result
from aggregated
{% endmacro %}
7 Likes

Thank you for the useful info! :slight_smile:

This is something we’ve done repeatedly, and I’ve started wondering if there is a good way of reusing the logic in the existing test instead of copy-pasting the code?

In the particular case you’re describing, it would be very handy to be able to call the test in our own code and only apply custom logic to the returned result. We don’t want to modify the test logic itself, after all.