A pretty common wish is to add an error_threshold
to an existing test. For example, if less than 10 records fail, this test should still pass, otherwise it should fail. Here’s what we’d want to write in our schema.yml file:
version: 2
models:
- name: dim_customers
columns:
- name: customer_id
tests:
- unique:
error_threshold: 10
And here’s how I would go about solving it!
- View the source code for an existing test (related reading: dbt ships with its own project)
- Copy the code for an existing test into your project! — Now dbt will use your version instead of dbt’s flobal version
- Adjust the test as required.
For the above problem, here’s some code that will do the trick
-- macros/test_unique.sql
{% macro test_unique(model) %}
{% set column_name = kwargs.get('column_name', kwargs.get('arg')) %}
{% set error_threshold = kwargs.get('error_threshold', 0) %}
with validation_errors as (
select
{{ column_name }}
from {{ model }}
where {{ column_name }} is not null
group by {{ column_name }}
having count(*) > 1
),
aggregated as (
select
count(*) as n_errors
from validation_errors
group by 1
)
select
case
when n_errors <= {{ error_threshold }} then 0
else n_errors
end as result
from aggregated
{% endmacro %}