The unique data_tests in my yml file is not triggering when I purposefully set the data to have duplicates. How come?
After running my model, the stg_customers table in my snowflake dbt schema is even showing the duplicates. I manually added a duplicate customer_id in the raw data simply to test out the test.
In my schema.yml file, you can see the unique data_tests for the stg_customers model, which is populating the table in the above screen shot from a raw source.
version: 2
models:
- name: stg_customers
description: This model cleans up customer data
columns:
- name: customer_id
description: Primary key
data_tests:
- unique
- not_null
The context of why I’m trying to do this
I’m learning dbt. This is one of the lessons and I wanted to test out the data_tests by purposefully triggering a duplicate and it does not appear to be working.
What I’ve already tried
I confirmed the duplication in the raw data set by simply running an aggregate query…having count(1) > 1 and this customer_id shows as a duplicate.
Here are the stg_customers.sql model and sources.yml code, respectively:
select
id as customer_id,
first_name,
last_name
from {{ source('jaffle_shop', 'customers') }}
version: 2
sources:
- name: jaffle_shop
description: This is a replica of the Postgres database used by our app
database: raw
schema: jaffle_shop
tables:
- name: customers
description: One record per customer.
- name: orders
description: One record per order. Includes cancelled and deleted orders.
