Problems with Array Data Types in DBT Unit Tests with Snowflake

The problem I’m having

We’re trying to write unit tests for a Snowflake-based dbt model that uses ARRAY_CONTAINS() to check for ancillaries. The model works in production, but unit tests fail with type mismatch errors when trying to mock array columns.

Error:

Database Error
001044 (42P13): SQL compilation error: error line 164 at position 13
Invalid argument types for function 'ARRAY_CONTAINS': (VARCHAR(14), ARRAY)

The context of why I’m trying to do this

Model: int_order_details.sql

{{ config(materialized="ephemeral") }}

select
    order_id,
    flight_route,
    added_ancillaries,  -- ARRAY column
    carrier_codes,      -- ARRAY column
    case
        when array_contains('Insurance', added_ancillaries) then 1 else 0
    end as has_insurance,
    case
        when year(created_date) >= 2021
        then coalesce(service_fee_raw, 0)
        else 2.30  -- Legacy fee
    end as service_fee
from {{ ref("base_orders") }}

What I’ve already tried

Approach 1. Inline SQL with array_construct() in YAML

I used array_construct() directly as below, also tried with array_construct(“”)::array, to_array(parse_json(‘[]’)). But got the error above.

unit_tests:
  - name: test_legacy_fee
    model: int_order_details
    given:
      - input: ref('base_orders')
        format: sql
        rows: |
          select 'ORDER_001' as order_id,
                 'JFK-LAX' as flight_route,
                 array_construct('Insurance', 'FastTrack') as added_ancillaries,
                 array_construct('AA') as carrier_codes,
                 '2020-12-31'::date as created_date,
                 5.00 as service_fee_raw
    expect:
      rows:
        - { order_id: "ORDER_001", service_fee: 2.30, has_insurance: 1 }

Approach 2. YAML Row Format with Array Literals

I followed this approach based on recommendation on a similar issue on dbt unit test. Got the same error as above.

given:
  - input: ref('base_orders')
    rows:
      - order_id: 'ORDER_001'
        flight_route: 'JFK-LAX'
        added_ancillaries: ['Insurance', 'FastTrack']
        carrier_codes: ['AA']
        created_date: '2020-12-31'
        service_fee_raw: 5.00

Approach 3: Ephemeral Helper Model

I created a separate model with mock data and called this model in the unit test

ut_base_orders_mock_data.sql (mock data model )

{{ config(materialized="ephemeral", tags=["unit_test_helper"]) }}

with mock_data as (
    select
        'ORDER_001' as order_id,
        'JFK-LAX' as flight_route,
        array_construct('Insurance', 'FastTrack') as added_ancillaries,
        array_construct('AA') as carrier_codes,
        '2020-12-31'::date as created_date,
        5.00 as service_fee_raw
)
select * from mock_data

unit test file

model: int_order_details 
given:
  - input: ref('base_orders')
    format: sql
    rows: |
      select * from ut_base_orders_mock_data
      where order_id = 'ORDER_001'

Current Hypothesis

In all approaches, when the unit test framework materializes the mock data, array_construct() or YAML array literals (['item']) appear to be treated as VARCHAR instead of ARRAY type when passed to the model’s SQL logic.

The ARRAY_CONTAINS() function in the model then receives:

  • First argument: String literal 'Insurance'

  • Second argument: VARCHAR (should be ARRAY)

Questions

  1. Is there a way to explicitly cast array columns in unit test mock data?

  2. Does dbt’s unit test framework support complex Snowflake types like ARRAY/OBJECT?

  3. Should we be using a different testing pattern for models with array operations?

Will really appreciate your help. Thanks so much.

Hi @AlexiaWu! I replicated your error with this simple example (see screenshot). The problem might reflect the fact that ARRAY_CONTAINS expects a VARIANT column, not varchar. So just add a cast as seen on the 2nd screenshot.

Replicated error:

In the ephemeral model, action needed:
case
when array_contains(‘Insurance’::VARIANT, added_ancillaries) then 1 else 0
end as has_insurance

2 Likes