Issue with dbt Unit Tests for BigQuery: Handling ARRAY<STRING> Columns in CSV Input Fixtures

The problem I’m having

We have encountered discrepancies between Spark and BigQuery when using dbt unit tests. Specifically, the issues arise when providing input fixture data for columns of type ARRAY<STRING>. Below is a more detailed explanation with examples to illustrate the problems.

The context of why I’m trying to do this

We are migrating a dbt project from Spark to BigQuery. To ensure that our models behave consistently across both platforms, we rely on dbt unit tests with mock input data. However, discrepancies in how Spark and BigQuery handle ARRAY<STRING> columns in input fixtures make it difficult to write and execute these tests effectively.

  • In Spark:
    • If an input column of type ARRAY<STRING> is unused in the test, it can be omitted without causing any issues.
    • When a column is omitted, Spark handles it gracefully by interpreting it as CAST(NULL AS ARRAY<STRING>).
  • In BigQuery:
    • If a column is not mocked, BigQuery generates a SAFE_CAST(NULL AS ARRAY) expression, which is a syntax error because BigQuery does not allow casting to an array without specifying the element type (see the sketch after this list).
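
To make the difference concrete, here is a sketch of the kind of typed-NULL SQL involved; the tags column name is just illustrative, and the exact SQL dbt compiles may differ:

```sql
-- Spark SQL: an omitted ARRAY<STRING> column becomes a typed NULL and runs fine
SELECT CAST(NULL AS ARRAY<STRING>) AS tags;

-- BigQuery: the generated cast omits the element type, which fails at
-- parse time (commented out because it cannot run):
-- SELECT SAFE_CAST(NULL AS ARRAY) AS tags;
```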

Root Cause

The difference in behavior appears to stem from how Spark and BigQuery handle NULL values and array types during compilation and execution:

  • Spark allows implicit casting of NULL to ARRAY<STRING> without requiring explicit type information.
  • BigQuery requires an explicit element type when casting NULL to an array (see the runnable example below).
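
For reference, the explicit form that BigQuery does accept; this runs as-is in the BigQuery console:

```sql
-- Valid GoogleSQL: the array's element type is spelled out
SELECT CAST(NULL AS ARRAY<STRING>) AS tags;
```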

What I’ve already tried

I tried several ways of mocking the data in the CSV fixture, as listed below.

1. Tried different CSV quoting styles, e.g. "['a','b']" or "[""a"", ""b""]" (doubling the quotes to escape them).
2. Tried setting the column type in the dbt schema file with data_type set to array.
3. Tried a safe_cast in the model, e.g. safe_cast(tags as array<string>) as tags.
4. Tried the YAML (dict) fixture format, which works fine (see the sketch after this list).
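
For completeness, here is a minimal sketch of the working YAML (dict) fixture mentioned in item 4; my_model and raw_tags are hypothetical names, and the expect block assumes the model passes the columns through unchanged:

```yaml
unit_tests:
  - name: test_my_model_array_tags
    model: my_model
    given:
      - input: ref('raw_tags')
        format: dict            # arrays can be written as native YAML lists
        rows:
          - {id: 1, tags: ["a", "b", "c"]}
          - {id: 2, tags: ["x", "y"]}
          - {id: 3, tags: null}
    expect:
      format: dict
      rows:
        - {id: 1, tags: ["a", "b", "c"]}
        - {id: 2, tags: ["x", "y"]}
        - {id: 3, tags: null}
```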

Some example code or error messages

```csv
id,tags
1,"['a', 'b', 'c']"
2,"['x', 'y']"
3,NULL
```