Testing with fixed data set

Hi,

One thing that I’ve feel missing is the ability to control different aspects of a model during different tests. Ideally I would conditionally like to change data source(s) based on which test is running to add data sources that can test both positive and negative cases; as such global settings such as env=ci doesn’t cut it (unless you want to sacrifice readability dumping all the test cases in 1 huge table and then filtering).

I did a test with implementing a cte helper that returns the compiled code of a model. This allows one to switch out anything (not be most beautiful with string-replacements but works well enough, especially for high entropy strings from ref() and source() )

Contrived example
models/addder.sql

    SELECT 
      *,
      a+b AS sum
    FROM {{ ref('some_other_model') }}

/tests/adder_001.sql

{% set model_to_test = cte('model.addder')
      | replace(ref('some_other_model'), "mocked_src1") 
%}


WITH
mocked_src1 AS (
  SELECT
    *
  FROM
  UNNEST(
   [
     struct(1 as a,  2 as b,  3 as expectedSum),
     struct(2 as a,  2 as b,  4 as expectedSum),
     struct(0 as a,  0 as b,  0 as expectedSum),
     struct(1 as a, -2 as b, -1 as expectedSum),
   ])
),

final AS (
  {{ model_to_test }}
)

-- Some SQL assertion based on the specific mocked data.
SELECT * FROM final WHERE sum != expectedSum

The advantages being:

  1. Pure dbt(Jinja/SQL);
  2. “full control” per test
  3. No changes/branching/flags required in production models.
  4. Supports both virtual tables as per above and references to larger data sets / seeds.
  5. no yaml.
  6. potential to be plugin free due to low code impact

Downsides:

  1. Using string replacements can be fragile. Works well for ref/source replacements since these are high entropy strings with low change of accidental replacements.
  2. Requires a new macro (cte) in the DBT core but this was quite minimal.
  3. The 10 line POC worked well in simple tests but might introduce unknown behavior that i’m unaware of.

I’m also considering if the cte helper should accept another argument which can override context-variables such as val(), maybe even refs but haven’t been able to find a clean solution to do it yet.

Would love to hear if some community feedback on this approach. I think think could solve a lot of unit testing cases with a simple approach but i might also been drinking too much cool aid :man_shrugging:

1 Like