State of testing in dbt

Hello!

My team and I have been thinking about testing lately and throughout this thinking I did a deep-dive into the current state of testing in dbt. In this post, I will try to motivate the importance of testing, present research I’ve collected about testing in dbt, make some conclusions from that research, and finally make some suggestions for advancement of testing in dbt.

I tagged a lot of folks on this post, would love to get more input, more research, and opinions here. Thanks!

Motivation

Presently, dbt offers two types of tests, schema tests and data tests. For the purpose of this post, I will refer to these types of tests as dbt’s primitive tests in the sense they are implemented in the dbt core and can be used in more sophisticated ways. Schema tests are good for enforcing constraints that RDBMS systems typically offer but analytical databases don’t. Meanwhile, data tests can test anything with endless possibilities.

Zooming out a bit from dbt, I would like to motivate the value of testing in software. The points here are applicable to all software projects including dbt projects.

Why should I test software?

With software tests, an engineer can:

  • reduce the likelihood of an unexpected outcome or side-effect from code or system under test,
  • reduce the likelihood that a current or future code change will introduce a system error,
  • inspire confidence in the test author and others that the code or system under test works as-intended, and
  • document code and system behavior by illustrating edge-cases.

There are two parts to testing: writing tests and running tests. Both of these parts are necessary for a useful test suite to provide the above benefits.

How should I test software?

A good test typically verifies something: that a piece of code behaves a certain way under certain conditions, or that a system functions in a certain way.

There are many established ways to test software, each with its own strengths and suitability. I would argue that the best way to test highly depends on what is trying to be accomplished, e.g., prevent code regressions, ensure a contract, introduce a new working feature. The level of effort for testing scales depending on the situation.

I would like to highlight two testing categories that I think are relevant to dbt:

  • Smoke testing: “Where there’s smoke, there’s fire.” A smoke test confirms that the system under test looks and/or behaves a certain way. Smoke tests are used in blue/green deployment pipelines as a last-step check before cutover; if the smoke test fails, then the deployment pipeline fails. Smoke tests are difficult to run at development-time because it requires the entire production system to be functioning for the developer. Smoke tests are particularly effective in continuous deployment pipelines to help catch issues difficult to detect at development-time. Smoke tests are a sign of a mature system.
  • Unit testing: A unit test proves that a “unit” of code works as expected. A unit test commonly has three steps: setup, execute the code under test, and verification/teardown section. A unit can vary in size, from a single function, to an entire class. Unit tests are self-contained and typically do not test functionality of external systems (e.g., using HTTP calls or database connections). “Mocking” is a common technique used during unit tests and is often essential for unit testing. Unit tests are run at development-time and during continuous integration to detect breakages before code reaches production.

Testing can occur at a variety of points in the software development lifecycle. Unit tests are typically run at development-time (i.e., when the code is being developed) and during continuous integration (i.e., when the code is being merged into the main branch of the project). Smoke tests are typically run during continuous deployment (i.e., when the code/data is being deployed to production).

(For more on testing in software, I recommend to consult Martin Fowler’s writing!)

Community research

I have conducted research about dbt by reading through blog posts, GitHub issues, and dbt Discourse. I’ve found there are a number of community members who described their attempts at and/or the challenges of building advanced tests to suit their needs:

  • Petyo Pahunchev pointed out that primitive tests (schema tests and data tests) leave a gap in dbt development, saying “there is no tool to let us deterministically re-create a set of conditions, execute our code and validate the output.” In his post, he suggested a way to create tests using a Python library to allow for a behavioural-driven development methodology in dbt.
  • @michael-kaminsky outlined the challenges of adopting a test driven development approach in ELT. He made a number of good points, including “it takes a lot of work to generate realistic test-cases”, and “executing a pipeline and testing it can take upwards of 10 minutes — this is way too slow for doing real test-driven development”.
  • MichelleArk described their team’s attempt of hand-constructing static CSVs, building macros, and running primitive tests to assert model behavior generally and in edge cases. Ultimately, they wrap up the GitHub issue saying “a general issue with this approach is that writing assertions for unit tests feels quite unnatural in SQL - it’s tricky even to get the right semantics for an equality check.” Later in the post, I described my approach to black box testing pl/pgsql transformations using Python.
  • @fabrice.etanchaud noted that with the new --defer flag, there may be a way to automate non-regression tests (see diagram). Later in the post, @jtcohen described how to mock a dbt model using macros and run-time variables.

Despite these challenges, there are successful examples of building advanced tests using dbt’s primitive tests or by building scaffolding around dbt:

  • @claire’s described how to test with dbt using a fixed dataset. Her fixed dataset consists of “fake data consisting of all my known edge cases so that I can check that my SQL works.” She noted she only does this testing during development, and it’s not clear if engineers returning to her code are able to run these tests.
  • @claus presented how to run primitive dbt tests as a smoke test during a blue/green data deployment. If the smoke tests pass, then the dbt output is made available for use; otherwise the data deployment fails.
  • @gnilrets built a well-documented testing framework outside of dbt. A developer using the framework may write testing specifications for a data transformation, and the framework verifies the spec.
  • The dbt-utils project has a series of integration tests that verify project functionality across the supported database. Meanwhile dbt-audit-helper has macros that can supercharge dbt data tests.

Conclusions

Based on this research, I conclude the following:

  1. Strategies for smoke testing in dbt have been discovered and documented by the community.
  2. Meanwhile, there is a high barrier-to-entry to advanced development-time dbt testing strategies. To achieve advanced dbt testing, one either needs expert-level knowledge of dbt, or a high investment of individual time.
  3. There’s a general desire for advancing development-time testing capabilities inside or outside of dbt and community members are looking for capabilities known in the field of software testing.

Suggestions

Here are some concrete suggestions that I think will advance testing in dbt:

  • Product: Map out the current development-time testing workflow and map out an improved development-time testing workflow.
  • Technical: Make “mocking” a dbt model and dbt source a well-designed and first-class construct in dbt. I think many of the community members I referenced above could benefit from a well-defined approach to mocking in dbt.
  • Documentation: Motivate testing in the dbt docs and present the trade-offs of testing. Promote testing best-practices and advanced testing strategies in the dbt docs. (What does a well-tested model look like? When should tests be run? See React Testing for an example.)

Looking forward to continuing the discussion!

14 Likes

Hi!

This is something I think about as well, as a Software Engineer coming into data and using dbt. I love dbt and how it works, it’s such a great tool.

One of the things I have also thought about is testing, and how we ensure data is correct when building our pipeline:

  • We use schema tests in our models to ensure not_null etc, and using dbt-utils package for helpers.
  • We have a csv which has specific columns, setup like this:
description,table_name,where_field,where_value,test_field,test_value
A customer has this weird edge case for a field,fct_foobar,customer_id,cd8e741c-1436-4145-bf00-b3e59fbf73f7,my_awesome_field,foopytoopy

So this then has a python script which looks for this file under data/data_tests.csv which will build out tests in tests/generated/xxx.sql, which will have the following content:

{{ config(tags=['tests']) }}
{{test_where_exists('fct_foobar','customer_id','cd8e741c-1436-4145-bf00-b3e59fbf73f7','my_awesome_field','foopytoopy')}}

The macro looks like this:

{% macro test_where_exists(model, where_field, where_value, test_field, test_value) %}
select {{where_field}}, {{test_field}} from {{ ref(model) }} where {{where_field}} = '{{where_value}}' and {{test_field}} != '{{ test_value }}'
{% endmacro %}

This is great as we can build this against random edge cases and “known good” orders. If something breaks we’ll know straight away.

We’re also planning on using a tool like great expectations to test our data models.

2 Likes

@joshtaylor thanks for sharing! This is quite clever, at which point in the development lifecycle do you run this test? At development-time, or at deploy-time?

I like the name test_where_exists, I wonder if semantic macros in a dbt package could be added to make improve the baseline of development-time testing. An idea I have for this is exact_match like so:

-- BigQuery Standard SQL:

{% macro exact_match(test_dataset_query, query_under_test) %}
(
    ({{ test_dataset_query }})
    except distinct
    ({{ query_under_test }})
)
union all 
(
    ({{ query_under_test }})
    except distinct
    ({{ test_dataset_query }})
)
{% endmacro %}

Then two more macros that can be used similarly:

  • is_contained_within(test_dataset_query, query_under_test)
  • does_not_contain(test_dataset_query, query_under_test)

Applying this in a data test (borrowing from jaffle_shop):

{% set test_dataset_query %}
select 'mperez0@chronoengine.com' as email, 33 as customer_lifetime_value union all
select 'smccoy1@reddit.com' as email, 23 as customer_lifetime_value union all
select 'kpayne2@cargocollective.com' as email, 65 as customer_lifetime_value
{% endset %}

{% set query_under_test %}
select
    email,
    customer_lifetime_value
from {{ ref('dim_customers') }} as dim_customers
left join {{ ref('raw_customers') }} as raw_customers
    on raw_customers.id = dim_customers.customer_id
{% endset %}

{{ exact_match(test_dataset_query, query_under_test) }}

Now imagine being able to define which refs are mocked and how in the above data test file, I think that would be a great developer experience for a self-contained “unit test” in a single file. @jerco curious to get your thoughts on the feasibility of that idea.

1 Like

We run these tests in development using filtering, but we also have a “sample” target set in the dbt profiles.yml.

We then have a filter in our models to filter by this id, as we want to filter orders that match usually :slight_smile:

So inside of our staging models, we do:

{{filter_if_sample_target_by('order_id')}}

we then have a python script which builds this macro:

file = open(f"./macros/join_orders.sql", "w")
macro = "{% macro filter_if_sample_target_by(source_column_name) %} {% if target.name == 'sample' %} where {{source_column_name}} in ("
macro += "'"+"','".join(ids)+"'"
macro += ") {% endif %} {% endmacro %}"
file.write(macro)
file.close()

Now our dataset in development goes down to ~150 records and is a lot faster to build against. :slight_smile:

1 Like