Conditionally running dbt tests / Only running dbt tests in production

hali · April 29, 2019, 8:44pm

I’m interested in filtering out some tests on a CI run. The CI process runs on circleci & all the tests are executed against a test database. The test database is a nightly snapshot of our data warehouse. So,

tests depending on data freshness would fail.
there are a couple of tests that take a long time to run which would make our CI a lot more painful.

I’d like to run all tests excluding some data tests that fall under these two criteria on each CI bulid

Can something like this be done?

My initial hope was to be able to tag specific data tests with noci & then invoke dbt test somehow excluding the tests with the tag noci. This won’t work because tests can’t be tagged & they use the --model syntax.

Another possible solution I thought of: if we were able to specify the test folder where data tests can be found, I would move all the noci tests to a separate directory and then only run tests not in the noci folder.

What would be the recommended way to accomplish this?

bkleszyk · April 29, 2019, 9:09pm

If your tests are purely data tests, you can toggle them using a dbt var or environment variable. We’ve used env vars to enable/disable models more easily, and it works for tests too.

-- tests/t_example.sql
{{
  config(
    enabled=not env_var('DISABLE_RECENCY_CHECK', ''),
  )
}}

SELECT 1

and then compare: DISABLE_RECENCY_CHECK=1 dbt test --model t_example vs. dbt test --model t_example.

claire · April 29, 2019, 9:42pm

Great question! We can actually get dbt to do our bidding here in a slightly different way that doesn’t involve tags.

Prior knowledge for understanding why this works:

If we create a local version of a macro in our own project, it overrides the default macro in dbt.
Tests are implemented as macros – the code for the not_null test is here.
Tests can take arguments
Tests pass when you return a 0
We can figure out which environment we’re in using the jinja object {{ target.name }}, assuming you have named your targets sensibly!

So, putting that all together, if you want to run, say, the not_null test only when in prod, you can add a file to your project in the macros directory: macros/test_not_null.sql that looks like this:

{% macro test_not_null(model) %}

  {% set column_name = kwargs.get('column_name', kwargs.get('arg')) %}

  {% set test_in_env = kwargs.get('env') %}
  
  {#-
  We should run this test when:
  * the environment has not been specified OR,
  * we are in the specified environment
  -#}
  {%- if test_in_env is none or target.name == test_in_env -%}

    select count(*)
    from {{ model }}
    where {{ column_name }} is null

  {#- 
  We should skip this test otherwise, which we do by returning 0
  -#}
  {%- else -%}

    select 0

  {%- endif -%}

{% endmacro %}

Then in our schema.yml files, we can specify environments by using our env key:

models:
  - name: orders
    columns:
        - name: order_id
          tests:
            # this test will run in all environments
            - not_null

        - name: customer_id
          tests:
            # this test will only run when target.name == prod
            - not_null:
                env: prod

This pattern can be extended to take a list of environments if required!

Credit to @michael-kaminsky for this pattern! I based it off a custom test he had written

naveedn · June 3, 2020, 7:13pm

Hi @claire – is this still the recommended way to have a CI / CD interface? Wouldn’t it be easier to have a profile target that is “test” and have the CI tool run the dbt test command against that target?

I guess I’m having a hard time understanding where the above method is advantageous.

claire · June 11, 2020, 12:25pm

Hi Naveed — this approach is advantageous when you use a slightly different dataset in development versus production. This sometimes happens when users decide to limit their datasets when running in dev (e.g. only transforming the last few days of data) to speed up their dev runs. But, as a result, some of their tests may start to fail.

So this is a workaround — how can I choose to only run some tests in prod, since I know they’re going to fail in dev?

adamstone · October 18, 2021, 11:34am

Hi! For anyone who comes across this post, dbt v0.20 changed how schema (now called generic) tests work.

The biggest breaking change is that “all tests now return a set of failing rows, rather than a single numeric value.”

That means that just select 0 won’t work anymore. The final dataset produced by the test SQL will contain 1 row containing a value of 0. Because the dataset has at least 1 row, it will trip a test failure.

However we can leverage the new fail_calc config parameter! The default calculation is count(*) which will return a value of 1 if you used select 0, hence the failing test.

We can overwrite that with sum(validation_errors) which will return a value of 0 and the test will pass. Here’s how to do it using the SQL from @claire’s example above but updated for v20.0:

{% test not_null(model) %}

  {% set column_name = kwargs.get('column_name', kwargs.get('arg')) %}

  {% set test_in_env = kwargs.get('env') %}
  
  {#-
  We should run this test when:
  * the environment has not been specified OR,
  * we are in the specified environment
  -#}
  {%- if test_in_env is none or target.name == test_in_env -%}

    select *
    from {{ model }}
    where {{ column_name }} is null

  {#- 
  We should skip this test otherwise, which we do by returning 0
  -#}
  {%- else -%}

    {{ config(fail_calc = "sum(validation_errors)") }}

    select 0 as validation_errors

  {%- endif -%}

{% endtest %}

claus · October 18, 2021, 1:28pm

@adamstone you might be able to simplify by returning 0 rows when you’re in dev, e.g.

select *
from {{ model }}
where 
    {{ column_name }} is null
    and
    {%- if test_in_env is none or target.name == test_in_env -%}
    1=1
    {%- else -%}
    1=0
    {%- endif -%}

Would that work?

adamstone · October 19, 2021, 1:02pm

Hey @claus, that’s also a good approach, but @joellabes has suggested an even easier way - just conditionally disable the test by passing in {{ config(enabled = false) }}!

This way it doesn’t even show up in the output when running dbt test, which is better! Using any of the approaches discussed above, it still looks like the conditional test runs and passes, even though it’s really not.

Topic		Replies	Views
What are the dbt commands you run in your production deployment of dbt? Archive	12	30188	March 30, 2021
Testing with fixed data set In-Depth Discussions testing	16	19414	August 5, 2022
Advanced schema tests; thresholds, exclusions and date limits Show and Tell testing	1	6834	July 19, 2021
Enviornment Variable Help	0	542	March 18, 2024
Dbt model (think unit) tests POC Show and Tell testing	2	5756	August 24, 2021

Conditionally running dbt tests / Only running dbt tests in production

Related topics