Conditionally running dbt tests / Only running dbt tests in production

I’m interested in filtering out some tests on a CI run. The CI process runs on circleci & all the tests are executed against a test database. The test database is a nightly snapshot of our data warehouse. So,

  • tests depending on data freshness would fail.
  • there are a couple of tests that take a long time to run which would make our CI a lot more painful.

I’d like to run all tests excluding some data tests that fall under these two criteria on each CI bulid

Can something like this be done?

My initial hope was to be able to tag specific data tests with noci & then invoke dbt test somehow excluding the tests with the tag noci. This won’t work because tests can’t be tagged & they use the --model syntax.

Another possible solution I thought of: if we were able to specify the test folder where data tests can be found, I would move all the noci tests to a separate directory and then only run tests not in the noci folder.

What would be the recommended way to accomplish this?

If your tests are purely data tests, you can toggle them using a dbt var or environment variable. We’ve used env vars to enable/disable models more easily, and it works for tests too.

-- tests/t_example.sql
{{
  config(
    enabled=not env_var('DISABLE_RECENCY_CHECK', ''),
  )
}}

SELECT 1

and then compare: DISABLE_RECENCY_CHECK=1 dbt test --model t_example vs. dbt test --model t_example.

Great question! We can actually get dbt to do our bidding here in a slightly different way that doesn’t involve tags.

Prior knowledge for understanding why this works:

  1. If we create a local version of a macro in our own project, it overrides the default macro in dbt.
  2. Tests are implemented as macros – the code for the not_null test is here.
  3. Tests can take arguments
  4. Tests pass when you return a 0
  5. We can figure out which environment we’re in using the jinja object {{ target.name }}, assuming you have named your targets sensibly!

So, putting that all together, if you want to run, say, the not_null test only when in prod, you can add a file to your project in the macros directory: macros/test_not_null.sql that looks like this:

{% macro test_not_null(model) %}

  {% set column_name = kwargs.get('column_name', kwargs.get('arg')) %}

  {% set test_in_env = kwargs.get('env') %}
  
  {#-
  We should run this test when:
  * the environment has not been specified OR,
  * we are in the specified environment
  -#}
  {%- if test_in_env is none or target.name == test_in_env -%}

    select count(*)
    from {{ model }}
    where {{ column_name }} is null

  {#- 
  We should skip this test otherwise, which we do by returning 0
  -#}
  {%- else -%}

    select 0

  {%- endif -%}

{% endmacro %}

Then in our schema.yml files, we can specify environments by using our env key:

models:
  - name: orders
    columns:
        - name: order_id
          tests:
            # this test will run in all environments
            - not_null

        - name: customer_id
          tests:
            # this test will only run when target.name == prod
            - not_null:
                env: prod

This pattern can be extended to take a list of environments if required!

Credit to @michael-kaminsky for this pattern! I based it off a custom test he had written :slight_smile:

2 Likes