Do you test ephemeral models?


#1

We test all (realistically, most) of the views and tables that sit in our analytics schema, which holds all the data deemed ready for use.

We currently don’t have much testing on the upstream models that aren’t materialised, particularly ‘base’ ephemeral models that have no joins and effectively just clean up the individual raw tables. We’ve started testing them, which has massively increased the number of tests our project has (>1000), and I wanted to know what people’s views on this are.

It’s starting to take much longer for the tests to run as well. We currently run tests after each production refresh of the tables, which may not be necessary.

Questions:

  • Do you test all your ephemeral/upstream models?
  • Do people have a specific setup with all of this?
  • Is there a best practice?

#2

I like testing everything, including ephemerals. It’s a handy development tool. It does take quite some time to run all tests though.


#3

Agreed, we like testing ephemeral models. The thinking behind this:

  1. We create ephemeral models for other downstream models to use.
  2. The downstream models are invariably making a bunch of assumptions about the data coming out of the ephemeral models.
  3. It can be hard or even impossible to reliably test all of the assumptions made about the ephemeral models indirectly, through the downstream models. The worst-case scenario is that an assumption is violated but all of the downstream models and tests still pass, so incorrect data gets used in analytics and incorrect decisions are made based on it.
  4. Therefore we should test as many assumptions as we can about ephemeral models in an automated fashion.
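
As a rough sketch of the kind of assumptions meant here, a schema file for a base model might look something like the following. The path, the model and column names (`base_orders`, `order_id`, `status`), and the accepted values are all hypothetical; `unique`, `not_null`, and `accepted_values` are dbt's built-in generic tests. The exact keys vary a little between dbt versions (newer releases prefer `data_tests:` over `tests:`), so treat this as illustrative only.

```yaml
# models/base/schema.yml (hypothetical path and names)
version: 2

models:
  - name: base_orders          # hypothetical ephemeral base model
    columns:
      - name: order_id
        tests:
          - unique             # downstream joins assume one row per order
          - not_null
      - name: status
        tests:
          - accepted_values:
              values: ['placed', 'shipped', 'returned']   # illustrative values
```

Each test states one assumption directly against the ephemeral model, so a violation shows up as a failing test rather than as quietly wrong numbers further down.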

Yes, it does mean more tests to run, which can slow things down. One solution could be tagging tests and running only subsets of them at different times.
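
A minimal sketch of the tagging idea, assuming a reasonably recent dbt version that accepts a `config:` block on individual tests. The tag name `base_checks`, the model `base_customers`, and its column are hypothetical.

```yaml
# models/base/schema.yml (hypothetical path and names)
version: 2

models:
  - name: base_customers       # hypothetical ephemeral base model
    columns:
      - name: customer_id
        tests:
          - unique:
              config:
                tags: ['base_checks']   # hypothetical tag name
          - not_null:
              config:
                tags: ['base_checks']

# Run only the tagged tests, for example in a nightly job:
#   dbt test --select tag:base_checks
# Leave them out of the run that follows each production refresh:
#   dbt test --exclude tag:base_checks
```

Whether the base-model tests belong in the nightly run or the post-refresh run is a judgment call; the tag just makes the split possible.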

Also, from a dbt perspective it doesn’t seem like a great idea to be attached to models being ephemeral. The whole point is to be able to flip easily between materializations (ephemeral, view, and table) as performance or other needs dictate. If we design something with the hardcoded assumption that it will always and only be ephemeral, we’re probably building in some technical debt or bugs that will pop up in the future.
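
To illustrate how small that flip can be, here is a sketch of a project-level setting, assuming the base models live in a folder called `base` inside a project called `my_project` (both names are hypothetical).

```yaml
# dbt_project.yml (excerpt; project and folder names are hypothetical)
models:
  my_project:
    base:
      +materialized: ephemeral   # change to view or table if needs change
```

Tests declared against these models refer to them by name rather than by materialization, so they should carry over unchanged if the setting is switched later.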