Aggregating Test Results

Hi everyone, I was wondering if anyone knows a great way to keep track of a model’s test failures over time. For example, we know which tests fail on each run from our dbt log output, but we don’t know which tests have failed the most over the last month (or any other time period) unless we go through the logs one by one and count the failures. Do people write test failures to a table and then query that table? Or how do you solve this issue in your work? Thanks in advance for any help!


We wrap our dbt runs in a script that sends failures to Rollbar. We are subsequently alerted to new failures, and can then go back and look at failures over time, see whether they’re repeats, etc. You could also parse the logs and stick them in your database, but I think the right solution depends on what you’re trying to accomplish.

We like being alerted to and managing issues in such a way that it is clear if they’re being worked on, when they’re resolved, etc. We also like having visibility into repeat issues and tests with a high false positive rate so we can update or prune them.
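For anyone curious, here is a rough sketch of what that kind of wrapper script could look like, using the Rollbar Python SDK. The access-token variable, the log-line matching, and the overall flow are placeholders — the exact text dbt prints for a failing test varies by version, so treat this as an illustration rather than our actual script:

```python
import os
import subprocess

import rollbar  # Rollbar Python SDK: pip install rollbar

# Hypothetical setup: the access token comes from an environment variable.
rollbar.init(os.environ["ROLLBAR_ACCESS_TOKEN"], environment="production")

# Run the test suite and capture its console output.
result = subprocess.run(["dbt", "test"], capture_output=True, text=True)

# Naive parse of the text log: dbt prints one status line per test.
# The exact wording differs between dbt versions, so this filter is a placeholder.
failures = [
    line for line in result.stdout.splitlines()
    if "FAIL" in line or "ERROR" in line
]

for line in failures:
    # Report each failing test as a message so Rollbar can group repeat offenders.
    rollbar.report_message(line, level="error", extra_data={"source": "dbt test"})

# Preserve dbt's exit code so the orchestrator still sees the failed run.
raise SystemExit(result.returncode)
```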

Hey @ddaltonrtr - awesome question! I think @king_fink has the right idea here – there are dedicated monitoring tools (like Rollbar, DataDog, CloudWatch, et al) that make notifications and reporting a breeze if you can get the data into them.

To this end, we’re in the process of adding structured logging to dbt. You can find the (presently in-progress) change here: https://github.com/fishtown-analytics/dbt/pull/1806

Structured logging allows dbt to log verbose information that wouldn’t otherwise be appropriate for a human-readable text stream. With these structured logs, you should be able to see which test failed, how many rows failed, and which model was being tested. From there, you should be able to extract this information and pass it over to the monitoring/alerting system of your choice. This will make it possible to track SLAs over time and understand which parts of the codebase most frequently require attention.
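To illustrate, here is a minimal sketch of what consuming those structured logs might look like once they land. The field names ("status", "test_name") and the log file path are placeholders — the final schema is still being worked out in the linked PR — and it assumes dbt is emitting one JSON record per line:

```python
import json
from collections import Counter

# Count failures per test from a JSON-lines dbt log file.
# NOTE: the field names used here are placeholders; the structured-log
# schema in the linked PR may differ once it ships.
failure_counts = Counter()

with open("logs/dbt.log") as f:
    for raw in f:
        try:
            record = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip any plain-text lines mixed into the log
        if record.get("status") == "FAIL":
            failure_counts[record.get("test_name", "unknown")] += 1

# These counts can then be shipped to a monitoring tool or written to a table.
for test_name, count in failure_counts.most_common():
    print(f"{test_name}: {count} failing runs")
```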

This is a common enough use case that it will probably also make sense to record these test failures natively to a metadata table using dbt. This issue is long-standing, particularly because we’re wary of running insert statements directly on a data warehouse, but if you have opinions on the topic we’d sure love to hear them here: https://github.com/fishtown-analytics/dbt/issues/903

There are two topics at play in this issue:

  1. recording the errant rows of a failed test in a table to aid in auditing / resolution
  2. recording the incidence of a failure in a table, so that you can track failures over time in a BI tool or similar (a rough sketch of this follows below)

Each of these topics serves different use-cases, and they’re each valuable in their own right!
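To make the second topic a bit more concrete, here is a minimal sketch of recording failure incidence in a metadata table outside of dbt itself. It assumes a Postgres-compatible warehouse reachable via psycopg2; the `dbt_test_failures` table, connection string, and sample values are all hypothetical, and in practice the values would come from parsed logs or dbt’s run artifacts:

```python
from datetime import datetime, timezone

import psycopg2  # assumes a Postgres-compatible warehouse

conn = psycopg2.connect("dbname=analytics user=dbt")
with conn, conn.cursor() as cur:
    # Hypothetical audit table: one row per failing test per invocation.
    cur.execute("""
        create table if not exists dbt_test_failures (
            test_name   text,
            model_name  text,
            failed_rows integer,
            run_at      timestamptz
        )
    """)
    # Sample values for illustration; a real wrapper would pull these
    # from the parsed test results of the current run.
    cur.execute(
        "insert into dbt_test_failures values (%s, %s, %s, %s)",
        ("not_null_orders_order_id", "orders", 3, datetime.now(timezone.utc)),
    )
```

A table like this is easy to point a BI tool at, which is what makes tracking failure rates over a month (or any window) a simple group-by rather than a log-spelunking exercise.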

Let me know what you think about this either here, or in the linked issue 🙂

Thank you for the feedback @king_fink & @drew!

@king_fink, thanks for sharing how your team uses a third-party tool to help manage alerts/failures. I can definitely see value in sending logs/messages to Rollbar, DataDog, CloudWatch, etc. While it is a little more setup-intensive, it would work well for the problem I outlined.

@drew, it’s great to see that structured logging is in dbt’s future to make it even easier to use the solutions @king_fink mentioned.

In terms of recording failures to a metadata table using dbt, I personally value recording the incidence of a failure over recording the errant values themselves. I agree that auditing/resolution is important, but for tables that are not “snapshotted” or built as a time series, I wonder how helpful the errant rows would be when someone looks back at the table later. I would prefer to be able to share the types of failures we see with our engineering teams in an easy-to-digest format, so I can educate them on common issues we’re experiencing in the data world. That being said, I believe there is a lot of value in recording the errant rows too.

Thank you both for taking the time to respond!