Availability Metrics & Service Monitoring Approaches for dbtCloud

I also considered the title “How to train your dbt”.

As a data professional and as a person who “cares a lot” (but authentically, not like the character of the above movie), I’ve been able to see how a small number of companies and data teams interact with dbt, and specifically with the dbtCloud platform. I’m writing this post in the hope of starting a conversation that I think needs to be started.

Namely, I believe that one of the biggest implementation and adoption challenges I’ve seen for new users and new organizations is quickly understanding when something is not running correctly. Why?

  • The primary alerting mechanism is a busy email inbox or a noisy Slack channel
  • The dbtCloud API is not accessible to unpaid accounts (which many organizations start with)
  • Push webhooks, which could tie in other automation services (Zapier, Integromat, Power Platform, etc.), are currently unsupported
  • Neither dbt-core nor dbtCloud currently supports writing run results down to a table in the warehouse

So, while I can encourage practices like blue/green deployment, it’s generally difficult to know what’s happening over time (especially with a high frequency of build jobs for incremental view deployments) without better review and visibility.

Getting straight to practical examples: as a data practitioner, I want to be able to justify spend on technical systems by demonstrating that our testing practices catch errors within our stack and reduce the time it takes to resolve them.

For instance, picture a hypothetical graph showing that an upstream system had, overnight, introduced new values which did not pass the “acceptable” values tests, but that the team resolved the failures within the first few hours of the day.

Or similarly, a look at when our “full build”, which includes snapshotting and external table staging, flatlines while the incremental portions continue running successfully.
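To make the first example a little more concrete without turning this into a how-to, here is a minimal sketch of the “time to resolution” number behind such a graph. It assumes test outcomes have already been collected somewhere as timestamped records; the record shape, the test name, and the `resolution_times` function are all hypothetical and are not part of dbt-core or dbtCloud.

```python
from datetime import datetime, timedelta

# Hypothetical record shape: (run_finished_at, test_name, status),
# where status is "pass" or "fail". How these records get collected
# is deliberately left open; nothing here is a dbt API.


def resolution_times(records):
    """Return one (test_name, first_failure, resolved_at, duration) per incident.

    An incident starts at the first failing run of a test and ends at the
    next passing run of that same test.
    """
    open_failures = {}  # test_name -> timestamp of the first failure
    incidents = []
    for finished_at, test_name, status in sorted(records):
        if status == "fail":
            # Keep only the earliest failure of an ongoing incident.
            open_failures.setdefault(test_name, finished_at)
        elif status == "pass" and test_name in open_failures:
            started = open_failures.pop(test_name)
            incidents.append((test_name, started, finished_at, finished_at - started))
    return incidents


# Made-up data: an overnight failure that is fixed mid-morning.
example = [
    (datetime(2021, 1, 5, 2, 0), "accepted_values_orders_status", "pass"),
    (datetime(2021, 1, 5, 3, 0), "accepted_values_orders_status", "fail"),
    (datetime(2021, 1, 5, 4, 0), "accepted_values_orders_status", "fail"),
    (datetime(2021, 1, 5, 9, 30), "accepted_values_orders_status", "pass"),
]
for name, started, resolved, duration in resolution_times(example):
    print(name, started, resolved, duration)
```

Plotting those durations over weeks or months is one way a team could show whether its testing practices are actually shortening the time errors stay in the stack.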

Starting from those examples, I would like to clarify and re-focus.
This post is not a technical how-to on how to build these analytics.
It’s not a feature request for the dbtCloud platform.
It’s not an integration request to make these metrics available on Grafana, Datadog or some other metrics platform.

It’s a call to action from one data professional to the community of data professionals to start thinking about how we can align as a community on our standards of reporting on ourselves and providing justification for the expense and the investment of the services that we provide.
It’s about being good stewards of the data we create, not just the data that we curate.
