Availability Metrics & Service Monitoring Approaches for dbtCloud

sgoley · August 2, 2021, 7:40pm

Alternatively considered the title “How to train your dbt”

As a data professional and as a person who “cares a lot” (but authentically, not like the character of the above movie), I’ve been able to see the way that a small number of companies & data teams are interacting with dbt and specifically the dbtCloud platform. I’m writing this post in hopes of starting a conversation that I think needs to be started.

Namely, I believe that one of the biggest implementation and adoption challenges for new users and new organizations is quickly understanding when something is not running correctly. Why?

The primary alerting mechanism is a busy email inbox or a noisy Slack channel.
The dbtCloud API is not accessible to unpaid accounts (which many organizations start with).
Push webhooks, which could tie in other automation services (Zapier, Integromat, Power Platform etc.), are currently unsupported.
Neither dbt-core nor dbtCloud currently supports pushing run results down to the warehouse and writing run “results” to a table.
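
To make that last gap concrete, here is a minimal sketch of what a do-it-yourself version could look like, assuming you can get hold of the run_results.json artifact that dbt writes after an invocation. The field names follow that artifact as I understand it and may differ between dbt versions; the sample payload and the flatten_run_results helper are hypothetical, and actually loading the rows into a warehouse table is left out.

```python
from __future__ import annotations

import json
from typing import Any


def flatten_run_results(artifact: dict[str, Any]) -> list[dict[str, Any]]:
    """Turn one parsed run_results.json artifact into flat, table-ready rows."""
    metadata = artifact.get("metadata", {})
    rows = []
    for result in artifact.get("results", []):
        rows.append(
            {
                # Invocation-level fields are repeated on every row so the
                # table can be grouped by run later.
                "invocation_id": metadata.get("invocation_id"),
                "generated_at": metadata.get("generated_at"),
                "unique_id": result.get("unique_id"),
                "status": result.get("status"),
                "execution_time": result.get("execution_time"),
                "failures": result.get("failures"),
            }
        )
    return rows


# Hypothetical sample shaped like a run_results.json artifact (values invented).
sample = json.loads(
    """
    {
      "metadata": {"invocation_id": "abc-123", "generated_at": "2021-08-02T06:00:00Z"},
      "results": [
        {"unique_id": "test.my_project.not_null_orders_id", "status": "pass",
         "execution_time": 0.42, "failures": 0},
        {"unique_id": "test.my_project.accepted_values_orders_status", "status": "fail",
         "execution_time": 0.57, "failures": 3}
      ]
    }
    """
)

rows = flatten_run_results(sample)
for row in rows:
    print(row["unique_id"], row["status"], row["failures"])
```

Each row could then be appended to a history table, which is what makes reporting over time possible.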

So, while I can encourage things like the suggested usage of blue/green deployment, it’s generally somewhat difficult to know what’s happening over time (especially with a high frequency of build jobs for incremental view deployments) without better review and visibility.

Getting straight to practical examples: as a data practitioner, I want to be able to justify spend on technical systems by demonstrating that our testing practices are catching errors within our stack and reducing the time it takes to resolve them.

So something like this hypothetical example: a graph showing that an upstream system had, overnight, introduced new values which were not passing the “acceptable” values tests, but which were resolved by the team in the first few hours of the day.
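
The number behind a graph like that could be computed roughly as follows. This is only a sketch under stated assumptions: it presumes you already have a time-ordered history of (timestamp, status) pairs for a single test, with statuses "pass" and "fail". The resolution_times helper and the sample history are hypothetical.

```python
from datetime import datetime, timedelta


def resolution_times(history):
    """Return how long each failure streak lasted.

    `history` is a list of (timestamp, status) tuples for one test, sorted by
    timestamp. A streak starts at the first failing run and ends at the next
    passing run. A streak that is still failing at the end is not counted.
    """
    durations = []
    failing_since = None
    for timestamp, status in history:
        if status == "fail" and failing_since is None:
            failing_since = timestamp
        elif status == "pass" and failing_since is not None:
            durations.append(timestamp - failing_since)
            failing_since = None
    return durations


# Hypothetical history: the test starts failing overnight and is fixed by 08:00.
history = [
    (datetime(2021, 8, 2, 1, 0), "pass"),
    (datetime(2021, 8, 2, 2, 0), "fail"),
    (datetime(2021, 8, 2, 3, 0), "fail"),
    (datetime(2021, 8, 2, 8, 0), "pass"),
]

durations = resolution_times(history)
print(durations)
```

Note that this measures time between runs, so the result is only as precise as the job frequency allows.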

Or similarly, a look at when our “full build”, which includes snapshotting and external table staging, flatlines while the incremental portions continue running successfully.

Starting from those examples, I would like to clarify and re-focus.
This post is not a technical how-to on how to build these analytics.
It’s not a feature request for the dbtCloud platform.
It’s not an integration request to make these metrics available on Grafana, Datadog or some other metrics platforms.

It’s a call to action from one data professional to the community of data professionals to start thinking about how we can align as a community on our standards of reporting on ourselves and providing justification for the expense and the investment of the services that we provide.
It’s about being good stewards of the data we create, not just the data that we curate.
