How to Parse dbt-core Run Logs for Execution Statistics (Success/Failure Counts, Error Reasons, Affected Rows, etc.)

I’m exploring ways to parse dbt-core run logs (JSON format) to extract execution statistics programmatically. Specifically, I’d like to:

  1. Count Success/Failure: Identify how many models passed/failed (e.g., from `PASS=2 WARN=0 ERROR=0` in logs).
  2. Error Diagnostics: Extract error messages/reasons for failed models (e.g., SQL errors, connection issues).
  3. Impact Analysis: Determine affected rows (e.g., `SELECT 4` for a table model) or DDL operations (e.g., `CREATE VIEW`).
  4. Timing Metrics: Compile per-model timing (compile/execute phases) and total run duration.

From my testing, I’ve observed:

  • Final stats like PASS/ERROR appear in logs with `code: "Z023"` or `"E047"`.
  • Model-specific results include `adapter_response` (e.g., `rows_affected`) and `status` fields.
  • Errors may appear with `level: "error"` and stack traces.
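Based on those observations, here is the kind of minimal parser I have in mind for `dbt --log-format json` output (one JSON object per line). The field paths (`info.level`, `info.msg`, `data.node_info.node_status`, `data.adapter_response.rows_affected`) and the set of terminal statuses are assumptions from my own logs; they can shift between dbt-core versions, so verify them against yours:

```python
import json
from collections import Counter

# Terminal node statuses to tally; dbt also emits intermediate ones
# like "started" and "compiling" that we skip. (Assumed value set --
# check against your dbt-core version's logs.)
TERMINAL_STATUSES = {"success", "error", "fail", "skipped", "warn"}

def parse_dbt_log(path):
    """Parse a `dbt --log-format json` run log (one JSON object per line)."""
    status_counts = Counter()
    errors = []          # messages from level == "error" events
    rows_affected = {}   # unique_id -> rows_affected from adapter_response

    with open(path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # tolerate stray plain-text lines (e.g., banners)
            info = event.get("info", {})
            data = event.get("data", {})

            # Collect error-level events and their messages
            if info.get("level") == "error":
                errors.append(info.get("msg", ""))

            # Per-model results carry adapter_response.rows_affected
            node = data.get("node_info", {})
            resp = data.get("adapter_response") or {}
            if "rows_affected" in resp:
                rows_affected[node.get("unique_id", "unknown")] = resp["rows_affected"]

            # Tally only terminal node statuses
            status = node.get("node_status")
            if status in TERMINAL_STATUSES:
                status_counts[status] += 1

    return status_counts, errors, rows_affected
```

This skips anything it cannot decode, so it also survives logs that mix plain-text banners with JSON lines.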

Questions:

  1. Are there consistent patterns/log codes to reliably extract these metrics?
  2. How do you handle logs for partial runs (e.g., `--fail-fast` or interrupted jobs)?
  3. Are there hidden fields (e.g., in `run_results.json`) that simplify this analysis?
  4. Any tools/libraries (e.g., Python parsers) you recommend for log processing?

Try using the dbt_artifacts package.

Note: @prasad0413 originally posted this reply in Slack. It might not have transferred perfectly.

I agree with prasad0413; dbt_artifacts (https://hub.getdbt.com/brooklyn-data/dbt_artifacts/latest/) parses the dbt execution artifacts and uploads them into tables in your database. If you want to roll your own version, though, you can take inspiration from how they’ve implemented it: https://github.com/brooklyn-data/dbt_artifacts/tree/2.9.3
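If you do roll your own, note that dbt writes `target/run_results.json` after each invocation, which already carries per-node `status`, `message`, `execution_time`, `adapter_response`, and an overall `elapsed_time`; parsing that artifact is usually simpler than parsing logs. A rough sketch (field names follow the run-results schema as I understand it for dbt-core 1.x; the schema is versioned, so check `metadata.dbt_schema_version` in your artifact):

```python
import json
from collections import Counter

def summarize_run_results(path="target/run_results.json"):
    """Summarize one dbt invocation from its run_results.json artifact."""
    with open(path) as f:
        artifact = json.load(f)

    results = artifact["results"]
    # Tally per-node statuses ("success", "error", "skipped", ...)
    summary = Counter(r["status"] for r in results)
    # Keep the message (usually the error reason) for failed nodes
    failures = [(r["unique_id"], r.get("message"))
                for r in results if r["status"] in ("error", "fail")]
    # Per-node wall-clock time, plus the run's total elapsed time
    timings = {r["unique_id"]: r.get("execution_time") for r in results}
    return summary, failures, timings, artifact.get("elapsed_time")
```

Since the artifact is written even for `--fail-fast` or otherwise partial runs (nodes that never ran show up as skipped), this also sidesteps the partial-run question.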

Note: @Owen originally posted this reply in Slack. It might not have transferred perfectly.

Thank you very much, but this is just simple run-result data, and I’d rather not obtain it in such a complicated way. For now I’m trying to parse the log content directly.

I just looked into it, and it seems more like a CDC-related feature. I only want to quickly know the result of each execution, much like getting the number of rows affected after a SQL statement runs. This heavyweight approach isn’t for me.