I’m exploring ways to parse dbt-core
run logs (JSON format) to extract execution statistics programmatically. Specifically, I’d like to:
- Count success/failure: identify how many models passed/failed (e.g., from `PASS=2 WARN=0 ERROR=0` in the logs).
- Error diagnostics: extract error messages/reasons for failed models (e.g., SQL errors, connection issues).
- Impact analysis: determine affected rows (e.g., `SELECT 4` for a table model) or DDL operations (e.g., `CREATE VIEW`).
- Timing metrics: compile per-model timing (compile/execute phases) and total run duration.
From my testing, I’ve observed:
- Final stats like `PASS`/`ERROR` appear in log lines with `code: "Z023"` or `"E047"`.
- Model-specific results include `adapter_response` (e.g., `rows_affected`) and `status` fields.
- Errors may appear with `level: "error"` and stack traces.
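To show the direction I've started in, here's a rough sketch that tallies levels and collects error messages from `dbt run --log-format json` output. One caveat I've noticed: the field layout varies across dbt-core versions (some releases put `code`/`level`/`msg` at the top level, others nest them under an `info` key), so this handles both; treat the key names as assumptions, not a verified schema:

```python
import json
from collections import Counter

def tally_log_statuses(log_path):
    """Tally log levels and collect error messages from a dbt JSON log file.

    Assumes one JSON object per line. Field layout differs across dbt-core
    versions: some put `code`/`level`/`msg` at the top level, others nest
    them under an `info` key, so we check both.
    """
    levels = Counter()
    errors = []
    with open(log_path) as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue
            try:
                event = json.loads(line)
            except json.JSONDecodeError:
                continue  # skip any stray non-JSON lines
            info = event.get("info", event)  # tolerate both layouts
            level = info.get("level", "")
            levels[level] += 1
            if level == "error":
                errors.append((info.get("code", ""), info.get("msg", "")))
    return levels, errors
```

This gives me crude pass/fail counts and a list of `(code, message)` pairs for errors, but it feels fragile against schema changes, which is part of why I'm asking.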
Questions:
- Are there consistent patterns/log codes to reliably extract these metrics?
- How do you handle logs for partial runs (e.g., `--fail-fast` or interrupted jobs)?
- Are there hidden fields (e.g., in `run_results.json`) that simplify this analysis?
- Any tools/libraries (e.g., Python parsers) you recommend for log processing?