I’m exploring ways to parse dbt-core run logs (JSON format) to extract execution statistics programmatically. Specifically, I’d like to:
- Count success/failure: identify how many models passed or failed (e.g., from `PASS=2 WARN=0 ERROR=0` in the logs).
- Error diagnostics: extract error messages/reasons for failed models (e.g., SQL errors, connection issues).
- Impact analysis: determine affected rows (e.g., `SELECT 4` for a table model) or DDL operations (e.g., `CREATE VIEW`).
- Timing metrics: compile per-model timing (compile/execute phases) and total run duration.
From my testing, I’ve observed:
- Final stats like `PASS`/`ERROR` appear in log events with `code: "Z023"` or `"E047"`.
- Model-specific results include `adapter_response` (e.g., `rows_affected`) and `status` fields.
- Errors may appear with `level: "error"` and stack traces.
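To make those observations concrete, here's the minimal sketch I'm using to tally statuses and collect errors from JSON-formatted log lines. The envelope shape (`info.code`, `info.level`, `info.msg`) matches what I see locally but varies across dbt versions, and the sample lines below are hand-written for illustration, not verbatim dbt output:

```python
import json
import re
from collections import Counter

def summarize_log_lines(lines):
    """Tally PASS/WARN/ERROR totals and collect error messages from
    dbt JSON log lines.

    Assumes each line is a JSON object whose fields live in an "info"
    envelope (code, level, msg); older dbt versions put them at the
    top level, so treat this as a sketch, not the canonical schema.
    """
    totals = Counter()
    errors = []
    for raw in lines:
        raw = raw.strip()
        if not raw:
            continue
        try:
            event = json.loads(raw)
        except json.JSONDecodeError:
            continue  # dbt can interleave non-JSON output (e.g., tracebacks)
        info = event.get("info", event)  # fall back to top-level fields
        if info.get("level") == "error":
            errors.append(info.get("msg", ""))
        # The final stats line (observed with code "Z023") carries the
        # PASS/WARN/ERROR counts as plain text inside its message.
        if info.get("code") == "Z023":
            for key, val in re.findall(r"(PASS|WARN|ERROR|SKIP)=(\d+)",
                                       info.get("msg", "")):
                totals[key] = int(val)
    return totals, errors

# Illustrative, hand-written lines (not real dbt output):
sample = [
    '{"info": {"code": "Z023", "level": "info", "msg": "Done. PASS=2 WARN=0 ERROR=1 SKIP=0 TOTAL=3"}}',
    '{"info": {"level": "error", "msg": "Database Error in model my_model"}}',
    "not json",
]
totals, errors = summarize_log_lines(sample)
```

This works on my samples, but I'd rather key off stable event codes than regex the message text, which is part of why I'm asking.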
Questions:
- Are there consistent patterns/log codes to reliably extract these metrics?
- How do you handle logs for partial runs (e.g., `--fail-fast` or interrupted jobs)?
- Are there hidden fields (e.g., in `run_results.json`) that simplify this analysis?
- Any tools/libraries (e.g., Python parsers) you recommend for log processing?
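On the `run_results.json` point, here's how I'm currently flattening that artifact into per-model stats. The field names (`status`, `timing`, `adapter_response.rows_affected`, `execution_time`, `elapsed_time`) are what I see in my local artifact; the schema is versioned, so this is an assumption to verify against your dbt-core release. The sample artifact is a hand-written stub for illustration:

```python
def summarize_run_results(artifact):
    """Flatten a parsed run_results.json artifact into per-model stats.

    Field names here match the artifact produced by my dbt-core
    install; the artifact schema is versioned, so check yours.
    """
    models = []
    for result in artifact.get("results", []):
        # timing is a list of phase entries, each with a "name"
        # (e.g., "compile", "execute") and start/end timestamps.
        timing = {t.get("name"): t for t in result.get("timing", [])}
        models.append({
            "unique_id": result.get("unique_id"),
            "status": result.get("status"),
            "rows_affected": result.get("adapter_response", {}).get("rows_affected"),
            "execution_time": result.get("execution_time"),
            "phases": sorted(timing),
        })
    return {"elapsed_time": artifact.get("elapsed_time"), "models": models}

# Hand-written minimal artifact, for illustration only:
artifact = {
    "elapsed_time": 3.2,
    "results": [
        {
            "unique_id": "model.my_project.my_model",
            "status": "success",
            "execution_time": 1.4,
            "adapter_response": {"code": "SELECT", "rows_affected": 4},
            "timing": [{"name": "compile"}, {"name": "execute"}],
        }
    ],
}
summary = summarize_run_results(artifact)
```

In practice I `json.load` the file from `target/run_results.json` after a run and pass the dict in; I'd love to know if there are additional fields in there that make the log parsing above unnecessary.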