I’ve always wondered what the numbers in the stdout log output (e.g., ...24 of 2200...) actually signify. I believe these are the node_index and the node_count (per a quick search of the codebase), but what do they tell us? (For the sake of argument, let’s assume a head-to-toe rebuild of an entire graph rather than a subset of models chosen with dbt’s node selection syntax.)
Naively, drawing on my read of the docs, I assume that this is the order in which dbt proceeds to build the DAG…but is that actually the case? Does the relative movement of models up or down this pecking order give us any indication of how the graph is evolving or how models are “shifting places”? I know there’s concurrency at play, but I know very little (read: nothing) about how the sequencing of nodes is distributed across threads in, say, a dbt run…or whether these node indices are deterministic in any way when varying the number of threads available.
We’ve implemented a couple of changes to our warehouse resourcing that have had an impact on the relative ordering of our models per these indices on full runs of our project in production, and I’m curious if this node_index is something we can hang our hat on in terms of making (even high-level) inferences about our graph.
I know, I know…dbt has canonized their artifacts (which is great!) and there are numerous examples of folks taking advantage of these in clever and inventive ways. I was just unsure if there was something interesting to be said about the wash of numbers flooding my terminal as I stare into the void after issuing a dbt command.
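For comparison, here is a rough sketch of how I’d pull a start-time ordering out of the artifacts instead of the terminal output. It assumes run_results.json sits in the default target/ directory and that each entry in results carries a unique_id, a thread_id, and a timing list with an "execute" step, which is my reading of the artifact schema; the execution_order helper is a name I made up for illustration, not anything dbt ships.

```python
import json
from pathlib import Path


def execution_order(run_results: dict) -> list[tuple[str, str, str]]:
    """Return (started_at, thread_id, unique_id) tuples sorted by execute start time.

    Hypothetical helper: the field names below are assumptions about the
    run_results.json schema, not a documented dbt API.
    """
    rows = []
    for result in run_results.get("results", []):
        # Find the "execute" timing step, if the node has one.
        execute = next(
            (t for t in result.get("timing", []) if t.get("name") == "execute"),
            None,
        )
        if execute is None or not execute.get("started_at"):
            continue  # e.g. skipped nodes may carry no timing information
        rows.append(
            (execute["started_at"], result.get("thread_id") or "", result["unique_id"])
        )
    # Timestamps are assumed to share one ISO 8601 format, so string sort is chronological.
    return sorted(rows)


if __name__ == "__main__":
    path = Path("target/run_results.json")  # assumed default artifact location
    if path.exists():
        for started_at, thread_id, unique_id in execution_order(json.loads(path.read_text())):
            print(started_at, thread_id, unique_id)
```

Diffing this ordering across two runs (say, before and after a warehouse resourcing change) would at least show which models moved and which thread picked them up, whatever the logged index turns out to mean.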
(Note: Apologies in advance if this post is in the wrong place. I’m a first-time poster and unsure of the proper home for this sort of open musing.)