Is there a way to show only the leftmost sources/models in the DAG?

The problem I’m having

I have a very large, complicated DAG that I am working through. I want to identify the leftmost sources and/or models.

The context of why I’m trying to do this

I’m doing this to better unravel how the existing data warehouse that the DAG is based on works.

What I’ve already tried

I’ve tried reading the documentation to see if there is an argument I can pass to --select, or a way to modify the ls command, to get what I need.

Some example code or error messages

  No code examples, sorry :(

Have you tried looking at the sources/models in the lineage graph of dbt docs? I think it is the easiest way to check how your models are connected.

Yeah, that’s part of the problem. Nearly a thousand tables with cycles that we had to break using dot notation. It’s a mess.

hmmm okay, but you can visualize small parts of the project using the tags, select or exclude options

e.g. you want to visualize only the part of the project related to ‘my_model’

You can put ‘+my_model’ or ‘+my_model+’ inside select. If there are still too many models, you can exclude some of them the same way in exclude.

I’m not sure whether you’ve already tried this. Do you think it could be of some help?

In the docs viewer (or dbt ls), does this work? Writing it on mobile so haven’t checked:

dbt ls --select "source:*+1"

source:* matches any source; the +1 adds the first-order children

Thanks for the thought. I have researched those suggestions but without success, because the diagram is fragmented. That is, there are many orphaned processes where two or more tables connect to each other but not to a source, and are floating out on the DAG. The data warehouse has a lot of technical debt that I’m trying to work through as I try to migrate to the cloud.

Hmmm sounds messy! So you’re saying that not all of the leftmost models are derived from sources, and some have hardcoded references to a table instead?

Could you look at the manifest.json and find nodes which don’t have their depends_on array populated? I think those would be your leftmost nodes.
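
A minimal sketch of that check in Python, in case it helps. It assumes the manifest is at target/manifest.json (dbt's default output location), that each entry under the top-level "nodes" key carries its parents in depends_on.nodes, and that you only care about models. The function name leftmost_nodes and the resource_types parameter are made up for illustration, not part of dbt.

```python
import json
from pathlib import Path


def leftmost_nodes(manifest, resource_types=("model",)):
    """Return unique_ids of nodes that have no parent nodes or sources.

    `manifest` is the parsed manifest.json as a dict. Hypothetical helper,
    not a dbt API.
    """
    roots = []
    for unique_id, node in manifest.get("nodes", {}).items():
        if node.get("resource_type") not in resource_types:
            continue  # skip tests, seeds, snapshots, etc.
        # depends_on is an object; its "nodes" list holds parent models and sources
        parents = (node.get("depends_on") or {}).get("nodes") or []
        if not parents:
            roots.append(unique_id)
    return sorted(roots)


if __name__ == "__main__":
    # Assumed path: run from the project root after `dbt parse` or `dbt compile`.
    manifest_path = Path("target/manifest.json")
    if manifest_path.exists():
        manifest = json.loads(manifest_path.read_text())
        for unique_id in leftmost_nodes(manifest):
            print(unique_id)
    else:
        print(f"{manifest_path} not found")
```

Sources live under a separate "sources" key in the manifest and are leftmost by definition, so this only lists models with no ref() or source() parents at all, which should be the ones built on hardcoded table names.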

Ah, that did it. Thanks joellabes!

Yes, to answer your question. We had to hardcode references to some fact/dim tables to break circular references, as we weren’t at the rewrite stage and were trying to build out a diagram to get some semblance of what was going on.
