I have a very large, complicated DAG that I am working through. I want to identify the left-most sources and/or models, that is, the nodes with nothing upstream of them.
The context of why I’m trying to do this
I’m doing this to better unravel how the existing data warehouse, on which the DAG is based, works.
What I’ve already tried
I’ve read the documentation to see whether there is an argument I can give the --select section of the DAG, or a way to modify the ls command, to get what I need.
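One possible way to get at this outside of --select is to read the manifest.json artifact that dbt writes to the target directory. The sketch below is only an illustration under stated assumptions: it relies on the manifest's `parent_map` key, treats any source or model with an empty parent list as "left-most", and uses the hypothetical helper names `load_manifest` and `find_root_nodes`, which are not part of dbt.

```python
import json
from pathlib import Path


def load_manifest(path: str = "target/manifest.json") -> dict:
    """Load a dbt manifest from disk (the default path is an assumption)."""
    return json.loads(Path(path).read_text())


def find_root_nodes(manifest: dict) -> list[str]:
    """Return the sorted unique IDs of sources and models with no parents.

    Assumes the manifest has a `parent_map` of the form
    {unique_id: [parent_unique_id, ...]}.
    """
    parent_map = manifest.get("parent_map", {})
    roots = [
        unique_id
        for unique_id, parents in parent_map.items()
        # Keep only sources and models; tests, seeds, etc. are ignored here.
        if not parents and unique_id.split(".")[0] in {"source", "model"}
    ]
    return sorted(roots)
```

With a manifest on disk (for example after `dbt compile`), `find_root_nodes(load_manifest())` would list every source plus every model that dbt sees as having no upstream dependency. Models that read from hardcoded table names instead of `ref` or `source` would also show up here, since dbt cannot see those dependencies.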
Thanks for the thought. I have researched those suggestions, but without success, because the diagram is fragmented: there are many orphaned processes where two or more tables connect to each other but not to a source, so they float on their own in the DAG. The data warehouse has a lot of technical debt that I’m working through as I migrate to the cloud.
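To make the fragmentation concrete, the floating groups could in principle be listed by treating the DAG as an undirected graph and collecting the connected components that contain no source. This is a rough sketch, not a dbt feature: it assumes a `parent_map` dictionary in the shape found in manifest.json, considers only models and sources, and the function name `find_sourceless_components` is made up for illustration.

```python
from collections import defaultdict, deque

KEPT_PREFIXES = ("model.", "source.")  # assumption: ignore tests, seeds, etc.


def find_sourceless_components(parent_map: dict) -> list[list[str]]:
    """Return groups of connected models that never reach a source."""
    # Build an undirected adjacency map from the parent links.
    neighbours = defaultdict(set)
    for child, parents in parent_map.items():
        if not child.startswith(KEPT_PREFIXES):
            continue
        neighbours[child]  # make sure isolated nodes are registered
        for parent in parents:
            if parent.startswith(KEPT_PREFIXES):
                neighbours[child].add(parent)
                neighbours[parent].add(child)

    seen = set()
    islands = []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = set()
        queue = deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            component.add(node)
            for nxt in neighbours[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        # Keep the component only if no source belongs to it.
        if not any(n.startswith("source.") for n in component):
            islands.append(sorted(component))
    return islands
```

Each returned list is one floating group. The nodes in it that have no parents are that group's left-most tables.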
Yes, to answer your question. We had to hardcode references to some fact/dim tables to break circular references, as we weren’t at the rewrite stage yet and were trying to build out a diagram to get some semblance of what was going on.