Why do we use the terms “upstream” and “downstream”?
Great question! When you look at a dbt DAG, think of it as a stream of data that flows from source tables (upstream), through the staging and intermediate layers, and ends at the marts (downstream).
This is well illustrated by the DAG that dbt docs generates. In this example, jafflegaggle_contacts is highlighted in purple. Relative to it, the 5 event and user models are upstream, and jafflegaggle_facts and jafflegaggle_corporate_accounts are downstream.
If we instead use stg_users as our reference point, then only raw_user is upstream. Now jafflegaggle_contacts is considered downstream (along with its descendants).
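To make the “it depends on your reference point” idea concrete, here is a minimal Python sketch. It models the DAG as a map from each model to the models it selects from. “Upstream” means every ancestor reachable by following parents, and “downstream” means every descendant reachable by following children. The graph is a simplified, partly hypothetical version of the example: raw_event and stg_events are made-up stand-ins for the event models, and the exact edges between the jafflegaggle models are assumed.

```python
from collections import deque

# Simplified, partly hypothetical version of the example DAG.
# Each key is a model; its value lists the models it selects from (its parents).
# raw_event / stg_events are illustrative names, and the edges between the
# jafflegaggle models are assumptions, not taken from the original graph.
PARENTS = {
    "raw_user": [],
    "raw_event": [],
    "stg_users": ["raw_user"],
    "stg_events": ["raw_event"],
    "jafflegaggle_contacts": ["stg_users", "stg_events"],
    "jafflegaggle_facts": ["jafflegaggle_contacts"],
    "jafflegaggle_corporate_accounts": ["jafflegaggle_facts"],
}


def children_map(parents):
    """Invert the parent map so the DAG can be walked in the downstream direction."""
    children = {node: [] for node in parents}
    for node, deps in parents.items():
        for dep in deps:
            children[dep].append(node)
    return children


def reachable(start, edges):
    """Breadth-first search: every node reachable from `start` via `edges`, excluding `start`."""
    seen = set()
    queue = deque(edges[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges[node])
    return seen


def upstream(model):
    """Ancestors: everything the model depends on, directly or indirectly."""
    return reachable(model, PARENTS)


def downstream(model):
    """Descendants: everything that depends on the model, directly or indirectly."""
    return reachable(model, children_map(PARENTS))


if __name__ == "__main__":
    for ref in ["jafflegaggle_contacts", "stg_users"]:
        print(ref)
        print("  upstream:  ", sorted(upstream(ref)))
        print("  downstream:", sorted(downstream(ref)))
```

With jafflegaggle_contacts as the reference point, the staging and raw models come back as upstream. With stg_users, only raw_user is upstream and jafflegaggle_contacts joins the downstream set. dbt exposes the same idea through its graph operators in node selection: `dbt ls --select +stg_users` lists stg_users together with its ancestors, and `dbt ls --select stg_users+` lists it together with its descendants.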
You might also consider BI tools that query your final marts to be “downstream tools” or “downstream consumers”, and data sources such as your CRM to be an “upstream source”.