Why do we use the terms "upstream" and "downstream"?
Great question! When you look at the dbt DAG, picture the data flowing like a stream: it starts at the source tables (upstream), passes through the staging and intermediate layers, and ends at the marts (downstream).
This is easy to see in the DAG that dbt docs generates:
In this example, jafflegaggle_contacts is highlighted in purple. Relative to it, the 5 event and user models are upstream, and jafflegaggle_corporate_accounts is downstream.
If we instead use stg_users as our reference point, then only raw_user is upstream, and jafflegaggle_contacts (along with its descendants) is now downstream.
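To make the "reference point" idea concrete, here is a minimal sketch that computes the upstream (ancestor) and downstream (descendant) models of any node in a DAG using breadth-first search. The graph is a hypothetical, simplified stand-in and is not the actual project from the dbt docs screenshot: raw_event and stg_events are made-up names, and the edges are assumptions chosen only to mirror the example above.

```python
from collections import deque

# Hypothetical, simplified DAG: each key is a model and each value lists the
# models it reads from (its direct parents). raw_event and stg_events are
# made-up names; this is not the real jaffle-gaggle project.
PARENTS = {
    "raw_user": [],
    "raw_event": [],
    "stg_users": ["raw_user"],
    "stg_events": ["raw_event"],
    "jafflegaggle_contacts": ["stg_users", "stg_events"],
    "jafflegaggle_corporate_accounts": ["jafflegaggle_contacts"],
}


def children_map(parents):
    """Invert the parent lists so each model maps to the models that read from it."""
    children = {node: [] for node in parents}
    for node, node_parents in parents.items():
        for parent in node_parents:
            children[parent].append(node)
    return children


def reachable(start, edges):
    """Breadth-first search: every node reachable from start via edges, excluding start."""
    seen = set()
    queue = deque(edges[start])
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(edges[node])
    return seen


def upstream(model, parents=PARENTS):
    """All ancestors of model: follow parent edges toward the sources."""
    return reachable(model, parents)


def downstream(model, parents=PARENTS):
    """All descendants of model: follow child edges toward the marts."""
    return reachable(model, children_map(parents))


if __name__ == "__main__":
    print(sorted(upstream("stg_users")))    # ['raw_user']
    print(sorted(downstream("stg_users")))  # ['jafflegaggle_contacts', 'jafflegaggle_corporate_accounts']
```

Changing the argument changes the answer, which is the whole point: "upstream" and "downstream" are always relative to the model you pick. In a real project you would not hand-code this, since dbt's node selection syntax expresses the same idea (for example `dbt ls --select +stg_users` for a model and its ancestors, or `stg_users+` for a model and its descendants).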
You might also consider BI tools that query your final marts to be “downstream tools” or “downstream consumers”, and data sources such as your CRM to be an “upstream source”.