How to feed DWH models from multiple async sources

Hi all,
I’d like to use dbt to build a DWH with multiple source systems (one for each company in the group), which I need to load on different schedules during the day.
The idea is to land in a common DWH data model, so that e.g. my INVOICE table will contain data coming from all the different source systems (each one with its own data structure, and thus its own transformations).

I know that in dbt I cannot have more than one model pointing to the same physical table, so I thought of two possible approaches:

  1. create one dbt project for each source system (which is not my preferred choice)
  2. create an intermediate model for each source system, having the same data structure as the target, and then load the final DWH table with something like
    select * from {{ ref( var('invoice') ) }}
    giving a different value to var('invoice') for each run, depending on the source system I’m loading (see the sketch below)
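
For illustration, here is a minimal sketch of approach 2. All model, source, and column names (stg_company_a__invoice, etc.) are placeholders, and the incremental config assumes an adapter that supports the delete+insert strategy, so that each run only replaces the rows of the source system being loaded:

```sql
-- models/staging/stg_company_a__invoice.sql
-- one intermediate model per source system, all exposing the common structure
select
    'company_a' as source_system,
    inv_number  as invoice_number,
    inv_date    as invoice_date,
    inv_amount  as invoice_amount
from {{ source('company_a', 'invoices') }}
```

```sql
-- models/marts/invoice.sql
-- the intermediate model to load from is chosen at run time via --vars
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='source_system'
) }}

select *
from {{ ref(var('invoice', 'stg_company_a__invoice')) }}
```

Each source system then gets its own scheduled invocation:

```
dbt run --select invoice --vars '{invoice: stg_company_a__invoice}'
dbt run --select invoice --vars '{invoice: stg_company_b__invoice}'
```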

Did any of you have the same problem in the past?
How did you address it?
Do you see any drawback in approach 2?
The main drawback I see is that the documentation won’t be complete, i.e. the DAG will only point to the value of var('invoice') that was set when I generated the documentation.

Thanks
Daniele

Hi @daniele.frigo, did you have a look at the standard DWH model (staging/warehouse/marts)?
I don’t understand why you would have to change your source at each dbt run.
Why not union all your source invoice tables in a final invoice model?
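
For example, if each company already has a staging model exposing the common structure (the names below are placeholders), the final model could simply be:

```sql
-- models/marts/invoice.sql
select * from {{ ref('stg_company_a__invoice') }}
union all
select * from {{ ref('stg_company_b__invoice') }}
union all
select * from {{ ref('stg_company_c__invoice') }}
```

The dbt_utils.union_relations macro can also generate this kind of union for you.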

A proven approach to safely combining many sources is Data Vault.

Hoping it helps,
Best regards,

@fabrice.etanchaud I know perfectly well how a standard DWH works.
As I tried to explain, I need to feed a common data model from different source systems (one for each company), but since they are in different places of the world, I need to schedule each of them at different times of the day.
I’d like to avoid reloading all the companies’ data every time I need to refresh one of them.
I could add some kind of run id and filter on it when I do the union, but on some databases that might not be the most efficient loading strategy anyway.
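
To make that idea concrete, one possible sketch is an incremental model that unions all the staging models but, on each run, only rebuilds the slice of the company being loaded, selected through a variable. The names (source_system, company, the staging models) are placeholders, and it assumes an adapter that supports the delete+insert incremental strategy:

```sql
-- models/marts/invoice.sql
{{ config(
    materialized='incremental',
    incremental_strategy='delete+insert',
    unique_key='source_system'
) }}

with unioned as (
    select * from {{ ref('stg_company_a__invoice') }}
    union all
    select * from {{ ref('stg_company_b__invoice') }}
)

select *
from unioned
{% if is_incremental() %}
  -- only the company passed via --vars is deleted and re-inserted
  where source_system = '{{ var("company") }}'
{% endif %}
```

```
dbt run --select invoice --vars '{company: company_a}'
```

This keeps the DAG and the docs complete, since every staging model is still a real ref, but whether filtering the union like this is efficient enough does depend on the database.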