Hi All,
Looking for some approach on how to handle incremental changes in data vault model using dbt or dbt vault? I have created data vault using dbt vault package.
We have created following layers in our data model.
source | schema layer | data vault
Source is sending json messages and we are flattening those json messages and storing in the schema layer and after that we have created data vault model(hubs, sats, and links) from schema layer.
In case of incremental changes, source is sending only updated columns (not full record), in this case how to handle incremental changes in data vault model?.
As you know satellite tables stores history information, every time I should retrieve latest record and apply updates on top o fit.
Is there anyway to handle this easily in dbt? or dbt vault?.
Ex -
source sends ;
initial load -
in the source
custid | name | age| salary|modified date
123 | abc | 24 |50000|2022/8/11 00:00:32
in Schema layer:
custid | name | age| salary|timestamp
123 | abc | 24 |50000|2022/8/11 00:00:32
data vault - satellite table:
custid | name | age| salary
123 | abc | 24 |50000|2022/8/11 00:00:32
Now lets assume source sends incremental changes:
incremental change: [only ID and updated column ‘Salary’]
source:
custid | salary
123 |60000
in Schema layer:
custid | name | age| salary|timestamp | changeflag
123 | abc | 24 |50000|2022/8/11 00:00:32 | I
123 | abc | 24 |60000|2022/9/11 10:20:32 |U [We have to get latest record from satellite tables and update the record with update columns only]
data vault - satellite table:
custid | name | age| salary |changeflag
123 | abc | 24 |50000|2022/8/11 00:00:32 | I
123 | abc | 24 |60000|2022/9/11 10:20:32 |U
Thanks for your suggestions!.
Thanks!
Chandra