Hello dbt community,
I'm facing an interesting challenge and would love to hear advice or stories on how to approach it.
In Snowflake, which is my main database, I have multiple source tables, each of them updated at random times.
To be specific, let's say I'm running a grocery store with the following 2 tables: “customers” (each row represents a single person) and “orders” (each row represents a single checkout).
The “orders” table is connected to the “customers” table through a key called “customerId” (which maps to the “id” field on the “customers” table).
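For concreteness, the two source tables might look roughly like this (a hypothetical sketch: only “id” and “customerId” come from the description above, the other columns are assumptions):

```sql
-- Hypothetical Snowflake DDL for the two source tables.
-- Only "id" and "customerId" are from the description; the rest is illustrative.
create table customers (
    id          number primary key,
    name        varchar,
    updated_at  timestamp_ntz
);

create table orders (
    id          number primary key,
    customerId  number,            -- maps to customers.id
    total       number(10, 2),
    created_at  timestamp_ntz
);
```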
I want to create a new table called “reportcustomers” which not only uses some fields from the “customers” table, but also aggregates some of the data from the “orders” table. An example of such an aggregation would be a field with the total number of orders per person.
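That total-orders aggregation could be sketched as a plain Snowflake query like the following (column names other than the two keys are assumptions):

```sql
-- Sketch: one row per customer with their total number of orders
select
    c.id,
    c.name,                      -- hypothetical customer field
    count(o.id) as total_orders
from customers c
left join orders o
    on o.customerId = c.id
group by c.id, c.name
```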
I want to have 2 dbt models that write to the same table while using different materialization strategies:
The 1st model uses the “incremental” configuration: it monitors, for example, a timestamp on “orders”, and whenever new orders are created it updates the affected rows in “reportcustomers” so each customer's total order count stays current.
The 2nd model uses the “table” configuration. Its purpose is to run the initial sync and, in case there are any issues with the event-driven model (the 1st one), to rebuild the table from scratch: using the same total-orders example, it would simply scan the whole “orders” table, count the total orders per customer, and save the result in “reportcustomers”.
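For illustration, the incremental variant (the 1st model) might be sketched like this; the `created_at`/`last_synced_at` columns, the model name, and the use of `ref()` rather than `source()` are all assumptions:

```sql
-- models/reportcustomers.sql -- incremental variant, a sketch only
{{ config(
    materialized = 'incremental',
    unique_key   = 'id'
) }}

select
    c.id,
    c.name,
    count(o.id)         as total_orders,
    current_timestamp() as last_synced_at  -- high-water mark for the next run
from {{ ref('customers') }} c
left join {{ ref('orders') }} o
    on o.customerId = c.id
{% if is_incremental() %}
where c.id in (
    -- only recompute customers who received new orders since the last run
    select customerId
    from {{ ref('orders') }}
    where created_at > (select max(last_synced_at) from {{ this }})
)
{% endif %}
group by c.id, c.name
```

The 2nd model would be the same select with `{{ config(materialized='table') }}` and without the `is_incremental()` block; note that dbt can also rebuild an incremental model in place via `dbt run --full-refresh`, which may be relevant here.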
Since each dbt model normally creates its own table, I'm wondering: what do you think would be the best strategy to achieve these requirements, and what would be the cleanest architecture for such a system?
Best regards and have an awesome day!