I am new to dbt and am trying to understand how best to take an existing process, which is just a set of SQL stored procedures (USPs), and restructure it as dbt models. I understand the core concepts of dbt, but I am not seeing exactly how the models should be structured in the following scenario, so I think I am missing something.
At a high level, our current process takes a single source and populates 3 end tables (events, responses, response errors).
We currently run 1 stored procedure per step below, adding the data to intermediate tables along the way for easier debugging.
The 2 main flows are currently as follows:
Source → filter data and get newest events → data type and column name changes → merge into historical persist of events
Source → filter data and get newest events (same as above) → cleanse data and identify errors → identified errors are added to the response error table → the rest of the data moves on and gets added to the historical responses persist
We then take the 2 end tables, responses and events, and expose them to end users via a single view.
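In dbt terms, the shared "filter data and get newest events" step maps naturally to one staging model that both pipelines can `ref()`. A minimal sketch, where the source name, table name, and columns (`event_id`, `event_ts`, `payload`) are placeholders I made up:

```sql
-- models/staging/filtered.sql
-- Shared first step: filter the raw source down to the newest events.
-- source/table/column names below are hypothetical.

select
    event_id,
    event_ts,
    payload
from {{ source('my_source', 'raw_events') }}
where event_ts >= dateadd(day, -1, current_date)  -- whatever "newest" means for you
```

Both the events pipeline and the responses pipeline would then start with `select ... from {{ ref('filtered') }}`, so the filtering logic lives in exactly one place.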
My initial thought was to structure the models like this:
-source - the 1 single source
-filtered - the shared filtering step, so it can be re-used between the 2 pipelines
-events - all event logic within 1 SQL file, built up using CTEs to the final end table
-responses - all logic to cleanse the data and identify errors
-responses_persist - logic to get to the persist table, using the responses model
-responses_error - logic to get to the error table, using the responses model
-event_response view - this would use the previous models to build the final view
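Under that structure, the persist/error split is just two small models that both select from the responses model. A hedged sketch, assuming the responses model exposes some error indicator column (here a hypothetical `error_code`):

```sql
-- models/marts/responses_error.sql
-- Rows that failed cleansing; `error_code` is a placeholder column name.
select *
from {{ ref('responses') }}
where error_code is not null
```

```sql
-- models/marts/responses_persist.sql
-- The clean rows that continue on to the historical persist.
select *
from {{ ref('responses') }}
where error_code is null
```

dbt builds the responses model once and both downstream models read from it, which mirrors the "errors go one way, the rest moves on" branch in the second flow.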
What if I wanted to start loading the final tables incrementally from the source, in that I only want to process the newest records each day? Since the logic is broken apart like this, would I then have to set up incremental materialization on the filtered model, meaning I would essentially need to keep a full copy of all the data in filtered? Or is there a way to look at the final event table, which also contains all the data, even though it is further down the structure?
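From what I understand of dbt so far, the filtered model does not have to hold a full copy: it can stay a view (or ephemeral), and only the final model is made incremental. An incremental model can compare incoming rows against its own existing table via `{{ this }}`, which I think answers the "look at the final event table" part. A sketch, assuming hypothetical `event_id` and `event_ts` columns:

```sql
-- models/marts/events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select *
from {{ ref('filtered') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what is
  -- already in this table ({{ this }} resolves to events itself).
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

On the first run (or with `--full-refresh`) the `is_incremental()` block is skipped and the table is built from scratch; on later runs only the new rows are merged in.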