I am new to dbt and am trying to understand how best to take an existing process, which is just a set of SQL stored procedures (USPs), and restructure it as dbt models. I understand the core concepts of dbt, but I am not seeing exactly how the models should be structured in the following scenario, so I think I am missing something.
At a high level, our current process takes a single source and populates 3 end tables (events, responses, response errors).
We currently run 1 stored procedure per step below, adding the data to intermediate tables along the way for easier debugging.
The 2 main flows are currently as follows:
Source → filter data and get newest events → data type and column name changes → merge into historical persist of events
Source → filter data and get newest events (same as above) → cleanse data and identify errors → identified errors are added to the response error table → the rest of the data moves on and gets added to the historical responses persist
We then take the 2 end tables, responses and events, and expose them to end users via a single view.
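In dbt terms, the shared "filter data and get newest events" step maps naturally to one staging model that both pipelines can `ref()`. A minimal sketch, where the source name, table name, and columns (`event_id`, `event_ts`, `payload`) are placeholders I made up:

```sql
-- models/staging/filtered.sql
-- Shared first step: filter the raw source down to the newest events.
-- source/table/column names below are hypothetical.

select
    event_id,
    event_ts,
    payload
from {{ source('my_source', 'raw_events') }}
where event_ts >= dateadd(day, -1, current_date)  -- whatever "newest" means for you
```

Both the events pipeline and the responses pipeline would then start with `select ... from {{ ref('filtered') }}`, so the filtering logic lives in exactly one place.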
My initial thought was to structure the models like this:
-source - the 1 single source
-filtered - the shared filtering step, so it can be re-used between the 2 pipelines
-events - all event logic within 1 SQL file, built up using CTEs to the final end table
-responses - all logic to cleanse the data and identify errors
-responses_persist - logic to get to the persist table, using the responses model
-responses_error - logic to get to the error table, using the responses model
-event_response view - this would use the previous models to build the final view
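Under that structure, the persist/error split is just two small models that both select from the responses model. A hedged sketch, assuming the responses model exposes some error indicator column (here a hypothetical `error_code`):

```sql
-- models/marts/responses_error.sql
-- Rows that failed cleansing; `error_code` is a placeholder column name.
select *
from {{ ref('responses') }}
where error_code is not null
```

```sql
-- models/marts/responses_persist.sql
-- The clean rows that continue on to the historical persist.
select *
from {{ ref('responses') }}
where error_code is null
```

dbt builds the responses model once and both downstream models read from it, which mirrors the "errors go one way, the rest moves on" branch in the second flow.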
What if I wanted to start loading the final tables incrementally from the source, in that I only want to process the newest records each day? Since the logic is broken apart like this, would I then have to set up incremental materialization on the filtered model, meaning I would essentially need to keep a full copy of all the data in filtered? Or is there a way to look at the final event table, which also contains all the data, even though it is further down the structure?
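From what I understand of dbt so far, the filtered model does not have to hold a full copy: it can stay a view (or ephemeral), and only the final model is made incremental. An incremental model can compare incoming rows against its own existing table via `{{ this }}`, which I think answers the "look at the final event table" part. A sketch, assuming hypothetical `event_id` and `event_ts` columns:

```sql
-- models/marts/events.sql
{{ config(materialized='incremental', unique_key='event_id') }}

select *
from {{ ref('filtered') }}

{% if is_incremental() %}
  -- On incremental runs, only process rows newer than what is
  -- already in this table ({{ this }} resolves to events itself).
  where event_ts > (select max(event_ts) from {{ this }})
{% endif %}
```

On the first run (or with `--full-refresh`) the `is_incremental()` block is skipped and the table is built from scratch; on later runs only the new rows are merged in.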