Hey, I have a use case when using dbt with Redshift where I want to aggregate data daily (the entire history from various tables) and save it with the current date as new records in my table.
So, on October 24th, my dbt model will calculate the aggregations (various dimensions and metrics), assign the date as October 24th, and save the data to the table. The next day, it will do the same but with October 25th. I want to build a historical table of my statistics this way.
I wanted to use the Incremental model in dbt for this, and everything seems pretty straightforward to implement. However, I have an issue where, from time to time, my process might fail. In such cases, when there is a gap in the data for a given day, I should take the data from the previous day and insert it while changing the date to the missing one. This seems simple enough to do manually, but I need to automate it in dbt – I’m just not quite sure how. The only solution I’ve come up with so far is to create a macro that triggers a procedure, and that procedure would contain the logic to fill in the missing data. It seems impossible to achieve this using only dbt models.
I should mention that I’m using dbt Cloud, and in the end, I’d like to have a process like:
- Daily aggregation into my table
- Tests to check if there are any missing dates, and if so, trigger a process to fill in the missing data.
Do you have any ideas or suggestions on how to approach this? Or there is some way to achieve what I need in simpler way ?