I have a huge curated model with billions of rows. Building it requires joining 5 other equally huge tables and running complex calculations to get a complete picture of one topic. I’ve set this up as an incremental model (using merge); however, rows from years back can be updated in any of the joined tables, so I still need a full table scan on most of the tables involved. I’ve created an auxiliary timestamp derived from the timestamps of the different tables and use it in the incremental logic.
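To make the auxiliary-timestamp part concrete, here’s a minimal sketch of the pattern. The table and column names (`orders`, `order_items`, `customers`, `updated_at`, `order_id`) are placeholders, and the real model joins 5 tables, but the shape is the same:

```sql
-- Sketch only: placeholder names, real model joins 5 tables.
{{ config(
    materialized='incremental',
    incremental_strategy='merge',
    unique_key='order_id'
) }}

select
    o.order_id,
    -- ... complex calculations across the joined tables ...
    -- auxiliary timestamp: latest update across all source tables
    greatest(o.updated_at, i.updated_at, c.updated_at) as aux_updated_at
from {{ ref('orders') }} o
join {{ ref('order_items') }} i on i.order_id = o.order_id
join {{ ref('customers') }} c on c.customer_id = o.customer_id

{% if is_incremental() %}
  -- only pick up rows whose combined timestamp moved since the last run;
  -- but because any table can carry the newest timestamp, this filter
  -- can't prune any single table's scan
  where greatest(o.updated_at, i.updated_at, c.updated_at)
        > (select max(aux_updated_at) from {{ this }})
{% endif %}
```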
It’s reaching the point where a full refresh of the model may time out, even on an XL Warehouse.
I’d love to hear from anyone who has done something similar. What has worked for you? I’m finding it hard to break the model down into sub-models because I need to replace the entire row whenever there’s an update in any of the joined tables.