Hi,
I have to combine a historical dataset which is static copy (no data refresh on the table) of some historical data with new data source (same schema) which will be live data and will keep on refreshing every hour.
Any ideas on how to proceed?
Hi,
I have to combine a historical dataset which is static copy (no data refresh on the table) of some historical data with new data source (same schema) which will be live data and will keep on refreshing every hour.
Any ideas on how to proceed?
I would start simple, and just union the two datasets together and materialize that union model as a view. For downstream models, you can either materialize them as views or rebuild the downstream tables every hour, so they contain fresh data. This will probably scale better than you think, and may or may not be wasteful, resource-wise, depending on your warehouse.
A second option would be to pursue a strategy called Lambda Views. There’s an excellent discussion of that in this post: How to create near real-time models with just dbt + SQL
Thank you Ted.
This will work if both historic and new data source both have downstream models but in my case if possible, I would rather not keep historic data model running and just take a snapshot and then union with new data source. But when I try to implement this it gives error in dbt that node for historic data does not exists, which means I have to keep historic downstream model running along with new data source.
I haven’t tried Lambda view but I am guessing it will also require both models running downstream.