I’d like to setup my fact tables as incremental, but am not sure how to make sure they load correctly from multiple sources / refs.
Say I have a fact_a
, which pulls from ref_b
and ref_c
. I’d like fact_a
to refresh in cases where either ref_b
or ref_c
has been updated according to my incremental logic, but can’t figure out how to do this correctly.
An example of what I want:
ref_b table
ref_b_id | ref_b_value | ref_b_updated_at |
---|---|---|
1 | abcdef | ‘2023-05-30’ |
2 | ghijk | ‘2023-05-15’ |
3 | lmno | ‘2023-04-30’ |
ref_c table
ref_c_id | ref_b_id | ref_c_value | ref_c_updated_at |
---|---|---|---|
1 | 1 | opqr | ‘2023-05-30’ |
2 | 2 | stuv | ‘2023-05-30’ |
3 | 4 | wxyz | ‘2023-04-30’ |
Let’s say we join on ref_b_id
, which is the unique key for the fact_a
table. If I’m generating incremental rows for fact_a filtering on updated_at > '2023-05-29'
, I want the new rows to be inserted to be:
fact_a
ref_b_id | ref_b_value | ref_c_id | ref_c_value | updated_at |
---|---|---|---|---|
1 | abcdef | 1 | opqr | 2023-05-30 |
3 | lmno | 2 | stuv | 2023-05-30 |
We’ve added the rows where the source records were recently updated in either ref_b or ref_c.
I feel like this should be pretty straightforward but for some reason I’m having a hard time finding examples of others doing the same. All the examples I’ve seen only show joining on refs where both / all of the refs have been updated within the time window. We have fact tables with quite a few sources so ideally this would work across more than 2 tables