Incremental from multiple refs / sources

I’d like to setup my fact tables as incremental, but am not sure how to make sure they load correctly from multiple sources / refs.

Say I have a fact_a , which pulls from ref_b and ref_c. I’d like fact_a to refresh in cases where either ref_b or ref_c has been updated according to my incremental logic, but can’t figure out how to do this correctly.

An example of what I want:

ref_b table

ref_b_id ref_b_value ref_b_updated_at
1 abcdef ‘2023-05-30’
2 ghijk ‘2023-05-15’
3 lmno ‘2023-04-30’

ref_c table

ref_c_id ref_b_id ref_c_value ref_c_updated_at
1 1 opqr ‘2023-05-30’
2 2 stuv ‘2023-05-30’
3 4 wxyz ‘2023-04-30’

Let’s say we join on ref_b_id, which is the unique key for the fact_a table. If I’m generating incremental rows for fact_a filtering on updated_at > '2023-05-29', I want the new rows to be inserted to be:

fact_a

ref_b_id ref_b_value ref_c_id ref_c_value updated_at
1 abcdef 1 opqr 2023-05-30
3 lmno 2 stuv 2023-05-30

We’ve added the rows where the source records were recently updated in either ref_b or ref_c.

I feel like this should be pretty straightforward but for some reason I’m having a hard time finding examples of others doing the same. All the examples I’ve seen only show joining on refs where both / all of the refs have been updated within the time window. We have fact tables with quite a few sources so ideally this would work across more than 2 tables

edit: I think I figured out how to do it using UNION. Not sure how fast this will be but seems logically correct to me, using actual tables from my org:

with users_count as (
    SELECT
        cu.owned_by_organization_id,
        count(*) as user_count,
        max(cu.date_updated) as user_date_updated
    FROM
        user_user cu
    group by cu.owned_by_organization_id
),

sites_count as (
    SELECT
        si.owned_by_organization_id,
        count(*) as site_count,
        max(si.date_updated) as site_date_updated
    FROM
        site_site si
    group by si.owned_by_organization_id
),

user_org_ids as (
    SELECT owned_by_organization_id as org_id
    from users_count
    where users_count.user_date_updated > (select max(date_updated) from {{ this }})
),


sites_org_ids as (
    SELECT
        owned_by_organization_id as org_id
    FROM
        sites_count
    WHERE
        sites_count.site_date_updated > (select max(date_updated) from {{ this }})
),

org_final as (
    select
        org.*,
        users_count.*,
        sites_count.*
    from user_organization org
    left join users_count on users_count.owned_by_organization_id = org.id
    left join sites_count on sites_count.owned_by_organization_id = org.id
    {% if is_incremental() %}
    where org.id in (
        select org_id from user_org_ids
        union
        select org_id from sites_org_ids
    )
    {% endif %}
)

select count(*) from org_final
1 Like