Surrogate key for multiple-source dimension table

pipat_flowacc · May 24, 2022, 8:33am

I’m modeling a customer dimension table which is made from various sources
e.g. sql server, mysql and hubspot. I read Claire’s comment on using surrogate key
in dbt, but I couldn’t come up with a good surrogate key inputs for the table.

A customer data might be created at different time in different sources.
For example, at start, we have staging customer data like this.

user | mssql | mysql | hubspot 
-----|-------|-------|--------
A    | O     | X     | X
B    | X     | O     | X
C    | O     | O     | X
D    | O     | X     | O
E    | X     | O     | O

Then after time passes, other sources populate customer data also.

user | mssql | mysql | hubspot 
-----|-------|-------|--------
A    | O     | O     | O
B    | O     | O     | X
C    | O     | O     | X
D    | O     | O     | O
E    | O     | O     | O

The grain for the dimension table is 1 row 1 customer.
So, the surrogate key should be made from dbt_utils.surrogate_key(['mssql_id', 'mysql_id', 'hubspot_id', ]).
But, this means the customer A’s key will change from “O-X-X” to “O-O-O”.
The key change would affect all fact tables downstream.

How should I implement a surrogate key for this table?
So that a row can be pinned and updated when new sources arrive.

hags · May 24, 2022, 6:01pm

If I understand your requirement, you just need to create the surrogate key from the user column, bt_utils.surrogate_key([‘user’ ]), which is the unique key of this table. This assumes that the “user” value is consistent across your three sources.

pipat_flowacc · May 25, 2022, 2:05am

Yeah. I think it’s the only way to do it.
I need to have a glue id at each of these sources, so that when they are combined,
they land on the same row.
Then, I can use that glue id as a surrogate key macro’s input.
Thx hags.

Topic		Replies	Views
Surrogate keys in dbt: Integers or hashes? In-Depth Discussions devblog	8	6098	February 1, 2024
Can I create an auto-incrementing ID in dbt? Help dbt-utils	4	43411	January 13, 2020
How to add surrogate key from dimension table to fact table? Archive	1	4680	August 7, 2022
How to get incremental number by using surrogate key ? Help	3	135	April 17, 2025
dbt_utils 0.8.6 surrogate_key Help surrogate-keys	1	1588	April 5, 2023

Surrogate key for multiple-source dimension table

Related topics