This is a companion discussion topic for the original entry at Surrogate keys in dbt: Integers or hashes? | dbt Developer Blog
Great article, I do have a question:
BI tools like (Azure) Analysis Services & Power BI perform best when join columns have a small datatype (like int) and memory usage is reduced as much as possible. Importing a pure star schema with hashed values as PK/FK’s in to these tools wouldn’t be the best solution especially not with wide fact tables with many dimension FK’s.
Since last year databricks / delta lake supports auto increment generated columns.
Is it possible to include this column in the model of a dimension, I did some attempts but weren’t able to succeed. I wonder if there is a way to hook in to the create table statement of the dimension and add a line like:
pk_customer_id BIGINT GENERATED ALWAYS AS IDENTITY
Looking forward to your reply.
Did you ever figure anything out for this? I’m currently researching the same thing.