This is a companion discussion topic for the original entry at Building a Kimball dimensional model with dbt | dbt Developer Blog
Hi folks!
A great article, thank you for compiling it all together!
Was confused by one moment : seems you suggest to create a visual ER diagram as only part of documentation, Part 6.
Typically I start with visual ER diagram , and even use it to generate tables DDLs and even dbt code(If I work with dbt).
Any reason you put it only at the end? Will not data engineers benefit from having it in from of them even before they started writing any dbt code, dbt tests etc ? Thoughts ?
Hey there,
Yes, it’s better practice to start with an ER Diagram first as a visual communication tool to stakeholders on what you’re going to create and quickly get their confirmation.
Usually, once the development process starts, the schema will change anyway.
At the end of the development process, it’s good to revisit the initial diagram and make any tweaks/updates to it to reflect the final ERD.
Cheers
Thank you, @jonathanneo, for your invaluable blog! It has been incredibly helpful during our transition from a traditional handcrafted DWH to dbt.
I have a question concerning the handling of missing records in dimensional tables. How would you suggest dealing with this situation?
Here are some related resources I found on the topic:
Design Tip #43: Dealing With Nulls In The Dimensional Model
How do you deal with missing dimensions for foreign keys? - In-Depth Discussions - dbt Community Forum (getdbt.com))
Hey @Koen, thanks for your question. The way I would handle records in the fact table that don’t match to a record in the dimension table is through the method that @josh suggested in his post here.
So, let’s say our dimension table is dim_user
, and we have the following columns:
customer_id
customer_key
(the surrogate key, created by usinghash(customer_id)
customer_name
In the dbt model used to generate dim_user
, I would add a row (using union all
) for the following record:
select
-1 as customer_id,
hash(-1) as customer_key,
'Unknown Customer' as customer_name
Then from the fact table (e.g. fact_sales
), I will try to generate a surrogate key that matches dim_user
.
select
sale_id,
price,
quantity,
hash(coalesce(customer_id, -1)) as customer_key -- if customer_id is null, -1 will be used instead.
from
{{ ref('staging_sales') }}
This way, the fact table will reference a customer with the customer name Unknown Customer
instead of null
which might cause confusion.
Hi @jonathanneo , thanks for the article. Finally someone talking about dimensional modelling in a dbt blog!
My question is the following though: you create a surrogate key for all models, even when they have a single natural key e.g.
{{ dbt_utils.generate_surrogate_key(['stg_product.productid']) }} as product_key,
Isn’t this unnecessary since you already have the productid
here? Or is it for consistency purposes? I have had this discussion with other AE’s where they feel that creating a surrogate when there is already a natural key is unnecessary.
Curious to hear your reasoning.
Kind regards,
Lasse