Hi Guys,
I have a question about what’s the best practice in modern data modeling.
Background info…
The old school Kimball style way is to keep fact tables long and thin, meaning they should only contain measures and keys to dimensions.
All further information, particularly low-cardinality data, such status flags should be in dimensions.
The one exception is high-cardinality data, such as order ids, that are unique for a majority of the fact table rows. Adding those to a dimension table would make the dimension table a similar size as the fact table and defeat the purpose. So they are kept as degenerate dimensions
inside the fact table.
In order to avoid creating dozens of dimensions, all data that doesn’t fit into separate dimensions is put into a junk dimension
.
The benefit of these best practices is to keep data storage low and the number of dimensions manageable.
The question…
With that in mind… my question is whether this best practice of using a junk dimension still makes sense or whether it’s better to just keep the data as degenerate dimensions?
If the junk dimension columns aren’t used to drill across to other fact tables, is there a benefit of moving them into a separate dimension?
Modern DBs that use column storage compress low-cardinality data in the fact tables. So storage should not be an issue.
Are there any other disadvantages I am missing if I keep some dimensional data in my fact tables?
And only create dimensions for data that is used across multiple fact tables?