I’m designing a Kimball-style data warehouse. My fact table has the grain: 1 row per user per transaction. Each transaction ID appears only once in the fact table.
The transaction itself has ~15 descriptive attributes (type, status, channel, method, flags, etc.). If I move these attributes into a transaction_dim table, it results in a 1:1 relationship between the fact and the dimension, which doesn’t seem like a proper dimension (since dimensions normally have a 1-to-many relationship with facts). No other fact tables refer to this dimension.
Question:
If all of these descriptive attributes exist at the same grain as the fact (one value per transaction), is it correct Kimball practice to store all of them directly in the fact table instead of creating a 1:1 “transaction dimension”?
I’m trying to confirm whether keeping all transaction-level descriptors in the fact table is the right modeling choice, or if there is a better dimensional modeling pattern for this scenario. Thank you!
If you’re doing dimensional modeling, attributes that are unique to each transaction row in the fact table (i.e. whose cardinality is the same as your fact’s table), these should indeed be stored directly in the fact table. In fact, Kimball has a name for this: degenerate dimensions. In your example, something like the transaction number fits this perfectly.
However, some of the attributes you listed sound like proper dimensions. Transaction types, methods, … etc are probably repeated between multiple transactions, which is a good indication that the attribute can be treated as a full dimension.
While you only have one fact table referring to these right now, this might not always be the case. It’s common to eventually build a related periodic snapshot or accumulating snapshot fact table, which might want some of the same attributes. In that situation, giving these attributes their own dimension lets you easily build reports that drill across the related fact tables.
If transaction attributes do not fit naturally into one of the existing dimensions, he recommends created a “transaction profile” dimension with all of the miscallaneous flags, etc.
There is an exception, which is if you’re certain that there will only ever be one fact table referring to these attributes, in which case it can be OK to just store them in the fact table. In Kimball’s day (1990s-2000s) this would have been bad for query performance, hence his recommendation to store them in a full dimension, but this isn’t true with modern OLAP query engines, which you’re probably using.