Is Kimball dimensional modeling still relevant in a modern data warehouse?

randy · July 21, 2020, 12:28am

Here’s my chaos modeling special blueprint:

Raw - Ingest as raw with Fivetran / Stitch / Matillion / Custom Stages, making sure to have an ingestion time and pulling out expected predicate columns if they’re packed in a variant. Strongly prefer append-only ingestion when possible.
Stage - All tables named STG_<SOURCE_TABLE_NAME> with the most basic cleaning - timestamp conversion, upper/lowercasing of strings, null fills, maybe some basic mapping (like mapping AWS Account Id values to human-readable names using a case statement). These are often views.
Analytics - Working from the end of the pipeline, identify use case / pain points and determine the data required for this. Anything a BI tool will hit should be materialized as a table and should have human-readable names with no prefix. Schemas and databases at this level should also be human-readable with schemas split by business function or access level
Curated - Iteratively apply DRY concepts to pull useful models out of individual end-table computations and into a curated layer (prepended with CUR_<TABLE_NAME>). These are a mix of views and tables and should reference only other curated tables and stage tables. Once you do this a few times, some CURATED tables become obvious to you when doing step 3 so you don’t have to go down the path of repeating yourself a ton then reworking backward.

At no point do I normalize incoming sources simply to conform to a formal data model, particularly when I will need to denormalize them by the end of the pipeline.

This approach assumes a smaller data team and a very broad community of mostly BI-focused access. It takes a centralized authority view of the world and doesn’t account for citizen data scientists. This is an approach designed primarily for enterprise clients new to the cloud.

I think formal modeling is very useful, especially in an OLTP environment, but I find that too many large companies that I’ve worked with have gotten obsessed with formal modeling and have completely neglected end-user needs. It has largely been a theatrical exercise that gives the impression of adding value, but at the end of the day we’d spend all our time accommodating some arbitrary modeling needs and end up with no real usage of the warehouse.

I definitely don’t have anything against modeling, I would just prefer to focus efforts on understanding users and delivering value early and intentionally leveraging technical debt in the warehouse in a greenfield cloud data warehouse migration. If we feel the pain of the technical debt, it’ll be due to increases in usage and requirement (ostensibly due to users gaining value) so we can address modeling needs after we’ve delivered value (and using dbt makes it easy to fix this debt). If we don’t feel the pain, then we avoided a costly modeling engagement and can spend all our cycles iterating on use cases to find ways to deliver real value.

I feel like I’m probably in the minority and I’m definitely over biased towards the kinds of projects / companies I’ve worked with.

Topic		Replies	Views
Data Modeling approaches in modern data-times In-Depth Discussions	10	16115	February 9, 2023
Building a Kimball dimensional model with dbt In-Depth Discussions devblog	9	4149	October 16, 2023
How we (used to) structure our dbt projects Archive	22	199103	September 7, 2022
Modern Data Modeling: Are Junk dimensions still relevant? Archive	1	5349	June 30, 2020
Structure Snowflake database, schema In-Depth Discussions	19	15183	June 1, 2021

Is Kimball dimensional modeling still relevant in a modern data warehouse?

Related topics