I have two transactional systems that are used for the same operational business activity: purchase order transactions. One system is legacy and the other is a new, modern one, and they have different underlying source-system data models. The finance team requires historical data (from the legacy system) and current data (from the new system) to support their business analytics. I am about to build a new data warehouse, and the fact that both systems have different underlying data models is the first place I am stuck (i.e., stuck at the architectural design phase). One of the business goals is to consume the data from both systems to obtain a ratio measure/metric. Data from the two systems cannot be mapped directly because they have different models with different granularities, which makes conformance difficult. The business ideally wants both system datasets integrated.
I am considering two design approaches for two different source systems which measure the same business process:
Create two separate dimensional models for the two systems
Create a single dimensional model but represent both systems in separate fact tables (a rough sketch of what I mean follows this list)
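Purely for illustration, this is roughly how I picture the second approach: one set of shared (conformed) dimensions, with a separate fact table per source system. All table and column names below are placeholders I made up, not the real models of either system.

-- Conformed dimensions shared by both fact tables
CREATE TABLE dim_date (
    date_key   INT PRIMARY KEY,
    full_date  DATE NOT NULL
);

CREATE TABLE dim_product (
    product_key   INT PRIMARY KEY,
    product_code  VARCHAR(20) NOT NULL,   -- conformed code, mapped from both systems
    product_name  VARCHAR(100) NOT NULL
);

-- One fact table per source system, each at its own natural grain
CREATE TABLE fact_po_legacy (
    date_key      INT NOT NULL REFERENCES dim_date (date_key),
    product_key   INT NOT NULL REFERENCES dim_product (product_key),
    order_amount  DECIMAL(18,2) NOT NULL
);

CREATE TABLE fact_po_new (
    date_key      INT NOT NULL REFERENCES dim_date (date_key),
    product_key   INT NOT NULL REFERENCES dim_product (product_key),
    order_amount  DECIMAL(18,2) NOT NULL
);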
I researched and read Kimball resources. A summary:
Each business process subject area is designed using a nine-step design methodology:
Choose the process.
Choose the grain.
Identify and conform the dimensions.
Choose the facts.
Store precalculations in the fact table.
Round out the dimension tables.
Choose the duration of the database.
Determine the need to track slowly changing dimensions.
Decide the physical design.
(From The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence, by Ralph Kimball and Margy Ross, John Wiley & Sons, 2010.)
The material I have come across only addresses the common situations faced by designers.
What are the design considerations for this case of two systems measuring the same business process? Do I have to do data mapping to map the data elements from the legacy system to the new system?
If both systems are representing the same operational business activity, then I think it should be possible to transform and reduce both systems’ data to make them compatible, unless they are two distinct processes or represent substantially different things.
Since the business wants the same metric from both datasets, both will need to reflect the same logic. That is why I think the second approach you mentioned is the better one, and ideally the data should also be unionable so that the values of the ratio metric stay coherent.
The most critical aspect is granularity, so you will likely need to reduce the granularity of some tables in one of the systems to make them compatible. From there, the options for joining the data should become clearer and simpler.
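For example, assuming purely for illustration that the legacy system records purchase orders at line level while the new system records them at order-header level, you would roll the legacy lines up to the order grain before conforming the two (the staging table name here is invented):

-- Roll legacy line-level rows up to the order grain used by the new system
SELECT po_number,
       order_date,
       SUM(line_amount) AS order_amount,
       COUNT(*)         AS line_count
FROM   stg_legacy_po_lines
GROUP  BY po_number, order_date;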
In the cases where I have needed to do this kind of legacy-to-current merge, the most difficult part was the mapping you mentioned last, which takes a significant amount of time, research, and questions to committed data stakeholders.
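In practice, that mapping usually ends up materialized as an explicit crosswalk table in the warehouse. A minimal sketch, assuming purely for illustration that legacy product codes are mapped to a conformed product key; the staging and mapping table names are invented:

-- Crosswalk researched and maintained with the data stakeholders
CREATE TABLE map_legacy_product (
    legacy_product_code  VARCHAR(20) PRIMARY KEY,
    product_key          INT NOT NULL   -- key of the conformed product dimension
);

-- Resolve legacy staging rows to conformed keys during ETL
SELECT m.product_key,
       l.order_date,
       l.line_amount
FROM   stg_legacy_po_lines AS l
JOIN   map_legacy_product  AS m
       ON m.legacy_product_code = l.product_code;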
After both sources were transformed to the same shape Z, we combined them:
A (shape Z)
UNION ALL
B (shape Z)
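Concretely, the combined structure might look something like this. The conformed source table names, the approved_amount column, and the approval-ratio definition are all placeholders; substitute whatever your actual ratio metric is:

CREATE VIEW fact_po_combined AS
SELECT 'LEGACY' AS source_system, date_key, product_key, order_amount, approved_amount
FROM   fact_po_legacy_conformed
UNION ALL
SELECT 'NEW' AS source_system, date_key, product_key, order_amount, approved_amount
FROM   fact_po_new_conformed;

-- One set of metric logic applied uniformly over both sources
SELECT date_key,
       SUM(approved_amount) / NULLIF(SUM(order_amount), 0) AS approval_ratio
FROM   fact_po_combined
GROUP  BY date_key;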
We have used lookup tables and also bridge tables. The key concepts behind them: lookup tables contain key-value pairs translating codes to descriptions in one-to-one relationships, so they primarily decode data. A bridge table is a table that connects two or more tables that have a many-to-many relationship.
They are not merely optional tools; they are fundamental data modeling requirements for properly joining and reconciling data structures.
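To make both concrete, a minimal sketch (table and column names are only illustrative):

-- Lookup table: one-to-one, decodes a code into its description
CREATE TABLE lkp_order_status (
    status_code  VARCHAR(10) PRIMARY KEY,
    status_desc  VARCHAR(100) NOT NULL
);

-- Bridge table: resolves a many-to-many relationship,
-- e.g. one purchase order allocated across several cost centers
CREATE TABLE brg_po_cost_center (
    po_key           INT          NOT NULL,
    cost_center_key  INT          NOT NULL,
    allocation_pct   DECIMAL(5,4) NOT NULL,   -- weighting factor for the allocation
    PRIMARY KEY (po_key, cost_center_key)
);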