I have two transactional systems that are used for the same operational business activity: purchase order transactions. One system is legacy and the other is a new, modern one, and they have different underlying source-system data models. The finance team requires historical data (from the legacy system) and current data (from the new system) to support their business analytics. I am about to build a new data warehouse, and the fact that both systems have different underlying data models is the first place I am stuck (i.e., stuck at the architectural design phase). One of the business goals is to consume the data from both systems to obtain a ratio measure/metric. Data from the two systems cannot be mapped directly because they have different models with different granularities, which makes conformance difficult. The business ideally wants both system datasets integrated.
I am considering two design approaches for two different source systems which measure the same business process:
Create two separate dimensional models for the two systems
Create a single dimensional model but represent both systems in separate fact tables (a rough sketch of what I mean follows this list)
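Purely for illustration, this is roughly how I picture the second approach: one set of shared (conformed) dimensions, with a separate fact table per source system. All table and column names below are placeholders I made up, not the real models of either system.

-- Conformed dimensions shared by both fact tables
CREATE TABLE dim_date (
    date_key   INT PRIMARY KEY,
    full_date  DATE NOT NULL
);

CREATE TABLE dim_product (
    product_key   INT PRIMARY KEY,
    product_code  VARCHAR(20) NOT NULL,   -- conformed code, mapped from both systems
    product_name  VARCHAR(100) NOT NULL
);

-- One fact table per source system, each at its own natural grain
CREATE TABLE fact_po_legacy (
    date_key      INT NOT NULL REFERENCES dim_date (date_key),
    product_key   INT NOT NULL REFERENCES dim_product (product_key),
    order_amount  DECIMAL(18,2) NOT NULL
);

CREATE TABLE fact_po_new (
    date_key      INT NOT NULL REFERENCES dim_date (date_key),
    product_key   INT NOT NULL REFERENCES dim_product (product_key),
    order_amount  DECIMAL(18,2) NOT NULL
);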
I researched and read Kimball resources. A summary:
Each business process subject area is designed using a nine-step design methodology:
Choose the process.
Choose the grain.
Identify and conform the dimensions.
Choose the facts.
Store precalculations in the fact table.
Round out the dimension tables.
Choose the duration of the database.
Determine the need to track slowly changing dimensions.
Decide the physical design.
(From The Kimball Group Reader: Relentlessly Practical Tools for Data Warehousing and Business Intelligence, by Ralph Kimball and Margy Ross, John Wiley & Sons, 2010.)
The material I have come across only addresses the common situations faced by designers.
What are the design considerations for this case of two systems measuring the same business process? Do I have to do data mapping to map the data elements from the legacy system to the new system?
If both systems are representing the same operational business activity, then I think it should be possible to transform and reduce both systems’ data to make them compatible, unless they are two distinct processes or represent substantially different things.
Since the business wants the same metric from both datasets, both will need to reflect the same logic. That is why I think the second approach you mentioned is the better one, and ideally the data should also be unionable so that the values of the ratio metric stay coherent.
The most critical aspect is granularity, so you will likely need to reduce the granularity of some tables in one of the systems to make them compatible. From there, the options for joining the data should become clearer and simpler.
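For example, assuming purely for illustration that the legacy system records purchase orders at line level while the new system records them at order-header level, you would roll the legacy lines up to the order grain before conforming the two (the staging table name here is invented):

-- Roll legacy line-level rows up to the order grain used by the new system
SELECT po_number,
       order_date,
       SUM(line_amount) AS order_amount,
       COUNT(*)         AS line_count
FROM   stg_legacy_po_lines
GROUP  BY po_number, order_date;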
In the cases where I have needed to do this kind of legacy-to-current merge, the most difficult part was the mapping you mentioned last, which takes a significant amount of time, research, and questions to committed data stakeholders.
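In practice, that mapping usually ends up materialized as an explicit crosswalk table in the warehouse. A minimal sketch, assuming purely for illustration that legacy product codes are mapped to a conformed product key; the staging and mapping table names are invented:

-- Crosswalk researched and maintained with the data stakeholders
CREATE TABLE map_legacy_product (
    legacy_product_code  VARCHAR(20) PRIMARY KEY,
    product_key          INT NOT NULL   -- key of the conformed product dimension
);

-- Resolve legacy staging rows to conformed keys during ETL
SELECT m.product_key,
       l.order_date,
       l.line_amount
FROM   stg_legacy_po_lines AS l
JOIN   map_legacy_product  AS m
       ON m.legacy_product_code = l.product_code;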
After both sources were transformed to the same shape Z, we combined them:
A (shape Z)
UNION ALL
B (shape Z)
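Concretely, the combined structure might look something like this. The conformed source table names, the approved_amount column, and the approval-ratio definition are all placeholders; substitute whatever your actual ratio metric is:

CREATE VIEW fact_po_combined AS
SELECT 'LEGACY' AS source_system, date_key, product_key, order_amount, approved_amount
FROM   fact_po_legacy_conformed
UNION ALL
SELECT 'NEW' AS source_system, date_key, product_key, order_amount, approved_amount
FROM   fact_po_new_conformed;

-- One set of metric logic applied uniformly over both sources
SELECT date_key,
       SUM(approved_amount) / NULLIF(SUM(order_amount), 0) AS approval_ratio
FROM   fact_po_combined
GROUP  BY date_key;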
We have used lookup tables and also bridge tables. The key concepts behind them: lookup tables contain key-value pairs translating codes to descriptions in one-to-one relationships, so they primarily decode data. A bridge table is a table that connects two or more tables that have a many-to-many relationship.
They are not merely optional tools; they are fundamental data modeling requirements for properly joining and reconciling data structures.
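To make both concrete, a minimal sketch (table and column names are only illustrative):

-- Lookup table: one-to-one, decodes a code into its description
CREATE TABLE lkp_order_status (
    status_code  VARCHAR(10) PRIMARY KEY,
    status_desc  VARCHAR(100) NOT NULL
);

-- Bridge table: resolves a many-to-many relationship,
-- e.g. one purchase order allocated across several cost centers
CREATE TABLE brg_po_cost_center (
    po_key           INT          NOT NULL,
    cost_center_key  INT          NOT NULL,
    allocation_pct   DECIMAL(5,4) NOT NULL,   -- weighting factor for the allocation
    PRIMARY KEY (po_key, cost_center_key)
);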