What are some ways to implement a many-to-many relationship in a data warehouse?

Question

The dominant topologies of Data Warehouse modelling (Star, Snowflake) are designed with one-to-many relationships in mind. Query readability, performance, and structure degrades severely when faced with a many-to-many relationship in these modelling schemes.

What are some ways to implement a many-to-many relationship between dimensions or between the fact table and a dimension in a data warehouse and what compromises do they inflict with regards to necessary granularity and query performance?

IamIC · Accepted Answer · 2011-01-07T07:26:44.210

In my experience, a recursive hierarchy is the most practical way of tackling this. It offers the following advantages:

Unlimited depth.
Compactness.
Flexibility.
Speed.

By contrast, it takes an extra table for each level of "-to-many" joins. This is hard coded and difficult to maintain against schema updates.

By using filtered indexes, a large table of hierarchical joins can perform at superior speed to dedicated tables. The reason is each join is only "parent-child" compared to "to join table to data table". The latter has more indexes to process and store.

I've been trying to solve this problem for many years. Recently, this is what I came up with.

score 7 · Answer 2 · answered Feb 07 '12 at 14:50

Some scenarios for M:M relationships in a data warehouse model

Most OLAP servers and ROLAP systems have a means to deal with M:M data structures now, but there are some caveats about this that you will need to pay attention to. If you do implement M:M relationships you will need to keep an eye on your reporting layer and what tools you want to support.

Scenario 1: M:M dimension onto a fact table

An example of this might be multiple drivers on a motor policy. If you add or remove a driver, the policy adjustment transaction may have a relationship to a list of drivers that changes with the adjustment.

Option 1 - M:M driver-fact bridge table This will have quite a large volume of data, as it has drivers x transactions rows for a given policy. SSAS can consume this data structure directly, but it is slower to query through a ROLAP tool.

If your M:M relationship is based on entities that are specific to the fact row (e.g. drivers on a car) this might be appropriate for a ROLAP tool as well, providing your ROLAP tool supports M:M relationships (e.g. using contexts in Business Objects).

Option 2 - Dummy 'combinations' dimension table If you are mapping a list of common codes to a fact table (i.e. the linked entities are not peculiar to the fact row) then you can take another approach that will reduce the data volumes. An example of this type of scenario is ICD codes on an inpatient visit. Each inpatient visit will have one or more ICD diagnoses and/or procedures listed against it. The ICD codes are global.

In this case, you can make up a distinct list of the combinations of codes on each case. Make a dimension table with one row for each distinct combination, and have a link table between the combinations and the reference tables for the ICD codes themselves.

The fact table can have a dimension key to the 'combinations' dimension, and the dimension row has a list of references to actual ICD codes. Most ROLAP tools can consume this data structure. If your tool will only work with an actual M:M relationship then you can create a view that emulates the M:M relationship between the fact and the coding reference table. This would be the preferred approach with SSAS.

Advantages of option 1: - Appropriately indexed, queries based on selecting fact table rows with a certain relationship through the M:M table can be reasonably efficient.

Slightly simpler conceptual model

Advantages of option 2: - Data storage is more compact

You can emulate a straight 1:M relationship by presenting the combinations in a human-readable format as a code on the 'combinations' dimension. This might be more useful on dumber reporting tools that lack support for M:M relationships.

Scenario 2: M:M relationship between dimensions:

Harder to think of a use case, but one could envisage something out of healthcare with ICD codes again. On a cost analysis system, the inpatient visit may become a dimension, and will have M:M relationships between the visit (or consultant-episode in NHS-speak) and the codings.

In this case you can set up the M:M relationships, and possibly codify a human-readable rendering of them on the base dimension. The relationships can be done through straight M:M link tables or through a bridging 'combinations' table as before. This data structure can be queried correctly through Business Objects or better quality ROLAP tools.

Off the top of my head, I can't see SSAS being able to consume this without taking the relationship right down to the fact table, so you would need to present a view of the M:M relationship between the coding and the fact table rows to use SSAS with this data.

score 5 · Answer 3 · answered Jan 11 '11 at 21:47

I would like to know exactly what kind of many-to-many relationship you have in mind in your model, either as it is in the transactional system or whatever data model it currently is in.

Typically, many-to-many relationships between dimensions are facts about dimensions. The fact that a customer orders from a several branch offices who service many customers, or something like that. Each of those is a fact. It would have an effective date or something like that, but the relationship could be "fact-less". The relationship itself may have other dimensions besides customer and branch office. So it's a typical star schema with a (potentially fact-less) fact table at the center. How this star can relate to other dimensional stars in the warehouse will obviously depend. Any time you combine different stars, you would do so on the business keys and have to ensure you aren't performing inadvertent cross-joins.

Typically one does not report on such dimension relationship tables to the same degree as larger fact tables and when they do, it's not always as much data, so it doesn't tend to affect performance. In the above case, you might look at customer/branch utilization over time, but better data about actual order quantities would be available in your order fact table, which would presumably also have dimensions for the customer, branch, etc. These are not what most people would consider many-to-many (although an order could be considered to define a many-to-many relationship from customer to branch), so are more typical in data warehouse environments. You would only be doing numerical aggregates on many-to-many models if you had rolled up summary information to that relationship level - i.e. customer, branch, month, total sales - a summary fact table, which looks more like a one fact-many dimension model now that it has numerical data.

score 5 · Answer 4 · edited Oct 19 '16 at 13:56

Here are some relevant articles from Kimball and others which may apply to modeling a given proposed many-to-many relationship. Note that a many-to-many relationship is a concept in the problem domain/logical model only. In a normalized OLTP model it would still be handled with a link table which is, of course, one to many each way. In a non-normalized Kimball data warehouse model there are a number of ways to do this, one of which basically treats that link table as the fact at the center of a star. Another is as an array of flag columns.

Ultimately, the choice is going to depend on what exactly you are modeling, how it's changing and how you want to report on it. This is where the dimensional modeling and data warehousing in general diverges sharply from the normalized model. The normalized model concentrates on the logical and theoretical relationships in the data, which data warehousing always keeps an eye on the realistic use cases and denormalizes to make those perform.

Modeling alternative hierarchies using a bridge table:

http://www.kimballgroup.com/wp-content/uploads/2012/05/DT62Alternative.pdf

Three options for a many to many relationship (tied to numerical share allocations - see the comments for some interesting back and forth)

http://www.pythian.com/news/364/implementing-many-to-many-relationships-in-data-warehousing/

Unfortunately, a lot of Kimball's Information Week/DBMS mag articles no longer have good links...

score 1 · Answer 5 · answered Jan 10 '11 at 09:50

1

One way we can resolve this is to have a Fact table just have 2 columns , of foreign key from the 2 dimensions having a many to many relation ship .

answered Jan 10 '11 at 09:50

paranjai

121
2

What are some ways to implement a many-to-many relationship in a data warehouse?

5 Answers5

Linked