Does current evidence support the adoption of Contextual over Canonical Data Models?

Question

The "canonical" idea is pervasive in software; patterns like Canonical Model, Canonical Schema, Canonical Data Model, and so on seem to come up again and again in development.

Like many developers, I've often followed, uncritically, the conventional wisdom that you need a canonical model, otherwise you'll face a combinatorial explosion of mappers and translators. Or at least, I used to do that until a couple of years ago when I first read the somewhat-infamous EF Vote of No Confidence:

The hypotheses that once supported the pursuit of canonical data models didn’t and couldn’t include factors that would be discovered once the idea was put into practice. We have found, through years of trial and error, that using separate models for each individual context in which a canonical data model might be used is the least complex approach, is the least costly approach, and the one that leads to greater maintainability and extensibility of the applications and endpoints using contextual models, and it’s an approach that doesn’t encourage the software entropy that canonical models do.

The essay presents no evidence of any kind to support its claims, but it did make me question the canonical data model (CDM) approach long enough to try the alternative, and the resulting software didn't explode, literally or figuratively. But that doesn't mean a whole lot in isolation; I could have just been lucky.

So I'm wondering, has any serious research been done into the practical, long-term effects of having a canonical model vs. contextual models in a software system or architecture?

Or, if it's too early to be asking that, then have any developers/architects written about personal experiences switching from a CDM to independent contextual models, or vice versa, and what the practical effects were on things like productivity, complexity, or reliability?

What about the differences at different levels, i.e. using the same model across a single application vs. using it across a system of applications or an entire enterprise?

(Facts only, please; war stories are welcome but no speculation.)

score 6 · Answer 1 · edited Jun 14 '22 at 08:14

In response to the EF Vote of No Confidence article, Tim Mallalieu writes:

We are not recommending that folks return to the days where we were evangelizing the use of XSD for “canonical schemas”. I don’t believe that people think that this is tractable. What we do believe, however, is that it is desirable to have a single meta-model (EDM if you will) with which you can describe many domain models and that by having a single grammar we can provide a set of common services on any given domain model.

For example, consider an application that is to be written against a database with 600 tables. Do I believe that this app should have a single model with 600 Entity Types in it? No… Furthermore, do I believe that any given domain entity (say Customer) has only one shape in that app and that this shape must be the canonical shape for the entire Enterprise?… Heck no.

The Wikipedia article for Canonical Model references things like Enterprise Service Bus, Service-Oriented Architecture and CORBA, things which seem like they're hardly talked about anymore. They were all posed as the solution to the data proliferation and communication challenges of the enterprise, the One Ring to Rule Them All™. Did they succeed? Or did they collapse under their own weight?

You asked for personal experiences, so I'll give you one. In the aerospace industry we use telemetry a lot. One of the challenges with telemetry systems is finding a way for different test ranges to communicate test data with each other in a meaningful way. That problem seems simple enough, until you attempt to define a data dictionary of common terms.

What does "altitude" mean? Is it the height above the ground, or is it the height above sea level? What if you're talking about a submarine? Then it's depth, not altitude. To the Army, the word "transmission" means one thing when you are referring to a radar dish and something else when you are referring to a ground-based vehicle. The wing surface that causes an aircraft to roll is called an "aileron" on some planes, and an "elevon" on others.

That's only a hint at the mountain of problems that follow. Although there are standards for data communications, every test range is different, and has different needs, goals and priorities. Standards can differ even among different projects on the same range. For this reason, test ranges understand that the solution will not come by replacing everything with a single, monolithic system, but by agreeing on simple communication protocols and providing ways to translate from one range's vocabulary to another.
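To make the "translate from one range's vocabulary to another" idea concrete, here is a minimal Python sketch. Everything in it is hypothetical and for illustration only: the two record types, their field names, and the assumption that one range reports altitude above ground level in feet while the other expects altitude above mean sea level in meters. It is not taken from any real telemetry standard.

```python
from dataclasses import dataclass

FEET_PER_METER = 3.28084


@dataclass(frozen=True)
class RangeASample:
    """Hypothetical range A record: 'altitude' means height above ground, in feet."""
    vehicle_id: str
    altitude_agl_ft: float


@dataclass(frozen=True)
class RangeBSample:
    """Hypothetical range B record: 'altitude' means height above mean sea level, in meters."""
    vehicle_id: str
    altitude_msl_m: float


def range_a_to_b(sample: RangeASample, terrain_elevation_m: float) -> RangeBSample:
    """Translate range A's notion of altitude into range B's.

    The translation needs context that neither model carries on its own:
    the terrain elevation (in meters above sea level) under the vehicle.
    """
    agl_m = sample.altitude_agl_ft / FEET_PER_METER
    return RangeBSample(
        vehicle_id=sample.vehicle_id,
        altitude_msl_m=agl_m + terrain_elevation_m,
    )


# Usage: 328.084 ft above ground that sits 500 m above sea level.
translated = range_a_to_b(RangeASample("T-1", 328.084), terrain_elevation_m=500.0)
print(translated)
```

Each range keeps its own model and its own meaning of "altitude"; the only shared agreement is the explicit translation at the boundary, which also makes visible the extra context (terrain elevation) that a single shared "altitude" field would have hidden.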

The problems that large companies face are similar. Microsoft tends to think in monolithic terms, but that's because their company is, by and large, monolithic. As soon as you need to communicate between different companies with vastly different cultures and ways of doing business (or even between disparate departments in the same company), the One Ring to Rule Them All™ immediately begins to break down.
