What is a canonical schema in a context of microservices architecture?

Question

A recent question mentioned the term canonical schema in the context of microservices architecture. After reading the Wikipedia article, as well as one of the answers to the question, I still don't understand what a canonical schema is about. I get that it's a way to decouple the microservices by doing some magic with the data model, but I'm lost when it comes to the concrete application of the pattern. Other resources talk about standardized information sets, which only makes things more cryptic.

Imagine that microservice A consumes messages from microservice B through a message queue service. Let's say those JSON messages contain information about the availability of the products in a warehouse. While A doesn't have to know anything about the existence or location of B, I imagine that it still needs to know:

That the messages are formatted using JSON. If B suddenly starts to format messages in XML, I can hardly see how A would magically adapt itself, unless it was specifically programmed to deal with both JSON and XML messages.
The actual data model, limited to the part used by A. If A simply needs the product ID and the availability, A may not bother to know that the JSON message also contains the product full name, or the location within the warehouse. But it has to know that the product ID is stored in the field /product/id and formatted as a GUID, and that the quantity is stored in /quantity and formatted as a number. Again, if B switches to long-based IDs for the products, A won't be able to deal with it, unless the programmer had this potential format change in mind.

So, what is this design pattern about, and how is it used in practice? Given my example with services A and B, what would happen if the canonical schema pattern were applied?

Maybe it's all about A reading the schema of B and adapting dynamically to it? So it's exactly like Swagger, and also like reading WSDL at runtime to determine how a SOAP service should be called, is it?

score 6 · Accepted Answer · answered Nov 16 '16 at 03:49

None of the above. The article (and the post you link to) specifically say that microservices tend not to use a canonical schema.

With a Canonical Schema, there's no magic; the whole point of it is that within your SOA ecosystem you have a common model and format for a given 'thing'. It's like a contract. "Anytime we represent a User object, it will have the following schema: ".

Microservices, by contrast, tend to enforce their own data needs, and do whatever transformations they need internally, or when communicating with a different service themselves. Still no dynamic schema munging.
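As a rough illustration of "do whatever transformations they need internally", here is a minimal sketch of how service A from the question might translate B's message into its own model at the boundary. The field paths (`/product/id` as a GUID, `/quantity` as a number) come from the question; the names `Availability` and `parse_availability` are hypothetical, and treating the quantity as an integer is an assumption.

```python
import json
import uuid
from dataclasses import dataclass


@dataclass(frozen=True)
class Availability:
    """Service A's internal model: only the two fields A cares about."""
    product_id: uuid.UUID
    quantity: int  # assumption: quantities are whole numbers


def parse_availability(raw: str) -> Availability:
    """Translate one of B's JSON messages into A's own model.

    Extra fields (product name, warehouse location, ...) are ignored.
    Raises ValueError if the parts A depends on are missing or malformed.
    """
    try:
        message = json.loads(raw)
        raw_id = message["product"]["id"]
        quantity = message["quantity"]
    except (KeyError, TypeError, ValueError) as exc:
        raise ValueError(f"unsupported availability message: {exc!r}") from exc

    if not isinstance(raw_id, str):
        raise ValueError("product id must be a GUID string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(quantity, bool) or not isinstance(quantity, int):
        raise ValueError("quantity must be an integer")

    # uuid.UUID raises ValueError itself if the string is not a valid GUID.
    return Availability(uuid.UUID(raw_id), quantity)
```

Nothing here adapts dynamically: if B switches to XML or to long-based IDs, this function fails loudly and has to be changed by a programmer, which is exactly the situation the question describes.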

score 0 · Answer 2 · answered Apr 13 '23 at 09:59

The use of canonical schemas predates microservices and is common practice in Enterprise Application Integration and Enterprise Service Buses, where multiple systems are integrated, both within your organisation and with external partner systems or services.

The goals of adopting canonical schemas include:

Attempting to establish a common, best-practice nomenclature and model for your enterprise data entities (this would span integration and analytics, including data warehousing). If possible, existing or de facto industry-standard formats should be preferred over inventing new proprietary schemas in your enterprise. (E.g. different systems in a retail enterprise may refer to a Product as a sku, an item_id, a Product_Id, etc. Choose one preferred name and model for Product, and use that throughout the enterprise.)
To prevent system-specific naming, typing and modelling opinions (from both internal systems and external partner integrations) from 'bleeding' into your enterprise.
To prevent point-to-point system mapping complexity. As the number of systems in an enterprise increases, the number of mappings grows roughly quadratically if each new system needs to integrate with the different schemas of the existing systems. With a canonical schema, each system only needs to map its internal representation to and from the canonical format for each of the message types that it uses, irrespective of the number of other systems in the enterprise. This is logically closely related to a hub-and-spoke architecture.
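The mapping argument above can be sketched in a few lines. This is only an illustration under stated assumptions: the `CanonicalProduct` model, the two system names (`warehouse`, `webshop`) and their field names other than `sku` and `item_id` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CanonicalProduct:
    """Hypothetical enterprise-wide representation of a product."""
    product_id: str
    name: str


# One inbound mapper per system: native record -> canonical model.
TO_CANONICAL: Dict[str, Callable[[dict], CanonicalProduct]] = {
    "warehouse": lambda r: CanonicalProduct(r["sku"], r["description"]),
    "webshop": lambda r: CanonicalProduct(str(r["item_id"]), r["title"]),
}

# One outbound mapper per system: canonical model -> native record.
FROM_CANONICAL: Dict[str, Callable[[CanonicalProduct], dict]] = {
    "warehouse": lambda p: {"sku": p.product_id, "description": p.name},
    "webshop": lambda p: {"item_id": p.product_id, "title": p.name},
}


def translate(record: dict, source: str, target: str) -> dict:
    """Convert a record between any two systems via the canonical model."""
    return FROM_CANONICAL[target](TO_CANONICAL[source](record))


def point_to_point_mappings(n: int) -> int:
    """Directed mappings needed when every system maps to every other one."""
    return n * (n - 1)


def canonical_mappings(n: int) -> int:
    """Mappings needed when each system maps only to and from the canonical form."""
    return 2 * n
```

Adding a third system means writing one entry in each dictionary, regardless of how many systems already exist. For ten systems, `point_to_point_mappings(10)` gives 90 directed mappings, while `canonical_mappings(10)` gives 20.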

Perhaps one reason that 'explicit' canonical schemas are less common in a modern enterprise with multiple bespoke systems (including microservice systems) is that, where centralized design leadership and architectural roles exist, there's a much better chance that the system interfaces (including messages and APIs) will be 'canonical' to your enterprise from the outset. As a result, message and API integration between your microservices will share common naming, typing, and entity definitions (although an anti-corruption layer in each system is still a good idea for isolation and future-proofing).
