
I've noticed a certain idea recur in different contexts, but before I start calling it "the sandwich pattern", it would be useful to know (in the spirit of other "Is there a name for this pattern?" questions on this site):

  1. whether it is a recognized concept that already has a better-known name, and

  2. whether there are further examples to illustrate this pattern.

What I'm thinking of as "the sandwich pattern" is abstracted from a couple of examples:

  1. The first is Ned Batchelder's "Pragmatic Unicode" from 2012, where he suggests a "Unicode sandwich": [emphasis added]

    the data coming into and going out of your program must be bytes. But you don’t need to deal with bytes on the inside of your program. The best strategy is to decode incoming bytes as soon as possible, producing unicode. You use unicode throughout your program, and then when outputting data, encode it to bytes as late as possible. This creates a Unicode sandwich: bytes on the outside, Unicode on the inside.

    The name is repeated at the end, as one of the three main take-aways:

    Unicode sandwich: keep all text in your program as Unicode, and convert as close to the edges as possible.

  2. The example that motivates this question. At work, we have a system (program) that (among other things) happens to deal with dates, and a date can differ depending on whether it is in:

    • the customer's timezone, or

    • what I'll call the "global timezone", for simplicity. (Don't worry about the fact that different machines can differ in their timezone; it's not relevant here and if you prefer you can think of it as UTC… that's not the point of this question.)

    The code can get a bit confusing and error-prone for this reason, converting dates between timezones here and there, but it turns out that much of the business logic of this system is concerned with the customer timezone, so I'm thinking about the following proposal:

    • Establish the convention (enforced with types if necessary; that's a separate discussion) that the "customer timezone" is used everywhere in the code, i.e. a date means the date in the customer timezone by default, except:

    • at certain boundaries, where it is necessary to interact with the "global timezone", we immediately convert either from the global timezone to the customer timezone (when the date comes from the external system), or from the customer timezone to the global timezone (when the date needs to be written out to the external system).

    That is, a "sandwich" with the global timezone on the outside and the customer timezone on the inside.
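
To make this concrete, here is a rough sketch of the proposal (in Python purely for illustration; the specific timezone and the function names are made up, not our actual code):

    from datetime import datetime
    from zoneinfo import ZoneInfo

    CUSTOMER_TZ = ZoneInfo("America/New_York")  # hypothetical customer timezone
    GLOBAL_TZ = ZoneInfo("UTC")                 # the "global timezone", treated as UTC here

    def read_date_from_external_system(raw: str) -> datetime:
        # Boundary: parse the external value and convert to the customer timezone immediately.
        global_dt = datetime.fromisoformat(raw).replace(tzinfo=GLOBAL_TZ)
        return global_dt.astimezone(CUSTOMER_TZ)

    def write_date_to_external_system(dt: datetime) -> str:
        # Boundary: convert back to the global timezone only when writing out.
        return dt.astimezone(GLOBAL_TZ).isoformat()

    def business_logic(dt: datetime) -> datetime:
        # Core: everything in here assumes the customer timezone; no conversions.
        return dt.replace(hour=9, minute=0, second=0, microsecond=0)

All conversions live in the two boundary functions; the business logic only ever sees customer-timezone dates.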

Both these examples appear to me to be instances of a single, more general pattern, something like [this is not a quotation, just my attempt to put the pattern in words]:

Sometimes, there can be multiple kinds of the same "thing" (like "bytes" and "unicode" in (1), or "global date" and "customer date" in (2)) that tend to be used in similar contexts / have some semantic overlap.

To avoid errors, confusion, and excessive back-and-forth conversion between the two kinds of "thing", it helps to have a convention where you:

  • identify the preferred kind, and
  • design your system such that "things" inside the system are of the preferred kind, and conversion happens as close as possible to the boundaries with whichever external systems require the other kind.

(But maybe the pattern is even more general; see the "Aside" below.)

This pattern seems to be a "design pattern" at least in the broader sense of "a general, reusable solution to a commonly occurring problem within a given context" (the Wikipedia lede), even if not in the narrow sense of the original object-oriented design patterns from the "Gang of Four" book (consider, for example, that the book Game Programming Patterns includes "patterns" like "data locality"). In any case, my question is not about whether this qualifies as a "design pattern", but about the "sandwich" software-engineering idea/principle/pattern itself: specifically, whether there are existing names (and other examples) for it.


Aside: That's the question, but it's possible that the two examples above, and the common pattern I abstracted from them, are not at the right level of generality/abstraction, because perhaps the following examples can also be folded under the same pattern:

  1. Also from 2012 is Gary Bernhardt's "Functional Core, Imperative Shell" idea. It is also mentioned in his "Boundaries" talk (YouTube), and searching this site finds several mentions. Roughly, the idea is to have impure functions only on the "outside" (the "shell") and have "pure" functions in the "core" (more links here).

  2. Not an independent example, but an answer on this site by user Theraot, taking inspiration from "the idea of a pure core and an impure shell", suggests a similar idea for async functions:

    interacting with external systems is often async […] The solution is that the entry point will be impure (and async); it will deal with external systems (impure imperative shell) and call into your not-async code (pure functional core), which returns to the shell for more interoperability. That way you do not have to make everything async, and you do not have to make your whole code impure.

These examples differ in some ways from the "sandwich pattern" in the actual question above the line: the conversion only goes one way, since there's no straightforward way to convert an impure function to a pure one, or an async function to a sync one. There's no risk of accidentally using the "incorrect" kind of function (and we're less worried about the reader getting confused about which is which), so the problem being solved is entirely different. But what's similar is the idea of consciously declaring one kind preferable, and carefully designing so that this kind is used everywhere in the "core", except in a "shell" where external boundaries force otherwise. A supporting coincidence(?) is that a search just before asking this question led me to Mark Seemann's 2020 post "Impureim sandwich", where (apparently independently) he comes up with the term "sandwich" for this pattern.
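
For what it's worth, here is how I picture that core/shell shape, reduced to a toy sketch (the names are mine, not from the talk or the posts linked above):

    def summarize(lines: list[str]) -> str:
        # Functional core: pure, no I/O, easy to test.
        words = sum(len(line.split()) for line in lines)
        return f"{len(lines)} lines, {words} words"

    def main() -> None:
        # Imperative shell: all the impure "bread" lives at the edges.
        with open("input.txt") as f:   # impure: read
            lines = f.readlines()
        report = summarize(lines)      # pure core in the middle
        print(report)                  # impure: write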

4 Answers

6

Canonical Form

Or, as I've heard folks call it more casually: a canonical representation. The term is derived from math.

In computer science, canonicalization (sometimes standardization or normalization) is a process for converting data that has more than one possible representation into a "standard", "normal", or canonical form. This can be done to compare different representations for equivalence, to count the number of distinct data structures, to improve the efficiency of various algorithms by eliminating repeated calculations, or to make it possible to impose a meaningful sorting order.

Now, this may not seem too relevant, but only because the language of math is highly abstract. (That's why I've quoted the more CS-related definition rather than the original one.) But consider the following not-immediately-obvious implications:

  • Lossless. A canonical form is a lossless representation. It may serve as an intermediate for converting between other representations of the same kind.

    For example, Unicode is a superset of many other character sets.

  • Arbitrary. Sometimes there are many equally good alternatives for the role of a canonical form. One may be chosen anyway, just for the sake of standardization.

    In your example, the customer's time zone was agreed upon not because it's a better format, but because it's more prevalent in the code base (my guess), allowing the developers to spend less brain juice on validating their assumptions.

  • Conventional. The name itself suggests that a canonical form is some sort of standard (especially so if you call it a standard form), potentially making the convention more agreeable.

  • Contextual. In software engineering, we often reuse and override concepts in different contexts. There is no global canonical form for a given kind of data, so you are free to declare your own standards in the scope that you own.

    You'd think something like Unicode should have a predefined canonical form, but a Unicode-handling library may represent it differently under the hood, to improve performance or maintainability, and that representation will be the canonical form for that library.
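
As a tiny illustration of the "compare different representations for equivalence" part of the quote, here is Unicode normalization in Python, with NFC chosen (arbitrarily) as the canonical form:

    import unicodedata

    a = "caf\u00e9"      # 'é' as a single code point
    b = "cafe\u0301"     # 'e' followed by a combining acute accent

    print(a == b)        # False: two different representations of the same text

    def canonical(s: str) -> str:
        # NFC is one possible canonical form; NFD would do just as well,
        # as long as everyone converts to the same one.
        return unicodedata.normalize("NFC", s)

    print(canonical(a) == canonical(b))  # True once both are in canonical form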


As a side note, I don't think a sandwich is a good model for the given examples, because the sandwich (as described) puts the emphasis on crossing boundaries. But we deal with boundaries all day, every day: APIs, data hiding in OOP, namespaces, modules, domains, packages, etc. There's nothing special about this, besides boundaries being boundaries.

Rather, the crux is in the meaty part of the sandwich: establishing a certain data format/structure as the standard way of working with a given kind of data. Which is similar to what you've described in your own words. Without a standard, the middle layer of that sandwich would be a salad of different formats and conversions between them.

2

You have given a large number of examples, in very unrelated areas of application design. Is there a pattern? In the broadest sense of the word "pattern", yes, there is. I don't think this qualifies as a software design pattern, because the problems in each example are different, and yet clearly there is a "pattern."

It is like fractals. Fractal patterns are common in nature, from the shapes of galaxies to the branches of trees, to the shapes of the veins in a leaf. Fractal patterns are everywhere but do not represent a single, general solution to a problem common to things of all shapes and sizes across time and space.

You have identified a very, very general pattern of one thing on the outside and another thing on the inside, but I'm afraid this does not fit the description of a Software Design Pattern.

2

I've applied this pattern in at least two other situations:

  • Externally, 1-based indexes; internally, 0-based indexes.
  • Externally, a string containing JSON; internally, an app-specific data structure.

I think it's fair to say this is more than a naturally-occurring phenomenon that's interesting to observe. It's something you can intentionally apply to improve the design of your code. In my mind that qualifies it as a design pattern, even though it may not be a well-known one.

For example, in my index case, there were conversions scattered throughout a fairly extensive code base. It caused frequent bugs. The one I was looking at that day was caused by accidentally applying a 2-based index to a 0-based array, because one of the dozens of conversions had been done in the wrong direction. Applying the sandwich pattern essentially eliminated that class of bugs.

My second example I've also seen called Parse, don't validate, but the premise is the same. Use the safest, most consistent representation in as much of your code as possible, and convert it as needed on the boundaries.
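
A minimal sketch of the index case (the names are just illustrative):

    def from_external_index(i: int) -> int:
        # Boundary: the external API is 1-based; everything internal is 0-based.
        return i - 1

    def to_external_index(i: int) -> int:
        # Boundary: convert back only when talking to the external API.
        return i + 1

    def handle_request(external_index: int, items: list[str]) -> str:
        idx = from_external_index(external_index)  # convert once, at the edge
        return items[idx]                          # internal code never sees 1-based indexes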

Karl Bielefeldt
1

From what you describe, I would simply go with "zone awareness" or "zone thinking". It sounds trivial at first, but IMO it is a vital thing for every software developer.

I can extend your list with my own examples, like Karl Bielefeldt did:

  • Data are available as XML at first, later as business objects, and later again as XML or HTML.
  • Data are unsafe at first and safe later. This may concern simple details, e.g. data that are guaranteed not to be null or not to contain malicious attack values.
  • Data are raw at first; later they contain highlight markup coming from a search entry. You do not want to store data with highlight markup in the model table. You need zones.
  • Activity start and end times are raw and not rounded at first, but later they are rounded (e.g. to five minutes) and flattened (no parallel times). The rounding and flattening algorithm may be complex.

What I am saying is that I am not looking for a pattern name for this, but I recognize the importance of zones and try to stick to the following rules:

  1. Define clear zones.

Make sure every piece of code inside a zone expects and delivers compliant data. Once you have zones, call graphs, for example, will be easier to understand.

  2. Make sure your data go from one zone into another only in a clearly defined (and tested) way.

That may be a big conversion class, but it might also be a simple, tiny convention by which variable names carry the prefix "safe" for data that are safe. Every usage of non-"safe" data in certain places is then a red flag.
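
As a sketch of that tiny-convention idea, expressed with a type instead of a name prefix (a made-up example):

    from html import escape
    from typing import NewType

    SafeHtml = NewType("SafeHtml", str)  # the "safe" zone: already-escaped text

    def make_safe(raw: str) -> SafeHtml:
        # The one clearly defined (and testable) way to cross from the unsafe zone.
        return SafeHtml(escape(raw))

    def render(title: SafeHtml) -> str:
        # Inside this zone the code may assume the data are safe.
        return f"<h1>{title}</h1>"

    page = render(make_safe("<script>alert('x')</script>"))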

If I remember correctly, the first time I encountered the word "zone" with clear awareness was in an article about writing secure code. Now in your question we read "sandwich", which is creative and a good picture. I'll stay with "zones".