21

I'm currently in the (re)design phase of several model classes of a C# .NET application. (Model as in M of MVC). The model classes already have plenty of well-designed data, behaviors, and interrelationships. I am rewriting the model from Python to C#.

In the old Python model, I think I see a wart. Each model knows how to serialize itself, and the serialization logic has nothing to do with the rest of the behavior of any of the classes. For example, imagine:

  • Image class with .toJPG(String filePath) and .fromJPG(String filePath) methods
  • ImageMetaData class with .toString() and .fromString(String serialized) methods

You can imagine how these serialization methods are not cohesive with the rest of the class, yet only the class can be guaranteed to know sufficient data to serialize itself.

Is it common practice for a class to know how to serialize and deserialize itself? Or am I missing a common pattern?

kdbanman
  • 1,447

2 Answers

24

I generally avoid having the class know how to serialize itself, for a couple of reasons. First, if you want to (de)serialize to/from a different format, you now need to pollute the model with that extra logic. Second, if the model is accessed via an interface, you also pollute the contract.

public class Image
{
    public void toJPG(String filePath) { ... }

    public Image fromJPG(String filePath) { ... }
}

But what if you also want to serialize it to/from PNG and GIF? Now the class becomes:

public class Image
{
    public void toJPG(String filePath) { ... }

    public Image fromJPG(String filePath) { ... }

    public void toPNG(String filePath) { ... }

    public Image fromPNG(String filePath) { ... }

    public void toGIF(String filePath) { ... }

    public Image fromGIF(String filePath) { ... }
}

Instead, I typically like to use a pattern similar to the following:

public interface ImageSerializer
{
    void serialize(Image src, Stream outputStream);

    Image deserialize(Stream inputStream);
}

public class JPGImageSerializer : ImageSerializer
{
    public void serialize(Image src, Stream outputStream) { ... }

    public Image deserialize(Stream inputStream) { ... }
}

public class PNGImageSerializer : ImageSerializer
{
    public void serialize(Image src, Stream outputStream) { ... }

    public Image deserialize(Stream inputStream) { ... }
}

public class GIFImageSerializer : ImageSerializer
{
    public void serialize(Image src, Stream outputStream) { ... }

    public Image deserialize(Stream inputStream) { ... }
}
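To make the pattern concrete, here is a minimal runnable sketch. Real JPG/PNG/GIF codecs are out of scope, so it uses a made-up `RawImageSerializer` as a stand-in; the `Image` members (`Width`, `Height`, `Pixels`) and the `ImageStore` helper are likewise assumptions for illustration, not part of the original model.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical minimal model: dimensions plus raw pixel bytes, no I/O logic.
public sealed class Image
{
    public int Width { get; }
    public int Height { get; }
    public byte[] Pixels { get; }

    public Image(int width, int height, byte[] pixels)
    {
        Width = width;
        Height = height;
        Pixels = pixels;
    }
}

public interface ImageSerializer
{
    void serialize(Image src, Stream outputStream);

    Image deserialize(Stream inputStream);
}

// Toy stand-in for a real codec: writes width, height, length, then the bytes.
public sealed class RawImageSerializer : ImageSerializer
{
    public void serialize(Image src, Stream outputStream)
    {
        // leaveOpen: true, so the caller keeps ownership of the stream.
        using var writer = new BinaryWriter(outputStream, Encoding.UTF8, leaveOpen: true);
        writer.Write(src.Width);
        writer.Write(src.Height);
        writer.Write(src.Pixels.Length);
        writer.Write(src.Pixels);
    }

    public Image deserialize(Stream inputStream)
    {
        using var reader = new BinaryReader(inputStream, Encoding.UTF8, leaveOpen: true);
        int width = reader.ReadInt32();
        int height = reader.ReadInt32();
        int length = reader.ReadInt32();
        return new Image(width, height, reader.ReadBytes(length));
    }
}

// Hypothetical helper: the caller picks the format, Image never changes.
public static class ImageStore
{
    public static byte[] Save(Image image, ImageSerializer serializer)
    {
        using var buffer = new MemoryStream();
        serializer.serialize(image, buffer);
        return buffer.ToArray();
    }

    public static Image Load(byte[] data, ImageSerializer serializer)
    {
        using var buffer = new MemoryStream(data);
        return serializer.deserialize(buffer);
    }
}
```

Supporting another format then means adding one more `ImageSerializer` implementation and passing it to `ImageStore.Save`; neither `Image` nor its callers' contract has to change.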

Now, at this point, one of the caveats with this design is that the serializers need to know the internals of the object they are serializing. Some would say that this is bad design, as the implementation leaks outside of the class. The risk/reward of this is really up to you, but you could slightly tweak the classes so that the serializer only works on the raw pixel data and the image hands that data over itself, something like

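public class Image
{
    // Assumes ImageSerializer is tweaked to read and write the raw pixel
    // data (e.g. byte[]) instead of Image, so it never sees Image internals.
    private byte[] pixelData;

    public void serializeTo(ImageSerializer serializer, Stream outputStream)
    {
        serializer.serialize(this.pixelData, outputStream);
    }

    public void deserializeFrom(ImageSerializer serializer, Stream inputStream)
    {
        this.pixelData = serializer.deserialize(inputStream);
    }
}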

This is more of a general example, as images usually have metadata that goes along with them (compression level, colorspace, etc.), which may complicate the process.

Zymus
  • 2,533
3

Serialization is a two-part problem:

  1. Knowledge about how to instantiate a class, a.k.a. structure.
  2. Knowledge about how to persist/transfer the information that is needed to instantiate a class, a.k.a. mechanics.

As far as possible, structure should be kept separate from the mechanics. This increases the modularity of your system. If you bury the information on #2 within your class, you break modularity, because your class must then be modified every time a new way of serializing comes along.

In the context of image serialization, you would keep the serialization knowledge out of the class itself and put it in the algorithms that determine the serialization format; hence, different classes for JPEG, PNG, BMP, etc. If a new serialization algorithm comes along tomorrow, you simply code that algorithm and your class contract remains unchanged.

In the context of IPC, you can keep your class separate and then selectively declare the information that is needed for serialization (by annotations/attributes). Then your serialization algorithm can decide whether to use JSON, Google Protocol Buffers, or XML for serialization. It can even decide whether to use the Jackson parser or your own custom parser. You get many such options easily when you design in a modular fashion!
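As a rough C# illustration of the attribute approach, the sketch below marks up a hypothetical `ImageMetaData` class once and lets the caller choose between JSON and XML. It assumes `System.Text.Json` (bundled with .NET Core 3.0 and later) and the framework's `XmlSerializer`; the property names and the `MetaDataFormats` helper are made up for the example.

```csharp
using System;
using System.IO;
using System.Text.Json;
using System.Text.Json.Serialization;
using System.Xml.Serialization;

// Hypothetical model: attributes declare WHAT is serialized, not HOW.
public class ImageMetaData
{
    public string Title { get; set; } = "";
    public int Width { get; set; }

    // Transient state, excluded from both formats.
    [JsonIgnore]
    [XmlIgnore]
    public string CachedThumbnailPath { get; set; } = "";
}

// Hypothetical helper: the format decision lives outside the model.
public static class MetaDataFormats
{
    public static string ToJson(ImageMetaData meta) =>
        JsonSerializer.Serialize(meta);

    public static ImageMetaData FromJson(string json) =>
        JsonSerializer.Deserialize<ImageMetaData>(json)
            ?? throw new InvalidDataException("JSON did not contain an object.");

    public static string ToXml(ImageMetaData meta)
    {
        var serializer = new XmlSerializer(typeof(ImageMetaData));
        using var writer = new StringWriter();
        serializer.Serialize(writer, meta);
        return writer.ToString();
    }
}
```

The model carries only declarative hints; swapping JSON for XML (or adding another format later) touches `MetaDataFormats`, not `ImageMetaData`.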

Apoorv
  • 1,128