
I am a member of the Apache PLC4X (incubating) project, where we are currently implementing multiple industrial PLC protocols. While we initially focused on creating Java versions of these, we are now starting to provide versions in C++ and other languages as well.

Instead of manually maintaining these implementations and keeping them in sync, we would rather define the message structures of each protocol in a language-independent way and have the model, parsers and serializers generated from these definitions.

I have looked at several options: 1) Protobuf 2) Thrift 3) DFDL

The problems with these are the following:

1) Protobuf seems ideal for designing a model and having the model, serializers and parsers generated from it. With Protobuf it is easy to define a model and ensure I can serialize an object in one language and deserialize it in any other. However, I don't have full control over the transport format. For example, if a protocol requires the constant byte value 0xFF at a fixed position, I cannot express that.
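To make the 0xFF problem concrete: Protobuf writes integer fields as base-128 varints preceded by a key (field number and wire type), so a uint32 field set to 0xFF never appears on the wire as a single raw 0xFF byte. The class below is a hand-rolled sketch of that encoding, not the Protobuf library itself; `VarintDemo` and its methods are illustrative names, and it assumes a non-negative value stored in field 1.

```java
import java.io.ByteArrayOutputStream;

// Hand-rolled sketch of Protobuf's base-128 varint encoding (illustration only,
// not the Protobuf library). Assumes non-negative values.
public class VarintDemo {

    // 7 payload bits per byte; the high bit is set on every byte except the last.
    static byte[] encodeVarint(int value) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
        return out.toByteArray();
    }

    // Each field is prefixed by a key: (fieldNumber << 3) | wireType, where wire type 0 = varint.
    static byte[] encodeVarintField(int fieldNumber, int value) {
        byte[] key = encodeVarint(fieldNumber << 3);
        byte[] payload = encodeVarint(value);
        byte[] result = new byte[key.length + payload.length];
        System.arraycopy(key, 0, result, 0, key.length);
        System.arraycopy(payload, 0, result, key.length, payload.length);
        return result;
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeVarintField(1, 0xFF)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "08 FF 01"
    }
}
```

Switching to a `bytes` or `fixed32` field would not help either: every field is still preceded by its key (and, for `bytes`, a length), so a byte layout dictated by a PLC protocol cannot be reproduced exactly.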

2) Thrift seems to be more focused on services and the models used by these services. The same limitation seems to apply as for Protobuf: I have no full control over the transport format.

3) DFDL seems to be exactly what I'm looking for, as I want a language to describe my data format. Unfortunately, I could only find projects like Daffodil, which seem to use DFDL definitions to parse any data format into an XML-like DOM structure. For performance and memory reasons we would rather not do that. Other than that, I couldn't find any usable tooling.

I also had a look at Avro and Kaitai Struct, but Avro seems to have the same issues for my use case as Protobuf, and the guys from Kaitai told me serialization was still experimental.

My ideal workflow (using Maven) would be:

1) For every protocol, I define DFDL documents describing the different types of messages of that protocol.

2) I define multiple protocol implementation modules (one for each language)

3) I use a maven plugin in each of these to generate the code for that particular language from those central DFDL definitions
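To illustrate the kind of code step 3 would ideally produce, here is a hand-written sketch of a parser/serializer pair for a made-up message: a constant 0xFF marker byte, an unsigned 16-bit big-endian length, then the payload. The class name `ExampleMessage` and this layout are hypothetical and not taken from any real PLC protocol or from PLC4X; the point is that the generated code, not a generic encoding scheme, decides every byte on the wire.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical message layout (illustration only, not a real PLC protocol):
//   byte 0     : constant marker 0xFF
//   bytes 1..2 : payload length, unsigned 16-bit big-endian
//   bytes 3..  : payload
public final class ExampleMessage {
    static final byte MARKER = (byte) 0xFF;

    private final byte[] payload;

    public ExampleMessage(byte[] payload) {
        if (payload.length > 0xFFFF) throw new IllegalArgumentException("payload too long");
        this.payload = payload.clone();
    }

    public byte[] payload() { return payload.clone(); }

    // Writes the exact byte layout described above.
    public byte[] serialize() {
        ByteBuffer buf = ByteBuffer.allocate(3 + payload.length).order(ByteOrder.BIG_ENDIAN);
        buf.put(MARKER);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    // Reads the layout back; ByteBuffer throws BufferUnderflowException on truncated input.
    public static ExampleMessage parse(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(ByteOrder.BIG_ENDIAN);
        if (buf.get() != MARKER) throw new IllegalArgumentException("bad marker");
        int length = Short.toUnsignedInt(buf.getShort());
        byte[] payload = new byte[length];
        buf.get(payload);
        return new ExampleMessage(payload);
    }

    public static void main(String[] args) {
        ExampleMessage msg = new ExampleMessage(new byte[] {0x01, 0x02});
        byte[] wire = msg.serialize();
        System.out.println(Arrays.toString(wire)); // [-1, 0, 2, 1, 2]
        System.out.println(Arrays.equals(parse(wire).payload(), msg.payload())); // true
    }
}
```

A generator targeting C++ would emit the equivalent logic from the same DFDL definition, which is what keeps the language implementations in sync.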



According to IBM

Data Format Description Language (DFDL) 1.0 is a modeling language from the Open Grid Forum that is used to define the structure of general text and binary formatted data in a way that is independent of the data format.

The particular problem I see is that, at least with Thrift (and Protobuf), the data format is partially embedded in the code generated from the IDL. For efficiency, Thrift serializes all fields using numeric field IDs, not their names. The field ID can (and should) of course be specified in the IDL, but the number alone won't tell you much about the field's contents and intention.
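For illustration, this is roughly what Thrift's binary protocol (`TBinaryProtocol`) writes for a struct with a single `i32` field: one type byte, a big-endian 16-bit field ID, the value, and a stop byte. The sketch below hand-writes those bytes with `DataOutputStream` instead of using the Thrift library, and `ThriftBinaryFieldDemo` is a hypothetical name; note that the field name appears nowhere in the output.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hand-rolled sketch of Thrift's binary protocol struct layout (not the Thrift library):
// per field: 1 byte type, 2 bytes field ID (big-endian), then the value; a 0 byte ends the struct.
public class ThriftBinaryFieldDemo {
    static final byte T_STOP = 0;
    static final byte T_I32 = 8;

    static byte[] encodeI32Struct(short fieldId, int value) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) { // big-endian writes
            out.writeByte(T_I32);
            out.writeShort(fieldId);
            out.writeInt(value);
            out.writeByte(T_STOP);
        } catch (IOException e) {
            // Cannot happen with an in-memory stream, but DataOutputStream declares it.
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    public static void main(String[] args) {
        StringBuilder hex = new StringBuilder();
        for (byte b : encodeI32Struct((short) 1, 42)) {
            hex.append(String.format("%02X ", b & 0xFF));
        }
        System.out.println(hex.toString().trim()); // prints "08 00 01 00 00 00 2A 00"
    }
}
```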

If that, however, can be modeled via DFDL (can it?), then why not write a generator that produces Thrift IDL from your DFDL documents?

The same strategy could be applied to .proto files, Avro, XML Schema... you name it. That way you have one source and everything else is generated.
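A minimal sketch of such a generator, assuming the DFDL documents have already been parsed into a simple in-memory model. The `FieldDef`/`MessageDef` records and the Thrift type names passed in are hypothetical stand-ins; a real generator would have to walk the DFDL schema and map its types to Thrift types. It assigns field IDs from declaration order, which illustrates the point above: the IDs, not the names, end up on the wire, so they must stay stable. Records require Java 16 or newer.

```java
import java.util.List;

// Hypothetical, minimal stand-in for element descriptions parsed from a DFDL schema.
public class ThriftIdlGenerator {

    record FieldDef(String name, String thriftType) {}

    record MessageDef(String name, List<FieldDef> fields) {}

    // Emits a Thrift struct. Field IDs come from declaration order, so reordering
    // fields in the source description would change the generated IDs (and the wire format).
    static String toThriftIdl(MessageDef message) {
        StringBuilder sb = new StringBuilder();
        sb.append("struct ").append(message.name()).append(" {\n");
        int id = 1;
        for (FieldDef field : message.fields()) {
            sb.append("  ").append(id++).append(": required ")
              .append(field.thriftType()).append(' ').append(field.name()).append(",\n");
        }
        sb.append("}\n");
        return sb.toString();
    }

    public static void main(String[] args) {
        MessageDef msg = new MessageDef("ReadRequest", List.of(
                new FieldDef("transactionId", "i16"),
                new FieldDef("address", "string")));
        System.out.print(toThriftIdl(msg));
    }
}
```

For the `ReadRequest` example in `main`, this prints a `struct ReadRequest` with the fields `1: required i16 transactionId` and `2: required string address`.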

JensG

I can't investigate right now whether this fully meets your criteria, but take a look at Cap'n Proto, which at first glance is more up your alley than Protobuf. Although it gives you much more control over the encoded data, it still doesn't seem to be full control.

jaskij