7

I'm reaching out with the following situation:

  1. I'm the author of an application that saves a file format to disk.
  2. There are real users using the application in their workflows.
  3. Frequently, I want to change the file format. But I don't want to break users workflows!

The above is a bit vague because I don't want to appear like I'm trying to self-promote - I swear I'm not, but the product's here if you wanna check it out.

I'm looking to:

  1. Minimize the amount of old code I have to maintain.
  2. Not screw over my users if I can avoid it! I don't want them to have to recreate their files!

Is there any architectural decisions I can make - or processes I can adopt - to help me version this file format? I'm a total noob here, so any information is greatly appreciated.

Doc Brown
  • 218,378
Nate Rush
  • 181

4 Answers4

6

A good start would be to resist the urge to keep changing the file format.

If you really must, think about backward compatibility. Adding new features to an existing file format doesn't stop you reading old files that don't use that feature. But completely redesigning the format means that your reader has to support every version separately.

It may help to include a file format version number at the start of the file, so the application can check it.

Simon B
  • 9,772
4

First things first. Version your file format.
Write a version number or other identifier as the first thing in the file, then work out which class you need to be able to read that format.

One of the biggest mistakes that "Our Friends in Redmond" made with the very first version of the .Net Framework was failing to include any method by which an application could ask which version of the Framework it was running against! (This was, not surprisingly, added in .Net 1.1!)

Always allow your code to read older file versions.

You'll probably have to allow every version you'll ever produce, so limit the number of file versions you come up with, if only to save your Sanity. You may also want to keep track of the [continued] use of older versions.

Only allow your code to write the latest version.

That way, users' files can be automatically upgraded as they work with them.

(And it's way better than MS Office 2003 SP3 that, for a week or so, effectively "deleted" every Word 2.x and earlier document on the planet, by refusing to even open them!)

Phill W.
  • 13,093
3

Well first, this is a good use for regression tests. Whenever you make a change to the file format, have saved and documented examples of files in the old format. Then make unit tests that try and load in the old format and assert that they load in correctly. This can be as simple as

def assert_can_read_v1():
    with open("legacy/v1/example_1.wtvr") as f:
        data_structure = read_logic(f)
    assert data_structure.title == "abc"
    assert data_structure.stuff == "..."

But that's from a process perspective, how to make sure you don't mess up. How do you actually write the code?

Well, its gonna depend on a lot of factors including on how different the files are and what kind of format things are stored in.

If things are stored in a text based, or otherwise generic, format like JSON or XML or in a data format that is built to allow extra fields in data like ProtocolBuffers or CapnProto then adding fields should be fairly simple. You just take all the places where you read the field and add some default value if the field is not there.

If you are using a format that doesn't allow for that kind of extension or where removing fields is backwards incompatible or where you are making larger incompatible changes, you can write some predicate that tells you what "version" of file you have and dispatch to the right functionality. You can make this easier on yourself by adding explicit "version" fields to things, but that is on you.

def read_file(file):
    contents = parse_format(file)
    if contents.version == 1:
        return parse_contents_v1(contents)
    else:
        return parse_contents_v2(contents)

As a corollary, this is easier to do if you have some model in your code that is "separate" from what your config file reads into. That way you have some place to put this massaging logic.

So you could start with something like this.

import dataclasses
import json

@dataclasses.dataclass(frozen=True) class Project: name: str

@staticmethod
def read_from_file(file_path):
    with open(file_path, "r") as f:
        contents = json.load(f)
    return Project(name=contents["name"])

And do some minor upgrades to get this

import dataclasses
import json

from typing import Optional

@dataclasses.dataclass(frozen=True) class Project: name: str description: Optional[str]

@staticmethod
def read_from_file(file_path):
    with open(file_path, "r") as f:
        contents = json.load(f)
    return Project(
        name=contents["name"],
        description=contents.get("description", None)
    )

And then maybe need to do a major overhaul and end up with

import dataclasses
import json

from typing import Optional

@dataclasses.dataclass(frozen=True) class Project: name: str description: Optional[str]

@staticmethod
def read_from_file(file_path):
    with open(file_path, "r") as f:
        contents = json.load(f)
    version = contents.get("version", None)
    if version is None:
        return Project(
            name=contents["name"],
            description=contents.get("description", None)
        )
    else:
        name, description = contents["stuff"].split(":::", 1)
        return Project(name=name, description=description)

Does that sorta make sense? There are no hard set rules, but having the very first thing - regression tests - can help a lot.

2

Apart from the excellent advice "write a version number into your file":

You will have older and newer versions of your application, trying to open older and newer versions of documents. For best compatibility:

  1. Design your file format so that items can be identified without understanding them. So an old version of your application confronted with a new employee database can see "there's a thing named 'company car' which I don't understand, I'll ignore that" instead of just falling over.

  2. Into a document, write the version that created the document, and the oldest program version that can read the document by ignoring things that it doesn't understand. So version 2.8 of your app may say "this document was created by 3.1, but it says it can be read by 2.7 by ignoring unknown things, so I can read it and display it but don't allow modification".

  3. Whenever you update the document format, say from version 1.7 to 1.9, you write a function that can change 1.7 documents to 1.9. If your app sees an older document, it can update it step by step until it is able to read and/or write it.

gnasher729
  • 49,096