40

Today, I updated ZBateson\MailMimeParser, the PHP e-mail parser library, from 1.x to 2.x.

Soon enough, my PHP error log started filling up with errors.

Tracing where the errors occurred, I found out that it had to do with their ::parse(...) function: https://mail-mime-parser.org/upgrade-2.0

An additional parameter needs to be passed to Message::from() and MailMimeParser::parse() specifying whether the passed resource should be ‘attached’ and closed when the returned IMessage object is destroyed, or kept open and closed manually after the message is parsed and the returned IMessage destroyed.

That is, instead of picking one of those new "modes" by default, the author(s) simply chose to break all existing code.

Frankly, even after re-reading that page multiple times, I have no clue what the new parameter actually does. I have set it to true just to make the errors stop happening, but I'm worried that this is somehow not the right choice.
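For reference, this is roughly what my code looks like after the change (simplified, and the file name is made up):

    use ZBateson\MailMimeParser\MailMimeParser;

    $parser = new MailMimeParser();
    $handle = fopen('message.eml', 'r');

    // New in 2.x: the second argument. As far as I can tell, true means
    // the parser "attaches" to $handle and closes it for me when the
    // returned IMessage is destroyed; false means I keep ownership and
    // have to fclose($handle) myself, after parsing is done and the
    // IMessage has been destroyed.
    $message = $parser->parse($handle, true);

    echo $message->getHeaderValue('Subject');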

My point, and question, is: Why do library developers knowingly break existing code like this? Why not at least have it default to either true or false, whichever is the most reasonable?

Before you tell me that I should have read the upgrade instructions before updating, I sometimes do, but when your life consists of nothing but dealing with constant updates of all kinds of software, you eventually get numb to all the changes and stop spending the time and effort to do so. Is it really reasonable that updating a library (in particular) should break existing code?

And this is not some sort of edge-case of the library, either. It's literally the #1 reason for it to exist in the first place, sure to be used by every single user: parsing an e-mail blob!

user16508174

7 Answers

187

A major version upgrade literally means the authors intend to break things. You shouldn't upgrade to a new major version unless you're prepared to deal with that. Most build systems have a way to specify that you're okay with automatic upgrades to minor versions, but not to major versions.
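For example, with Composer (the package manager this PHP library is distributed through), a caret constraint in composer.json accepts new minor and patch releases but never a new major version:

    {
        "require": {
            "zbateson/mail-mime-parser": "^1.0"
        }
    }

With that constraint, composer update will happily move you around within 1.x, but will never install 2.0 until you change the constraint yourself.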

APIs break for a number of reasons. In this case, I'm guessing it's because whatever default they would have chosen would be surprising to some users, either because it's not a typical convention for the language, or because of history with this library. This way, instead of half the users suddenly getting a hard-to-explain "file is closed" error whose cause is buried somewhere in the release notes, everyone gets a "missing parameter" error whose purpose they can easily look up.

Remember, not everyone uses the library the same way as you. When you have a diverse user base, you have to make compromises in the API to accommodate everyone. A change that seems unnecessary to you might be just what another user has been waiting for.

Karl Bielefeldt
55

My point, and question, is: Why do library developers knowingly break existing code like this? Why not at least have it default to either true or false, whichever is the most reasonable?

Because sometimes it's better to force someone to explicitly make a newly added choice than to effectively guess and make it for them.

If my usual restaurant starts making two versions of the one dish they used to have, I want to choose which dish I get from now on. I don't want them to choose for me and run the risk of serving me a dish that I don't like and didn't knowingly order.

And this is not some sort of edge-case of the library, either. It's literally the #1 reason for it to exist in the first place,

This argues in favor of forcing consumers to explicitly make a choice, exactly because this behavior is so essential to the library's purpose.

The fact that two approaches are now implemented suggests that there is merit to each, and neither is superior to the other in every possible way. If one were, then only that one would have been implemented.

from 1.x to 2.x

Major version updates tend to break stuff. That's why they're major version updates: they are the broadest scope of change a library can announce.

If this happened in an update from 1.0.1 to 1.0.2, I would agree with you. Breaking existing code should only be done in a major version update.

Is it really reasonable that updating a library (in particular) should break existing code?

If all road infrastructure today still had to support horse-drawn carriages, the development of road infrastructure would be significantly hindered.

If you're never allowed to break anything, you stand in the way of innovation, and this is precisely how things (especially software) die a silent death.

Flater
12

Some good answers already; however, let me add my two cents from some real-world experience.

More often than not, though usually acting in good faith, some API designers are fairly ignorant of the cascading effort such decisions can cause. They probably have a wrong idea of how much client code has to be fixed after such a change, and of how much organizational and communication effort a single new mandatory parameter can trigger. Or they do have an idea, but no economic motivation to care for their library's user base.

What may take ten minutes to fix in a library vendor's own organization, because they know their lib well and have access to all the code which relies on it, could require another organization to hire a new developer, for example.

Over the years, I have seen many of those non-backwards-compatible scenarios, and in at least 50% of them I am sure that if the designers had put a little more thought into backwards compatibility, they could have saved us a ton of working hours.

Doc Brown
7

Consider when you'd break things yourself. Off the top of my head, I can think of a few reasons:

  • You're updating the API to:
    • Conform to a language convention
    • Shift to a more applicable programming style (say, having a function support partial application where it makes sense)
  • You're refactoring some implementation, and that allows for a more effective API.
  • You've updated the implementation to the point that the API is no longer helpful. These updates can be as trivial as renaming a function, which is something everyone's done at one point or another.

All of these are aimed at improving the code quality and, where it affects you, the API quality. Sometimes devs make bad decisions, and sometimes the API changes decrease the quality. But most of the time, these are incremental changes aimed at slowly improving code: both the internal code providing the API and the external code relying on it.

So what do I do?

Here are the two things to do:

  • Care about updates to the code you rely on. If you read release notes before updating (and potentially breaking things), you'll be prepared for the consequences of API changes. This can be a pain, but it's ultimately effort invested in code quality.
  • Automate the above. Lots of build systems (though this is language dependent) have the ability to only update dependencies when they don't break, usually utilizing a machine-readable versioning scheme like Semantic Versioning (SemVer). SemVer is really simple: start at 1.0.0 as soon as the API is stable. Increment the third digit (i.e., 1.0.0 -> 1.0.1) for backwards-compatible bug fixes, the second digit for updates that change the API in a backwards-compatible way, and the first digit if you break the API. Then, if you rely on code that just had a major update, you'll know that you need to put some time aside to fix it (see the sketch below).

That last bullet point is always going to be difficult (and slightly controversial), but, complementary to the first bullet point, can be exceedingly useful.
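To make the SemVer rules from that list concrete, here's a small hypothetical helper (the function is mine, not from any library): only a major-version bump is allowed to break you.

    // Hypothetical helper: under SemVer, only a change in the first
    // (major) digit is allowed to contain breaking changes.
    function isBreakingUpgrade(string $from, string $to): bool
    {
        $fromMajor = (int) explode('.', $from)[0];
        $toMajor   = (int) explode('.', $to)[0];
        return $toMajor > $fromMajor;
    }

    var_dump(isBreakingUpgrade('1.0.1', '1.0.2')); // false: patch release, safe to auto-update
    var_dump(isBreakingUpgrade('1.0.2', '1.1.0')); // false: minor release, backwards compatible
    var_dump(isBreakingUpgrade('1.9.0', '2.0.0')); // true: major release, read the upgrade notes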

Essentially: breaking things is usually good: be prepared.

2

Ultimately this comes down to the elusive and controversial goal of backwards compatibility.

As a developer of application A, you would like to spend your time doing three things:

  • adding features
  • fixing bugs
  • if you have done your job well, and there are no features to add or bugs to fix, bask in the glow of having written a long-term stable application that continues to be useful to its users even though you don't have to do a thing to maintain it, such that you can move on to bigger and better things.

And if every last bit of the functionality of application A is code you wrote yourself, you might have a hope of achieving this. But to do that you would probably have to reinvent lots of wheels, and that's no good, either. Another fine software engineering principle is reuse, or standing on the shoulders of others. So there is almost certainly at least one resource R that your application A depends on.

So then the question is, as the maintainer(s) of resource R go about their work, probably striving for the same three goals as you — adding features, fixing bugs, trying not to do any extra work — how much are they supposed to worry about you and application A?

You'd like them to worry about you a lot: you'd like perfect backwards compatibility; you'd like them never to make a change that breaks your application. They probably agree that this might be nice, but they probably also claim that it's not realistic: that sometimes, a bugfix or a code refactoring or an evolving need may force them to make a backwards-incompatible change. Or there may be "legacy" features which are on their way through "deprecated" and on the way to "obsolescent", which the maintainers of resource R are finding it's just way too much work to continue to support (which is of course why they're marking those features as "deprecated" or "obsolescent").

The problem is supposed to be ameliorated by a whole separate, third class of software: dependency managers D which are supposed to ease the job of managing dependencies between applications like A and resources like R. Once you've discovered that application A works with R version 1.x but not R version 2.x, and if you don't have the time or energy to rewrite A just now, you can explicitly record this dependency somewhere, and then your users will get a helpful error message telling them that they're screwed after upgrading the shared library for R on their machine.

At the end of the day it's either a tradeoff or a stalemate. The maintainers of R may try, but they're probably not going to manage (they won't have the time or energy) to achieve as high a level of backwards compatibility as the maintainers of A might like. (For any pair {A, R}, of course.) So, once in a while, the maintainers of A are almost inevitably going to be disappointed, to discover that they can't just bask in the glow of having a long-term stable application, because their application has broken, through no fault of their own.

But you have my absolute sympathy, user16508174. Dependency management can be a real nightmare (which of course is why the term dependency hell was coined), and I regularly seethe with impotent rage against it myself. I wish the maintainers of resources R worked harder to maintain backwards compatibility, or better yet, got things right the first time more often so that they didn't get stuck in these binds in the first place. But my wishing for it doesn't make it so, and I have little choice but to resign myself to the occasional (or even pretty regular) disappointment on this score.

The other thing, as you may have noticed from the tone of the answers and comments on your question, is that there may be a certain amount of religious fervor going on here. Not only are you supposed to accept that your application A is going to be broken from time to time by forces over which you have no control, you are not even supposed to complain about it. This is the way of the world. You are supposed to be glad that the maintainers of resource R have the freedom to fix bugs and add features without worrying overmuch about backwards compatibility. You are supposed to celebrate the extra time you get to spend learning how to use dependency managers D and painstakingly recording every last intricate dependency that application A might have. You should not want to bask in the glow of long-term stability; that's a confession of some kind of laziness. You are not supposed to be troubled that, after users upgrade R on their systems to v2.x in order to satisfy the dependencies of some other application B, your application A will be broken, and that those users are about to be pestering you to spend time upgrading A to use Rv2.x whether you wanted to or not. This is, again, the way of the world.

Finally, lest this answer be discounted as mere whining, let me say what I would like: I would like the maintainers of any resource R to work harder at achieving backwards compatibility. I know this is asking a lot; I know you're working hard already, and that there aren't enough hours in the day. But the reason I want this is simple: you are, presumably, maintaining resource R as a service to those who use and depend on it. You are doing your work in order to save them work. Presumably, there are more of them than there is of you. So a relatively small amount of work by you is, presumably, leveraged into a huge time savings on behalf of all your users. And making their lives easier by not forcing backwards-incompatible changes upon them is one way you can achieve this.

2

One of the few things I remember from my CS undergraduate course - now 50 years ago - is David Wheeler's quote "compatibility means deliberately repeating other people's mistakes". Over time you learn that the original design was wrong. It can be wrong because it creates a security weakness, because it leads to poor performance, because it prevents you adding new features that people need, or because it creates a usability problem and a support hassle. As a library designer, you then have to make the decision whether the costs of breaking compatibility justify the benefits.

One thing I learned when I started doing open source software is that this changes the equation. When the users aren't paying you anything, you don't have the same kind of obligation towards them: you can devote your attention more to future users and less to existing users, and you can avoid the substantial costs of maintaining old interfaces that clutter the code and increase your development and support costs.

Another quote, this one unattributed: "the future is longer than the past". That says that getting it right for future users is more important than reducing the pain of upgrade for existing users.

Michael Kay
1

As someone who has authored dozens of libraries on NuGet and GitHub, I can tell you that backwards compatibility is an important consideration.

Despite some other answers here, I don't believe most library authors deliberately set out to break backwards compatibility. Indeed, it is often possible to add many new features and improve existing ones without breaking any existing code that uses the library. And I think you'll find that most library updates do maintain backwards compatibility.

That said, there are times when a better approach is found. And the decision is made that a breaking change provides more advantages than the disadvantages of breaking that backwards compatibility.

I use C#, and it allows you to mark elements as obsolete. This generates a compiler warning but the code still compiles, giving consumers more time to refactor their code.
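PHP has a comparable convention, for what it's worth: a @deprecated docblock tag plus a runtime E_USER_DEPRECATED notice. A sketch, with all names made up for illustration:

    // New API: the caller makes the choice explicitly.
    function parseMessage(string $raw, bool $strict): array
    {
        // Real parsing would happen here; stubbed for illustration.
        return ['raw' => $raw, 'strict' => $strict];
    }

    /**
     * Old API, kept for one more release so callers have time to migrate.
     *
     * @deprecated use parseMessage() and pass $strict explicitly
     */
    function parseLegacy(string $raw): array
    {
        trigger_error(
            'parseLegacy() is deprecated; call parseMessage() instead.',
            E_USER_DEPRECATED
        );
        return parseMessage($raw, false); // preserves the old default behavior
    }

That gives existing code a whole release cycle of warnings before the old entry point disappears in the next major version.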