What is the common way to handle visibility in libraries?

Question

This question about when to use private and when to use protected in classes got me thinking. (I'll extend this question to final classes and methods as well, since it is related. I'm programming in Java, but I think this is relevant to every OOP language.)

The accepted answer says:

A good rule of thumb is: make everything as private as possible.

And another one:

Make all classes final unless you need to subclass them right away.

Make all methods final unless you need to subclass and override them right away.

Make all method parameters final unless you need to change them within the body of the method, which is kinda awkward most of the times anyways.

This is pretty straightforward and clear, but what if I'm mostly writing libraries (Open Source on GitHub) instead of applications?

I could name a lot of libraries and situations, where

A library got extended in a way the developers would never have thought of
This had to be done with "class loader magic" and other hacks because of visibility constraints
Libraries got used in a way they were not built for and the needed functionality was "hacked" in
Libraries couldn't be used because of a small issue (bug, missing functionality, "wrong" behavior) that could not be changed due to reduced visibility
An issue that could not be fixed led to huge, ugly and buggy workarounds where overriding a simple function (that was private or final) could have helped

And I actually started naming these until the question got too long and I decided to remove them.

I like the idea of not having more code than needed, more visibility than needed, more abstraction than needed. And this might work when writing an application for the end user, where the code is only used by those who write it. But how does this hold up if the code is meant to be used by other developers, where it is improbable that the original developer thought of every possible use case in advance and changes/refactors are difficult/impossible to make?

Since big open source libraries are not a new thing, what is the most common way of handling visibility in such projects with object-oriented languages?

score 15 · Accepted Answer · answered Aug 23 '17 at 12:59

The unfortunate truth is that many libraries get written, not designed. This is sad, because a bit of prior thought can prevent a lot of problems down the road.

If we set out to design a library, there will be some set of anticipated use cases. The library might not satisfy all use cases directly, but may serve as part of a solution. So the library needs to be flexible enough to adapt.

The constraint is that it's usually not a good idea to take the source code of the library and modify it to handle the new use case. For proprietary libraries the source may not be available, and for open source libraries it may be undesirable to maintain a forked version. It may not be feasible to merge highly specific adaptions into the upstream project.

This is where the open–closed principle comes in: the library should be open to extension without modifying the source code. That does not come naturally; it must be an intentional design goal. There is a wealth of techniques that can help here, and the classic OOP design patterns are some of them. In general, we specify hooks where user code can safely plug in to the library and add functionality.
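As a minimal sketch of such a hook, assume a hypothetical library class `Greeter` (the class, its constructor, and the `nameFilter` field are invented for illustration and come from no real library). The class is final, so it cannot be subclassed, yet it stays open to extension through one explicit, documented plug-in point:

```java
import java.util.Objects;
import java.util.function.UnaryOperator;

// Hypothetical library class, invented for this sketch.
// It is final (closed for modification) but accepts a hook (open for extension).
final class Greeter {
    // The hook: user code supplies a transformation applied to each name.
    private final UnaryOperator<String> nameFilter;

    // Default configuration: the hook does nothing.
    Greeter() {
        this(UnaryOperator.identity());
    }

    // Extension point: callers plug in their own behavior here.
    Greeter(UnaryOperator<String> nameFilter) {
        this.nameFilter = Objects.requireNonNull(nameFilter, "nameFilter");
    }

    String greet(String name) {
        // The library decides when the hook runs; the user decides what it does.
        return "Hello, " + nameFilter.apply(name) + "!";
    }
}
```

With this shape, `new Greeter(String::toUpperCase).greet("ada")` yields `Hello, ADA!` without the user touching or subclassing the library code, and the library remains free to refactor everything except the hook's contract.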

Just making every method public or allowing every class to be subclassed is not sufficient to achieve extensibility. First of all, it is really difficult to extend the library if it's not clear where user code can hook into the library. For example, overriding most methods is not safe because the base class method was written with implicit assumptions. You really need to design for extensibility.
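A well-known illustration of such an implicit assumption, sketched here with a hypothetical `CountingSet` class (the name and the `getAddCount` method are made up for this example): the subclass tries to count insertions, but `HashSet` inherits an `addAll` that calls `add` once per element, so every bulk insertion is counted twice.

```java
import java.util.Collection;
import java.util.HashSet;

// Hypothetical subclass, invented for this sketch: it tries to count insertions.
class CountingSet<E> extends HashSet<E> {
    private int addCount = 0;

    @Override
    public boolean add(E element) {
        addCount++;
        return super.add(element);
    }

    @Override
    public boolean addAll(Collection<? extends E> elements) {
        addCount += elements.size();
        // Hidden assumption: the inherited addAll calls add() for each element,
        // so every element is counted a second time in add() above.
        return super.addAll(elements);
    }

    int getAddCount() {
        return addCount;
    }
}
```

Calling `addAll` with three elements leaves `getAddCount()` at 6 rather than 3. Nothing in the signatures of `add` or `addAll` reveals that dependency, which is why an overridable method is only safe when the base class documents how it uses its own methods.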

More importantly, once something is part of the public API you can't take it back. You can't refactor it without breaking downstream code. Premature openness limits the library to a suboptimal design. In contrast, making internal stuff private but adding hooks if later there is need for them is a safer approach. While that is a sane way to tackle the long-term evolution of a library, this is unsatisfactory for users who need to use the library right now.

So what happens instead? If there is significant pain with the current state of the library, the developers can take all the knowledge about actual use cases that accumulated over time, and write a Version 2 of the library. It will be great! It will fix all those by-design bugs! It will also take longer than expected, in many cases fizzling out. And if the new version is very dissimilar to the old version, it might be hard to encourage users to migrate. You're then left maintaining two incompatible versions.

score 8 · Answer 2 · answered Aug 23 '17 at 12:51

Every public and extensible class/method is a part of your API that must be supported. Limiting that set to a reasonable subset of the library allows the most stability and limits the number of things that can go wrong. It's a management decision (and even OSS projects are managed to a degree) based on what you can reasonably support.

The difference between OSS and closed source is that most people are trying to create and grow a community around the code so that it's more than one person maintaining the library. That said, there are a number of management tools available:

Mailing lists discuss user needs and how to implement things
Issue tracking systems (JIRA or GitHub issues, etc.) track bugs and feature requests
Version control manages the source code.

In mature projects, what you'll see is something along these lines:

Someone wants to do something with the library it wasn't originally designed to do
They add a ticket to the issue tracking
The team may discuss the issue in the mailing list or in the comments, and the requester is always invited to join the discussion
The API change is accepted and prioritized or rejected for some reason

At that point, if the change was accepted but the user wants to accelerate it getting fixed, they can do the work and submit either a pull request or a patch (depending on the version control tool).

No API is static. However, its growth has to be shaped in some way. By keeping everything closed down until there is a demonstrated need to open things up, you avoid getting the reputation of a buggy or unstable library.

Dmytro · Answer 3 · 2017-08-23T19:52:29.700

I'll reword my response since it seems it struck a nerve with a few people.

Class property/method visibility has nothing to do with security or with openness of source.

Visibility exists because objects are vulnerable to four specific problems:

Concurrency

If you build your module unencapsulated, your users will get used to altering module state directly. This works fine in a single-threaded environment, but once you even think about adding threads, you will be forced to make the state private and use locks/monitors along with getters and setters that make other threads wait for the resources rather than racing on them. Your users' programs then stop working, because the now-private variables can no longer be accessed directly. This can mean a lot of rewrites.

The truth is that it's much easier to code with a single-threaded runtime in mind. If the state was private from the beginning, you can later simply add the synchronized keyword, or a few locks, and your users' code won't break.
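A minimal sketch of that point, using a hypothetical `HitCounter` class (all names here are invented for illustration). Because `hits` was private from the start, making the class thread-safe only required adding `synchronized` to the two methods; their signatures, and therefore all calling code, stayed the same.

```java
// Hypothetical library class, invented for this sketch.
final class HitCounter {
    private long hits; // private: callers never touched this field directly

    // `synchronized` was added later for thread safety; the signature is unchanged.
    public synchronized void record() {
        hits++;
    }

    public synchronized long total() {
        return hits;
    }
}

// Hypothetical helper that exercises the counter from several threads.
final class HitCounterDemo {
    // Starts `threads` workers that each call record() `perThread` times.
    static long hammer(int threads, int perThread) {
        HitCounter counter = new HitCounter();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < perThread; j++) {
                    counter.record();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // wait until every worker has finished
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.total();
    }
}
```

Had `hits` been a public field that clients incremented themselves, no change inside the class could have made those increments safe.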

Help prevent users shooting themselves in the foot/streamline use of the interface. In essence, it helps you control the invariants of the object.

Every object has a bunch of things it requires to be true in order to be in a consistent state. Unfortunately, these things live in client-visible space because it's expensive to move each object into its own process and talk to it through messages. This means that it is very easy for an object to crash the whole program if the user has full visibility.

This is unavoidable, but you can prevent an object from accidentally being put into an inconsistent state by closing its services behind an interface: the user may interact with the object's state only through a carefully crafted interface, which makes the program much more robust. This doesn't mean the user can't intentionally corrupt the invariants, but if they do, it's their client that crashes, and all they have to do is restart the program (the data you want to protect shouldn't be stored on the client side).

Another nice example where you can improve the usability of your modules is to make the constructor private. If the constructor throws an exception and nobody catches it, it will kill the program. One lazy approach to solving this is to have the constructor declare a checked exception, so that the compiler rejects any construction that is not inside a try/catch block. By making the constructor private and adding a public static create method, you can have the create method return null if construction fails, or take a callback function to handle the error, making the program more user friendly.
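The private-constructor idea can be sketched as follows. The `Port` class and its valid range check are hypothetical, and the sketch swaps the "return null" variant for `java.util.Optional`, which expresses the same "construction may fail" outcome in a way the compiler makes harder to ignore.

```java
import java.util.Optional;

// Hypothetical value class, invented for this sketch.
final class Port {
    private final int number;

    // Private: the only way to obtain a Port is through create().
    private Port(int number) {
        if (number < 1 || number > 65535) {
            throw new IllegalArgumentException("port out of range: " + number);
        }
        this.number = number;
    }

    // Public factory: a failed construction becomes an empty Optional
    // instead of an exception that the caller has to remember to catch.
    static Optional<Port> create(int number) {
        try {
            return Optional.of(new Port(number));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    int number() {
        return number;
    }
}
```

`Port.create(8080)` yields a present value, while `Port.create(0)` yields an empty one; in both cases the invariant "every existing Port is in range" holds, because no other code path can construct the object.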

Scope pollution

Many classes have a lot of state and methods, and it is easy to get overwhelmed trying to scroll through them. Many of these members are just visual noise, such as helper functions and internal state. Making variables and methods private helps reduce scope pollution and makes it easier for the user to find the services they are looking for.

In essence, visibility control lets you keep helper functions inside the class rather than outside it, without distracting the user with a bunch of services they should never use. You can therefore break methods down into a bunch of helper methods (they will still pollute your scope, but not the user's).

Being tied to dependencies

A well-crafted interface can hide the internal databases/windowing/imaging libraries that it depends on to do its work, and if you want to change to another database, windowing system, or imaging library, you can keep the interface the same and the users won't notice.

On the other hand, if you don't do this, you can easily make it impossible to change your dependencies, because they are exposed and client code relies on them. With a big enough system, the cost of migrating can become unaffordable, whereas encapsulation protects well-behaved clients from future decisions to swap out dependencies.
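A rough sketch of hiding a dependency, assuming a hypothetical `UserDirectory` class whose storage backend is an implementation detail (every name below is invented; the in-memory map merely stands in for whatever database a real library would wrap):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical library facade, invented for this sketch.
final class UserDirectory {
    // Private seam: callers never learn which storage backs the directory.
    private interface Storage {
        void put(String key, String value);
        String get(String key);
    }

    // Stand-in backend; a database-backed implementation could replace it
    // later without any change to the public methods below.
    private static final class InMemoryStorage implements Storage {
        private final Map<String, String> data = new HashMap<>();

        @Override
        public void put(String key, String value) {
            data.put(key, value);
        }

        @Override
        public String get(String key) {
            return data.get(key);
        }
    }

    private final Storage storage = new InMemoryStorage();

    // Public API: expressed only in terms of the library's own concepts.
    public void register(String id, String displayName) {
        storage.put(id, displayName);
    }

    public Optional<String> lookup(String id) {
        return Optional.ofNullable(storage.get(id));
    }
}
```

Client code only ever sees `register` and `lookup`, so no storage type can leak into it, and swapping the backend is a change local to this one class.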
