Is information hiding more than a convention?

Question

In Java, C# and many other strongly-typed, statically checked languages, we are used to writing code like this:

public void m1() { ... }
protected void m2() { ... }
private void m3() { ... }
void m4() { ... }

Some dynamically checked languages don't provide keywords to express the level of "privateness" of a given class member and rely on coding conventions instead. Python for example prefixes private members with an underscore:

def _m(self): pass

It can be argued that providing such keywords in dynamically checked languages would be of little use, since they would only be checked at runtime.

However, I can't find a good reason to provide these keywords in statically checked languages, either. I find the requirement to fill my code with rather verbose keywords like protected both annoying and distracting. So far, I have not been in a situation where a compiler error caused by these keywords would have saved me from a bug. Quite the contrary, I have been in situations where a mistakenly placed protected prevented me from using a library.

With this in mind, my question is:

Is information hiding more than a convention between programmers used to define what is part of the official interface of a class?

Can it be used to secure a class' secret state from being attacked? Can reflection override this mechanism? What would make it worthwhile for the compiler to enforce information hiding?

Joris Timmermans · Answer 1 · 2012-05-14T07:36:24.253

The "private" access specifier is not about the compiler error it generates the first time you see it. In reality it's about preventing you from accessing something that is still subject to change when the implementation of the class holding the private member changes.

In other words, not allowing you to use it when it's still working prevents you from accidentally still using it when it's no longer working.

As Delnan remarked below, the prefix convention discourages accidental use of members that are subject to change, as long as the convention is followed and understood correctly. It does nothing to stop a malicious (or ignorant) user from accessing that member, with all the possible consequences. In languages with built-in support for access specifiers, this cannot happen out of ignorance (it is a compiler error), and it stands out like a sore thumb when done maliciously (strange constructions are needed to get to the private member).

The "protected" access specifier is a different story - don't think of this as simply "not quite public" or "a lot like private". "Protected" means that you will probably want to use that functionality when you derive from the class containing the protected member. The protected members are part of the "extension interface" that you will use to add functionality on top of existing classes without changing those existing classes themselves.

So, short recap:

public: Use safely on instances of the class; this is the purpose of the class and will not change.
protected: To be used when extending (deriving from) the class - may change if the implementation has to change drastically.
private: Do not touch! May change at will to provide a better implementation of the expected interfaces.
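To make the recap concrete, here is a minimal Java sketch of the three levels side by side. The class and method names (Report, SalesReport, render, title, formatBody) are hypothetical and only for illustration. Note that in Java, protected also grants access to other classes in the same package, so it is slightly wider than "subclasses only".

```java
// Hypothetical example; all names are illustrative, not from any real library.
abstract class Report {
    // private: implementation detail, free to change or disappear at will.
    private String cachedText;

    // public: the stable interface that callers rely on.
    public final String render() {
        if (cachedText == null) {
            cachedText = "== " + title() + " ==\n" + formatBody();
        }
        return cachedText;
    }

    // protected: the "extension interface" that subclasses fill in.
    protected abstract String title();

    protected abstract String formatBody();
}

class SalesReport extends Report {
    @Override
    protected String title() {
        return "Sales";
    }

    @Override
    protected String formatBody() {
        return "total: 42";
    }
}
```

A caller only ever sees render(). The caching field could be removed or replaced tomorrow without breaking callers or subclasses.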

score 11 · Answer 2 · answered Aug 05 '11 at 13:37

If you are writing code that will be consumed by someone else, then information hiding can provide an interface that is much easier to understand. "Someone else" could be another developer on your team, developers consuming an API you wrote commercially, or even your future self who "just can't remember how the dang thing works". It is way easier to work with a class that has only 4 methods available than one that has 40.

score 4 · Accepted Answer · edited May 23 '17 at 12:40

I'm studying for the Java certification, and a whole bunch of it concerns exactly that: how to manage access modifiers. They make sense and they should be used properly.

I've worked with Python and, during my learning journey, I've heard that in Python the convention is there because the people who work with it should know what it means and how to apply it. That said, in def _m(self): pass the underscore would alert me not to mess around with that member. But will everybody follow that convention? I work with JavaScript and I must say, they don't. Sometimes I need to track down a problem, and the cause turns out to be that someone was doing something they weren't supposed to do...

Read this discussion regarding the Python leading underscore.

Is information hiding more than a convention between programmers used to define what is part of the official interface of a class?

From what I said, Yes.

Can it be used to secure a class' secret state from being attacked? Can reflection override this mechanism? Are there any other advantages provided by a more formal mechanism enforced by the compiler?

Yes, it can be used to secure a class's secret state. It should be used not only for that, but also to prevent users from tampering with an object's state to make it behave the way they think it should have worked. As a developer, you should plan your class so that its design makes sense and its behavior can't be tampered with. The compiler, as a good friend, will help you keep the code the way you designed it by enforcing the access policies.

Edit: Yes, reflection can override it; see the comments.

The last question is an interesting one and I'm willing to read the answers concerning it.

score 4 · Answer 4 · answered Aug 05 '11 at 14:37

I would suggest that for background, you start by reading about class invariants.

An invariant is, to make a long story short, an assumption about the state of a class which is supposed to remain true throughout the lifetime of a class.

Let's use a really simple C# example:

public class EmailAlert
{
    private readonly List<string> addresses = new List<string>();

    public void AddRecipient(string address)
    {
        if (!string.IsNullOrEmpty(address))
            addresses.Add(address);
    }

    public void Send(string message)
    {
        foreach (string address in addresses)
            SendTo(address, message);
    }

    // Details of SendTo not shown
}

What's going on here?

The member addresses is initialized on class construction.
It's private, so nothing from the outside can touch it.
We've also made it readonly, so nothing from the inside can reassign it after construction (this isn't always correct/necessary, but it's useful here).
Therefore, the Send method can make an assumption that addresses is never going to be null. It doesn't have to perform that check because there is no way that the field can be reassigned.

If other classes were allowed to write to the addresses field (i.e. if it were public), then this assumption would no longer be valid. Every single other method in the class that depends on that field would have to start doing explicit null checks, or risk crashing the program.
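As a rough illustration of that failure mode, here is a hypothetical Java rendering of the "what if it were public" variant. The names LeakyEmailAlert and InvariantDemo are made up for this sketch, and the delivery step is replaced by a counter.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical broken variant: the field is public and not final.
class LeakyEmailAlert {
    public List<String> addresses = new ArrayList<>();

    public int send(String message) {
        int sent = 0;
        // Iterating throws NullPointerException if someone set addresses to null.
        for (String address : addresses) {
            sent++; // stand-in for the real delivery call
        }
        return sent;
    }
}

class InvariantDemo {
    // Returns true if outside code managed to break send().
    static boolean outsideCodeBreaksSend() {
        LeakyEmailAlert alert = new LeakyEmailAlert();
        alert.addresses = null; // nothing stops outside code from doing this
        try {
            alert.send("hello");
            return false;
        } catch (NullPointerException e) {
            return true;
        }
    }
}
```

With the field declared private final, the assignment in outsideCodeBreaksSend would not compile, and send could keep its assumption without a null check.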

So yes, it's a lot more than a "convention"; all of the access modifiers on class members collectively form a set of assumptions about when and how that state can be changed. Those assumptions are subsequently incorporated into dependent and interdependent members and classes so that programmers don't have to reason about the entire state of the program at the same time. The ability to make assumptions is a critical element of managing complexity in software.

To these other questions:

Can it be used to secure a class' secret state from being attacked?

Yes and no. Security code, like most code, is going to rely on certain invariants. Access modifiers are certainly helpful as signposts to trusted callers that they aren't supposed to mess with it. Malicious code isn't going to care, but malicious code doesn't have to go through the compiler either.

Can reflection override this mechanism?

Of course it can. But reflection requires the calling code to have that privilege/trust level. If you're running malicious code with full trust and/or administrative privileges, then you've already lost that battle.
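For reference, a minimal sketch of what that looks like in Java, assuming the code runs with default permissions and the target class is in the same module. Vault and ReflectionDemo are hypothetical names. Field.setAccessible is the standard java.lang.reflect call, and it can be refused by a security policy or a module boundary.

```java
import java.lang.reflect.Field;

// Hypothetical class with a private field.
class Vault {
    private String secret = "initial";

    public String reveal() {
        return secret;
    }
}

class ReflectionDemo {
    // Overwrites Vault's private field from outside the class and returns the new value.
    static String overwriteSecret() {
        try {
            Vault vault = new Vault();
            Field field = Vault.class.getDeclaredField("secret");
            // This is the "strange construction": it may throw if the
            // environment does not grant reflective access.
            field.setAccessible(true);
            field.set(vault, "tampered");
            return vault.reveal();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("reflective access was refused", e);
        }
    }
}
```

The point stands either way: nobody writes this by accident, and an environment that does not trust the caller can refuse it.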

What would make it worthwhile for the compiler to enforce information hiding?

The compiler already does enforce it. So does the runtime in .NET, Java, and other such environments - the opcode used to call a method will not succeed if the method is private and the caller is not allowed to access it. The only ways around that restriction require trusted/elevated code, and elevated code could always just write directly to the program's memory. It's enforced as much as it can be enforced without requiring a custom operating system.

Fuhrmanator · Answer 5 · 2012-08-27T17:24:55.630

Information hiding evolved from a top-down design philosophy. Python has been called a bottom-up language.

Information hiding is enforced well at the class level in Java, C++ and C#, so it's not really a convention at this level. It's very easy to make a class a "black box", with public interfaces and hidden (private) details.

As you pointed out, in Python it's up to programmers to follow the convention of not using what is intended to be hidden, since everything is visible.

Even with Java, C++ or C#, at some point information hiding becomes a convention. There are no access controls at the highest levels of abstraction involved in more complex software architectures. For example, in Java you can find the use of ".internal." package names. This is purely a naming convention because it's not easy for this kind of information hiding to be enforced via package accessibility alone.

One language that strives to define access formally is Eiffel. This article points out some other information-hiding weaknesses of languages such as Java.

Background: Information hiding was proposed in 1971 by David Parnas. He points out in that article that use of information about other modules can "disastrously increase the connectivity of the system structure." According to this idea, lack of information hiding can lead to tightly coupled systems that are hard to maintain. He continues with:

We wish to have the structure of the system determined by the designers explicitly before programming begins, rather than inadvertently by a programmer's use of information.

score 3 · Answer 6 · answered Aug 05 '11 at 14:15

Information hiding is much more than just a convention; trying to go around it can actually break the functionality of the class in many cases. For instance, it's pretty common practice to store a value in a private variable, expose it using a protected or public property of the same type, and in the getter, check for null and do any initialization that's necessary (i.e., lazy loading). Or store something in a private variable, expose it using a property, and in the setter, check to see if the value changed and fire PropertyChanging/PropertyChanged events. But without seeing the internal implementation, you'd never know all that's going on behind the scenes.
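A small Java sketch of both patterns, under the assumption that a plain listener list stands in for .NET's PropertyChanged event. Customer, getOrders, onNameChanged and setName are hypothetical names chosen for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;
import java.util.function.Consumer;

// Hypothetical class; the private fields are only safe to reach through the accessors.
class Customer {
    private List<String> orders; // null until first requested (lazy loading)
    private String name = "";
    private final List<Consumer<String>> nameListeners = new ArrayList<>();

    public List<String> getOrders() {
        if (orders == null) {
            orders = new ArrayList<>(); // stand-in for an expensive load
        }
        return orders;
    }

    public void onNameChanged(Consumer<String> listener) {
        nameListeners.add(listener);
    }

    public String getName() {
        return name;
    }

    public void setName(String newName) {
        // Notify listeners only when the value actually changes.
        if (!Objects.equals(name, newName)) {
            name = newName;
            for (Consumer<String> listener : nameListeners) {
                listener.accept(newName);
            }
        }
    }
}
```

Code that read the orders field directly could see null, and code that wrote name directly would silently skip the notifications. Keeping both fields private rules those cases out.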

score 2 · Answer 7 · answered Aug 26 '12 at 23:16

I have not been in a situation where a compiler error caused by these keywords would have saved me from a bug.

It's not so much to save application authors from bugs as to let library authors decide which parts of their implementation they're committing to maintaining.

If I have a library

class C {
  public void foo() { ... }

  private void fooHelper() { /* lots of complex code */ }
}

I may want to be able to replace foo's implementation and possibly change fooHelper in radical ways. If a bunch of people have decided to use fooHelper despite all the warnings in my documentation then I may not be able to do that.

private lets library authors break libraries into manageable sized methods (and private helper classes) without the fear that they'll be forced to maintain those internal details for years.

What would make it worthwhile for the compiler to enforce information hiding?

On a side note, in Java private is not enforced by the compiler, but by the Java bytecode verifier.

Can reflection override this mechanism? What would make it worthwhile for the compiler to enforce information hiding?

In Java, reflection is not the only thing that can override this mechanism. There are two kinds of private in Java: the kind that prevents one outer class from accessing another outer class's private members, which is checked by the bytecode verifier, and the kind used by an inner class via a package-private synthetic accessor method, as in

 public class C {
   private int i = 42;

   public class B {
     public void incr() { ++i; }
   }
 }

Since class B (really named C$B) uses i, the compiler creates a synthetic accessor method that allows B to access C.i in a way that gets past the bytecode verifier. Unfortunately, since ClassLoader allows you to create a class from a byte[], it's fairly simple to get at the privates that C has exposed to inner classes by creating a new class in C's package, which is possible if C's jar has not been sealed.

Proper private enforcement requires coordination between the classloaders, bytecode verifier, and the security policy which can prevent reflective access to privates.

Can it be used to secure a class' secret state from being attacked?

Yes. "Secure decomposition" is possible when programmers can collaborate while each preserving the security properties of their modules -- I don't have to trust the author of another code module not to violate the security properties of my module.

Object Capabilities languages like Joe-E use information hiding and other means to make secure decomposition possible:

Joe-E is a subset of the Java programming language designed to support secure programming according to object-capability discipline. Joe-E is intended to facilitate construction of secure systems, as well as to facilitate security reviews of systems built in Joe-E.

The paper linked from that page gives an example of how private enforcement makes secure decomposition possible.

Providing secure encapsulation.

Figure 1. An append-only logging facility.
public final class Log {
  private final StringBuilder content;
  public Log() {
    content = new StringBuilder();
  }
  public void write(String s) {
    content.append(s);
  }
}
Consider Fig. 1, which illustrates how one might build an append-only log facility. Provided that the rest of the program is written in Joe-E, a code reviewer can be confident that log entries can only be added, and cannot be modified or removed. This review is practical because it requires only inspection of the Log class, and does not require review of any other code. Consequently, verifying this property requires only local reasoning about the logging code.

score 2 · Answer 8 · answered Aug 05 '11 at 13:56

Ruby and PHP have it and enforce it at runtime.

The point of information hiding is actually information showing. By "hiding" internal details, the purpose becomes apparent from an outside perspective. There are languages that embrace this: in Java, access defaults to package-private; in haXe, to protected. You explicitly declare members public to expose them.

The whole point of this is to make your classes easy to use by exposing only a highly coherent interface. You want the rest to be protected, so that no clever guy comes and messes with your internal state to trick your class into doing what he wants.

Also, when access modifiers are enforced at runtime, you can use them to enforce a certain level of security, but I don't think this is a particularly good solution.

score 2 · Answer 9 · edited Aug 05 '11 at 15:00

Access modifiers can definitely do something that convention can't.

For example, in Java you cannot access a private member/field unless you use reflection.

Therefore if I write the interface for a plugin and correctly deny rights to modify private fields through reflection (and to set security manager :) ), I can send some object to functions implemented by anyone and know that he cannot access its private fields.

Of course there can be some security bugs that would allow him to overcome this, but this is not philosophically important (but in practice it definitely is).

If the user runs my interface in his environment, he has control and thus can circumvent access modifiers.

score 1 · Answer 10 · answered Aug 05 '11 at 17:38

Information hiding is one of the primary concerns of good software design. Check out any of Dave Parnas' papers from the 1970s. Basically, if you can't guarantee that your module's internal state is consistent, you can't guarantee anything about its behavior. And the only way you can guarantee its internal state is to keep it private and only allow it to be changed by means you provide.

score 1 · Answer 11 · answered Aug 05 '11 at 23:49

By making protection a part of the language, you gain something: reasonable assurance.

If I make a variable private, I have reasonable assurance that it will be touched only by code within that class or explicitly declared friends of that class. The scope of code that could reasonably touch that value is limited and explicitly defined.

Now are there ways to get around syntactic protection? Absolutely; most languages have them. In C++, you can always cast the class to some other type and poke at its bits. In Java and C#, you can reflect yourself into it. And so forth.

However, doing this is hard. It's obvious that you're doing something you ought not be doing. You cannot do it by accident (outside of wild writes in C++). You must willingly think, "I'm going to touch something that I was told not to by my compiler." You must willingly do something unreasonable.

Without syntactic protection, a programmer can just accidentally screw things up. You have to teach the user a convention, and they must follow that convention every time. If they don't, the world becomes very unsafe.

Without syntactic protection, the onus is on the wrong people: the many people using the class. They must follow the convention or unspecified badness will occur. Scratch that: unspecified badness may occur.

There is nothing worse than an API where, if you do the wrong thing, everything might work anyway. That provides false reassurance to the user that they have done the right thing, that everything is fine, etc.

And what of new users to that language? Not only do they have to learn and follow actual syntax (enforced by the compiler), they now must follow this convention (enforced by their peers at best). And if they don't, then nothing bad may happen. What happens if a programmer doesn't understand why the convention exists? What happens when he says "screw it" and just pokes at your privates? And what does he think if everything keeps working?

He thinks that the convention is stupid. And he won't follow it ever again. And he will tell all his friends not to bother too.
