38

My school's CS program avoids any mention of object-oriented programming, so I've been doing some reading on my own to supplement it -- specifically, Object-Oriented Software Construction by Bertrand Meyer.

Meyer makes the point repeatedly that classes should hide as much information about their implementation as possible, which makes sense. In particular, he argues that attributes (i.e., static, non-computed properties of classes) and routines (properties of classes that correspond to function/procedure calls) should be indistinguishable from each other.

For example, if a class Person has the attribute age, he asserts that it should be impossible to tell, from the notation, whether Person.age corresponds internally to something like return current_year - self.birth_date or simply return self.age, where self.age has been defined as a constant attribute. This makes sense to me. However, he goes on to claim the following:

The standard client documentation for a class, known as the short form of the class, will be devised so as not to reveal whether a given feature is an attribute or a function (in cases for which it could be either).

i.e., he claims that even the documentation for the class should avoid specifying whether or not a "getter" performs any computation.

This, I don't follow. Isn't the documentation the one place where it would be important to inform users of this distinction? If I were to design a database filled with Person objects, wouldn't it be important to know whether or not Person.age is an expensive call, so I could decide whether or not to implement some sort of cache for it? Have I misunderstood what he's saying, or is he just a particularly extreme example of OOP design philosophy?

gnat

12 Answers

57

I don't think Meyer's point is that you shouldn't tell the user when you have an expensive operation. If your function is going to hit the database, or make a request to a webserver, and spend several hours computing, other code is going to need to know that.

But the coder using your class doesn't need to know whether you've implemented:

return currentAge;

or:

return getCurrentYear() - yearBorn;

The performance difference between those two approaches is so small that it shouldn't matter. The coder using your class really shouldn't care which one you have. That's Meyer's point.

But that's not always the case. For example, suppose you have a size method on a container. That could be implemented:
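To make that concrete, here is a minimal Python sketch of the two implementations behind the same notation. The class names and the fixed-year helper get_current_year are made up for illustration; they are not from Meyer's book.

```python
def get_current_year() -> int:
    """Hypothetical stand-in for a clock lookup, fixed so the demo is reproducible."""
    return 2024


class StoredAgePerson:
    """Keeps age as a plain stored attribute."""

    def __init__(self, age: int) -> None:
        self.age = age


class ComputedAgePerson:
    """Derives age from the birth year on every access."""

    def __init__(self, year_born: int) -> None:
        self.year_born = year_born

    @property
    def age(self) -> int:
        # Cheap arithmetic; callers write person.age either way.
        return get_current_year() - self.year_born


stored = StoredAgePerson(30)
computed = ComputedAgePerson(1994)
print(stored.age, computed.age)
```

Client code reads person.age in both cases, so the author can switch between the two without breaking anyone.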

return size;

or

return end_pointer - start_pointer;

or it could be:

int count = 0;
for (Node *node = firstNode; node; node = node->next)
{
    count++;
}
return count;

The difference between the first two really shouldn't matter, but the last one could have serious performance ramifications. That's why the STL, for example, specifies that .size() is O(1) (a requirement for the standard containers since C++11). It doesn't document exactly how the size is calculated, but it does give me the performance characteristics.

So: document performance issues. Don't document implementation details. I don't care how std::sort sorts my stuff, as long as it does so properly and efficiently. Your class also shouldn't document how it calculates things, but if something has an unexpected performance profile, document that.
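As a rough sketch of what "document performance, not implementation" could look like, here are two hypothetical linked-list classes in Python. The names CountedList and WalkingList are invented for this example. The docstrings state the complexity of size() without promising how the number is obtained.

```python
from typing import Optional


class Node:
    def __init__(self, value: int, next_node: "Optional[Node]" = None) -> None:
        self.value = value
        self.next = next_node


class CountedList:
    """Singly linked list that tracks its length as items are added."""

    def __init__(self) -> None:
        self.first: Optional[Node] = None
        self._size = 0

    def push_front(self, value: int) -> None:
        self.first = Node(value, self.first)
        self._size += 1

    def size(self) -> int:
        """Return the number of items. Complexity: O(1)."""
        return self._size


class WalkingList:
    """Singly linked list that counts its nodes on demand."""

    def __init__(self) -> None:
        self.first: Optional[Node] = None

    def push_front(self, value: int) -> None:
        self.first = Node(value, self.first)

    def size(self) -> int:
        """Return the number of items. Complexity: O(n)."""
        count = 0
        node = self.first
        while node is not None:
            count += 1
            node = node.next
        return count


counted, walking = CountedList(), WalkingList()
for value in (1, 2, 3):
    counted.push_front(value)
    walking.push_front(value)
print(counted.size(), walking.size())
```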

Malachi
Winston Ewert
16

From an academic or CS purist's view, it is of course a failure to describe anything about the internals of a feature's implementation in the documentation. That's because the user of a class should ideally not make any assumptions about its internal implementation. If the implementation changes, ideally no user will notice - the feature creates an abstraction and the internals should be kept completely hidden.

However, most real-world programs suffer from Joel Spolsky's "Law of Leaky Abstractions", which says

"All non-trivial abstractions, to some degree, are leaky."

That means it is virtually impossible to create a full black-box abstraction of complex features, and a typical symptom of this is performance issues. So for real-world programs, it may become very important which calls are expensive and which are not, and good documentation should include that information (or it should say where the user of a class is allowed to make assumptions about performance, and where not).

So my advice is: include the information about potentially expensive calls if you write docs for a real-world program, and exclude it for a program which you are writing only for educational purposes in your CS course, given that any performance considerations should be kept intentionally out of scope.

Doc Brown
12

You can write if a given call is expensive or not. Better, use a naming convention like getAge for quick access and loadAge or fetchAge for expensive lookup. You definitely want to inform the user if the method is performing any IO.

Every detail you give in the documentation is like a contract which has to be honored by the class. It should inform about important behavior. Often, you will see complexity indicated in big O notation. But you usually want to be short and to the point.
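A small sketch of the naming convention, in Python. FakeRemoteStore and its load_age method are hypothetical stand-ins for a database or web service; nothing here is a real API.

```python
class FakeRemoteStore:
    """Hypothetical stand-in for a database or web service."""

    def __init__(self, ages: dict) -> None:
        self._ages = dict(ages)
        self.lookups = 0  # counts simulated round trips

    def load_age(self, name: str) -> int:
        self.lookups += 1
        return self._ages[name]


class Person:
    def __init__(self, name: str, store: FakeRemoteStore) -> None:
        self._name = name
        self._store = store

    def get_name(self) -> str:
        """Quick access: returns a value already held in memory."""
        return self._name

    def fetch_age(self) -> int:
        """Expensive lookup: one round trip to the store per call."""
        return self._store.load_age(self._name)


store = FakeRemoteStore({"Ada": 36})
person = Person("Ada", store)
name = person.get_name()
age = person.fetch_age()
print(name, age, store.lookups)
```

The get/fetch prefix tells the caller what to expect before they ever open the documentation.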

Simon Bergot
9

If I were to design a database filled with Person objects, wouldn't it be important to know whether or not Person.age is an expensive call?

Yes.

This is why I sometimes use Find() functions to indicate that calling them may take a while. This is more of a convention than anything else. The time it takes for a function or attribute to return makes no difference to the program (though it might to the user), although among programmers there is an expectation that, if it is declared as an attribute, the cost to call it should be low.

In any case, there should be enough information in the code itself to deduce whether something is a function or attribute, so I don't really see the need to say that in the documentation.

Robert Harvey
3

It's important to note that the first edition of this book was written in 1988, in the early days of OOP. These people were working with more purely object-oriented languages than are widely used today. Our most popular OO languages today - C++, C# and Java - have some pretty significant differences from the way that the early, more purely OO, languages worked.

In a language such as C++ or Java, you must distinguish between accessing an attribute and calling a method. There's a world of difference between instance.getter_method and instance.getter_method(). One actually gets your value and the other does not.

When working with a more purely OO language, of the Smalltalk or Ruby persuasion (which it appears that the Eiffel language used in this book is), it becomes perfectly valid advice. These languages will implicitly call methods for you. There is no difference between instance.attribute and instance.getter_method.

I wouldn't sweat this point or take it too dogmatically. The intent is good - you don't want the users of your class to worry about irrelevant implementation details - but it doesn't translate cleanly to the syntax of many modern languages.

2

As a user, you don't need to know how something is implemented.

If performance is an issue, something has to be done inside the class implementation, not around it. Therefore, the correct action is to fix the class implementation or to file a bug with the maintainer.

mouviciel
2

Any programmer-oriented piece of documentation which fails to inform programmers about the complexity cost of routines/methods is flawed.

  • We are looking to produce side effect-free methods.

  • If execution of a method has run time complexity and/or memory complexity other than O(1), in memory- or time-constrained environments it can be considered to have side effects.

  • The principle of least surprise is violated if a method does something completely unexpected - in this case, hogging memory or wasting CPU time.

1

I think you understood him correctly, but I also think you have a good point. If Person.age is implemented with an expensive calculation, then I think I'd like to see that in the documentation too. It could make the difference between calling it repeatedly (if it's an inexpensive operation) or calling it once and caching the value (if it is expensive). I don't know for sure, but I think in this case Meyer might agree that a warning in the documentation should be included.

Another way to handle this might be to introduce a new attribute whose name implies that a lengthy calculation might take place (such as Person.ageCalculatedFromDB) and then have Person.age return a value that's cached within the class, but this may not always be appropriate, and seems to overcomplicate things, in my opinion.
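For what it's worth, a sketch of that second approach in Python might look like the following. It assumes Python 3.8+ for functools.cached_property; the expensive_calls counter and the fixed year 2024 are only there to make the demo observable and repeatable.

```python
from functools import cached_property


class Person:
    def __init__(self, year_born: int) -> None:
        self.year_born = year_born
        self.expensive_calls = 0  # instrumentation for this demo only

    def age_calculated_from_db(self) -> int:
        """Pretend this is a lengthy lookup; the name warns the caller."""
        self.expensive_calls += 1
        return 2024 - self.year_born  # fixed year, an assumption for the demo

    @cached_property
    def age(self) -> int:
        # Runs the expensive routine once, then serves the stored result.
        return self.age_calculated_from_db()


person = Person(1990)
first = person.age
second = person.age
print(first, second, person.expensive_calls)
```

The cache never refreshes here, which is one reason this may not always be appropriate.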

0

Documentation for object-oriented classes often involves a tradeoff between giving the maintainers of the class flexibility to change its design, versus allowing consumers of the class to make full use of its potential. If an immutable class will have a number of properties which will have a certain exact relationship with each other (e.g. the Left, Right, and Width properties of an integer-coordinate grid-aligned rectangle), one might design the class to store any combination of two properties and calculate the third, or one might design it to store all three. If nothing about the interface makes clear which properties are stored, the programmer of the class may be able to change the design in the event that doing so would prove helpful for some reason. By contrast, if e.g. two of the properties are exposed as final fields and the third isn't, then future versions of the class will always have to use the same two properties as being the "basis".

If properties do not have an exact relationship (e.g. because they're float or double rather than int), then it may be necessary to document which properties "define" the value of a class. Even though Left plus Width is supposed to equal Right, floating-point math is often inexact. For example, suppose a Rectangle which uses type Float and accepts Left and Width as constructor parameters is constructed with Left given as 1234567f and Width as 1.1f. The best float representation of the sum is 1234568.125 [which may display as 1234568.13]; the next smaller float would be 1234568.0. If the class actually stores Left and Width, it may report the width value as it was specified. If, however, the constructor computed Right based upon the passed-in Left and Width, and later computed Width based upon Left and Right, it would report the width as 1.125f rather than as the passed-in 1.1f.
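The arithmetic can be checked with a short Python sketch. Python floats are 64-bit, so the helper to_float32 (a name made up here) rounds each intermediate result to 32 bits through the struct module to imitate a float-based Rectangle.

```python
import struct


def to_float32(x: float) -> float:
    """Round a 64-bit Python float to the nearest 32-bit float value."""
    return struct.unpack("f", struct.pack("f", x))[0]


left = to_float32(1234567.0)
width = to_float32(1.1)

# Constructor stores Right instead of Width.
right = to_float32(left + width)

# Later, Width is recomputed from the stored Left and Right.
recomputed_width = to_float32(right - left)

print(right)             # 1234568.125
print(recomputed_width)  # 1.125
```

Floats near 1234568 are spaced 0.125 apart, so the fractional part of the width is forced onto that grid once it has passed through Right.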

With mutable classes, things can be even more interesting, since a change to one of the inter-related values will imply a change to at least one other, but it may not always be clear which one. In some cases, it may be best to avoid having methods which "set" a single property as such, but instead either have methods to e.g. SetLeftAndWidth or SetLeftAndRight, or else make clear what properties are being specified and which are changing (e.g. MoveRightEdgeToSetWidth, ChangeWidthToSetLeftEdge, or MoveShapeToSetRightEdge).

Sometimes it may be useful to have a class which keeps track of which properties' values have been specified and which have been computed from others. For example, a "moment in time" class might include an absolute time, a local time, and a time zone offset. As with many such types, given any two pieces of information, one may compute the third. Knowing which piece of information was computed, however, may sometimes be important. For example, suppose that an event is recorded as having occurred at "17:00 UTC, time zone -5, local time 12:00pm", and one later discovers that the time zone should have been -6. If one knows that the UTC was recorded off a server, the record should be corrected to "17:00 UTC, time zone -6, local time 11:00am"; if someone keyed in the local time off a clock it should be "18:00 UTC, time zone -6, local time 12:00pm". Without knowing whether the global or local time should be considered "more believable", however, it's not possible to know which correction should apply. If, however, the record kept track of which time was specified, changes to the time zone could leave that one alone while changing the other.
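A minimal sketch of such a record in Python, working in whole hours only. The Moment class, its field names, and the with_offset method are all invented for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Moment:
    utc_hour: int
    offset_hours: int
    local_hour: int
    specified: str  # "utc" or "local": which time was actually recorded

    def with_offset(self, new_offset: int) -> "Moment":
        """Correct the time zone, keeping the specified time fixed."""
        if self.specified == "utc":
            # UTC is trusted, so the local time is recomputed.
            local = (self.utc_hour + new_offset) % 24
            return Moment(self.utc_hour, new_offset, local, "utc")
        # Local time is trusted, so UTC is recomputed.
        utc = (self.local_hour - new_offset) % 24
        return Moment(utc, new_offset, self.local_hour, "local")


from_server = Moment(17, -5, 12, "utc").with_offset(-6)
from_clock = Moment(17, -5, 12, "local").with_offset(-6)
print(from_server)
print(from_clock)
```

Both records start out identical apart from the specified tag, yet the same time zone correction produces two different results.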

supercat
0

All these rules about hiding information in classes make perfect sense if you need to protect against the one user of the class who will make the mistake of depending on its internal implementation.

It's fine to build in such protection, if the class has such an audience. But when the user writes a call to a function in your class, they are trusting you with their execution-time bank account.

Here's the sort of thing I see a lot:

  1. Objects have a "modified" bit saying if they are, in some sense, out-of-date. Simple enough, but then they have subordinate objects, so it's straightforward to let "modified" be a function that sums over all the subordinate objects. Then if there are multiple layers of subordinate objects (sometimes sharing the same object more than once) simple "Get"s of the "modified" property can end up taking a healthy fraction of execution time.

  2. When an object is in some way modified, it is assumed that other objects scattered around the software need to be "notified". This can take place over multiple layers of data structure, windows, etc. written by different programmers, and sometimes repeating in infinite recursions that need to be guarded against. Even if all the writers of those notification handlers are reasonably careful not to waste time, the entire composite interaction can end up using an unpredicted and painfully large fraction of execution time, and the assumption that it is simply "necessary" is blithely made.
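Here is a toy Python version of the first case, with made-up names. Each Component holds two references to the same subordinate, ten layers deep, and a class-wide counter records how many times the "modified" getter runs for a single top-level read.

```python
class Component:
    visits = 0  # class-wide counter, instrumentation for this demo only

    def __init__(self, children=()) -> None:
        self._dirty = False
        self.children = list(children)

    @property
    def modified(self) -> bool:
        Component.visits += 1
        # When nothing is dirty, every subordinate has to be asked.
        return self._dirty or any(child.modified for child in self.children)


node = Component()
for _ in range(10):
    # Each layer shares the same subordinate twice.
    node = Component([node, node])

Component.visits = 0
result = node.modified
print(result, Component.visits)  # False 2047
```

Only eleven objects exist, but one innocent-looking read of the property makes 2047 calls.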

So, I like to see classes that present a nice clean abstract interface to the outside world, but I do like to have some notion of how they work, if only to understand what work they are saving me. But beyond that, I tend to feel that "less is more". People are so enamored of data structure that they think more is better, and when I do performance tuning the universal massive reason for performance problems is slavish adherence to bloated data structures built the way people are taught.

So go figure.

Mike Dunlavey
0

Adding implementation details like "calculated or not" or "performance info" makes it more difficult to keep code and docs in sync.

Example:

If you have a "performance-expensive" method, do you want to document "expensive" in all classes that use the method as well? What if you change the implementation so that it is not expensive any more? Do you want to update this info for all consumers, too?

Of course it is nice for a code maintainer to get all important info from the code documentation, but I do not like documentation that claims something that is no longer valid (out of sync with the code).

k3b
0

As the accepted answer comes to the conclusion:

So: document performance issues.

and self-documenting code is considered to be better than documentation, it follows that the method name should state any unusual performance characteristics.

So it is still Person.age for return current_year - self.birth_date, but if the method uses a loop to calculate the age (yes): Person.calculateAge()

Cwt