24

I am learning programming on my own following standard programming language textbooks. I come from a math background. I learned C on my own, but never really got to the more advanced portions.

I tried taking a coding course in college for would-be CS majors with no coding experience. They kept talking about the need to comment and document your code. I never really appreciated or understood why, since in math we are taught to produce proofs on our own and to read proofs written by others. If a proof is badly written or unclear, we usually ask someone, or the proof's author, for clarification.

In the case of programming, shouldn't programmers be able to read and understand a program's code syntax even without any kind of handholding, such as commenting in any kind of written form?

I know I am asking a very naive question, and many of you will think I am speaking from zero experience. I accept such criticism. When I was taking the programming course, there were never examples of how to write documentation. How detailed does it have to be? Does there need to be an English explanation for every line of code? How do I know what is good documentation versus bad documentation? We were never really shown examples of how to write good documentation or good code comments.

5anya
  • 3
Seth
  • 381

11 Answers

50

If the proofs are written badly or unclear, we usually go ask someone or the proof author for clarifications. in the case of programming, should programmers be able to read and understand a program's code syntax even without any kind of handholding in any kind of written form?

If I give you 10 or 100 pages of mathematical formulae without any explanation and I tell you it is a proof, can you then reliably tell me what it proves and if it is correct? Or would you rather have an accompanying text from the author explaining what they are trying to prove and what their main reasonings are in coming up with this proof?

Documentation in a software project is similar. Programmers are expected to know the syntax of the programming language well enough that you don't have to repeat the exact code again in English. Documentation should be on a higher level, explaining the things that are less obvious from the code itself, such as why a particular solution is preferred over another possible solution, or to point out the forest that hides itself amongst all the trees.

Part of good documentation in software is how you name your variables, classes and functions. Comments can easily go "out of sync" with the code they once described; names are much less prone to this. I know that mathematicians love single-letter variables, but programmers find that a good descriptive name is far better than having to look elsewhere (even if it is just a few lines up) to see what s means in this function. If it helps, you can start out writing the code with short variable names and rename them at the end (your development environment should support this).
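As a small illustration of the naming point, here is the same calculation written twice in Python. Both functions and all of their names are invented for this example; it is only a sketch of the idea, not code from any real project.

```python
# Hypothetical example: both functions compute exactly the same value.

def f(s, t):
    # The reader has to work out what s and t stand for.
    return s / t


def average_speed_kmh(distance_km, duration_hours):
    # The names carry the units and the intent, so no comment is needed.
    return distance_km / duration_hours
```

A call such as average_speed_kmh(100, 2) can be read without looking at the function body; f(100, 2) cannot.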

TAR86
  • 105
26

There are two types of code documentation: documenting the interface, and documenting the code itself. One is for people who use your functions, the other is for people who maintain your functions.

When documenting an interface of published functions, classes, and methods, you're...

  • explaining what it's for
  • explaining how to use it
  • making promises about its behavior
  • pointing out any caveats
  • saving the user a lot of time

That last one is about scale. While you're learning, you're probably working on maybe a few hundred lines of code. A typical project is tens of thousands or hundreds of thousands of lines, and it relies on millions of lines of code in its dependencies. Programmers have better things to do than read every line of code they rely on, and at that scale it is simply impossible.

For example, here's my documentation for a function which centers strings. It is written in perldoc, Perl's standard documentation language, which can be rendered in a variety of formats such as HTML. Most modern languages have a standard way of embedding documentation in the code (C does not; it is 50 years old).

=head3 center

    my $centered_string = $string->center($length);
    my $centered_string = $string->center($length, $character);

Centers $string between $character. $centered_string will be of length $length.

C<$character> defaults to " ".

    say "Hello"->center(10);        # "   Hello  ";
    say "Hello"->center(10, '-');   # "---Hello--";

C<center()> will never truncate C<$string>. If $length is less than C<< $string->length >> it will just return C<$string>.

    say "Hello"->center(4);        # "Hello";

It contains examples of use, an explanation of what the function does, and a caveat about what happens if the length is shorter than the string.

Code which calls center relies on it behaving consistently. Without the documentation, its users would have to guess what it does by reading the code. That can lead to misunderstanding its purpose, and to relying on implementation details which may change in the future. By writing documentation you make a contract with the user: you promise the function will always behave this way, and the user relies only on the documented features. With this contract in hand, you are free to change the implementation without worrying that you will break someone else's code ("backwards compatibility"), and the user is free to use the function without worrying that center will change and break their code.

Ideally, the users of your functions never have to look at the code at all; the documentation explains everything.
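For readers who don't know Perl, here is a rough Python sketch of the same idea: a docstring that states the contract, with an implementation that could be swapped out later. This is not the author's code. The rule that the extra padding character goes on the left is inferred from the "   Hello  " example above, and the one-character restriction on fill is an assumption made for this sketch.

```python
def center(text, length, fill=" "):
    """Return `text` centered in a string of `length` characters.

    The padding is made of `fill`, which defaults to a space. When the
    padding cannot be split evenly, the extra character goes on the left.

    `center` never truncates: if `length` is less than `len(text)`,
    `text` is returned unchanged.

    `fill` must be exactly one character, otherwise ValueError is raised.
    """
    if len(fill) != 1:
        raise ValueError("fill must be exactly one character")
    padding = length - len(text)
    if padding <= 0:
        return text
    left = (padding + 1) // 2  # the extra character, if any, goes left
    right = padding - left
    return fill * left + text + fill * right
```

A caller only needs the docstring: the arithmetic inside could be rewritten tomorrow without breaking anyone, as long as the documented promises still hold.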


When documenting the code itself, you generally don't document what the code is doing, but why it is doing it. The why is not always obvious; that knowledge only lives in somebody's head (and maybe a commit log message). You cannot rely on being able to speak with the author: they might not be available, and that does not scale.

For example, the comment here explains that the bulk of this function is working around a bug in another library, and that the workaround is also faster.

sub _get_datetime_timezone {
    state $local_tzfile = "/etc/localtime";

    # Always be sure to honor the TZ environment var
    return "local" if $ENV{TZ};

    # Work around a bug in DateTime::TimeZone on FreeBSD where it
    # can't determine the time zone if /etc/localtime is not a link.
    # Tzfile is also faster to do localtime calculations.
    if( -e $local_tzfile ) {
        # Could go through more effort to figure it out.  Meh.
        my $tzname = "Local";
        if( -l $local_tzfile ) {
            if( my $real_tzfile = eval { readlink $local_tzfile } ) {
                $tzname = $real_tzfile;
            }
        }
        require DateTime::TimeZone::Tzfile;
        my $tz = DateTime::TimeZone::Tzfile->new(
            name     => $tzname,
            filename => $local_tzfile
        );
        return $tz if $tz;
    }

    return "local";
}

Without that comment, every reader would have to puzzle it out for themselves, wasting time and maybe getting it wrong. They might even conclude the code isn't necessary and delete it, reopening an old bug.


Sometimes you do comment on what the code is doing if it is not immediately obvious. For example... (this is Perl).

    time => sub {
        my ($class, $caller) = @_;
        require perl5i::2::DateTime;

        # Export our gmtime() and localtime() and time()
        (\&perl5i::2::DateTime::dt_gmtime)->alias($caller, 'gmtime');
        (\&perl5i::2::DateTime::dt_localtime)->alias($caller, 'localtime');
        (\&perl5i::2::DateTime::dt_time)->alias($caller, 'time');
    },

(\&perl5i::2::DateTime::dt_gmtime)->alias($caller, 'gmtime'); is a mouthful, even for Perl. The comment explains what this code is doing.

However, comments have a way of falling out of date and becoming wrong. It's better to rework the code so it can be plainly understood, perhaps by writing a wrapper function that reads more clearly.

    time => sub {
        my ($class, $caller) = @_;
        require perl5i::2::DateTime;

        alias(\&perl5i::2::DateTime::dt_gmtime, as => 'gmtime', in => $caller);
        alias(\&perl5i::2::DateTime::dt_localtime, as => 'localtime', in => $caller);
        alias(\&perl5i::2::DateTime::dt_time, as => 'time', in => $caller);
    },

To a Perl programmer that means it's making the function perl5i::2::DateTime::dt_time available as the function time in the namespace $caller. And they can look up the details in the documentation for alias.

Schwern
  • 1,192
23

A lot of your question is more of a rant than an actual question but I am somewhat sympathetic. In my decades of experience with development, I have found documentation that is useful but I've also seen a lot that was useless at best and often misleading.

If the proofs are written badly or unclear, we usually go ask someone or the proof author for clarifications.

What if they are not available? Much of my coding career has been spent working with code written by people who moved on years ago. Even if they are around, they might not remember. A standard programmer joke/experience is "Who wrote this garbage? [git blame], Oh, right. It was me."

Does there need to be an English explanation for every line of code etc.

No. This is stupid. If someone tells you this, you can safely ignore their advice on coding. This might make some sense if you were coding in assembly or some other really low-level language but there's no real good reason to do this in real production code using any popular contemporary language. I do see this in things like tutorials where the main point is to teach. This might give some people the impression that it's what they should always be doing, perhaps.

I actually think this is very bad practice. Commenting every bit of code is not only costly but such comments tend to be misleading if not outright wrong. They clutter up the code with a bunch of noise IMO. I tend to ignore trivial comments and delete them when I can.

How do I know what is good documentation versus bad documentation, etc.

To start, understand that there are many types of documentation for systems. The primary distinction I would make is that there are comments or inline documentation and there are separate documents which explain various things about a system including (but not limited to) data models, networking diagrams, tutorials, design descriptions, change notes, API specs, and requirements. I agree with the view that unit tests are a form of documentation. Developers are usually mostly responsible for inline documentation and unit tests but may be involved in some of the other forms of documentation. A person who spends a good part of their time creating designs is typically referred to as an 'architect'.

Good documentation explains non-obvious things about a system. Usually, it's at a fairly high-level. For example, it can be very hard, if not impossible to understand how various subsystems work together just from looking at code and configuration artifacts. It's often helpful to have comments at the class/module/namespace level which briefly explain their purpose and usage.

APIs or libraries that are intended for use by multiple applications should have extensive documentation. I shouldn't have to read the code to figure out how Python's re module interprets patterns, for example. This is also important for calling out what behaviors are intended and will be supported in updated versions. For example, if you notice that the order of your inputs to some function is retained in outputs somewhere else, you should be careful about depending on that behavior if there's nothing stated about it in the documentation. If you are writing such a library, you should document what you intend to support over the long-term.

There are two kinds of things that are often referred to as comments because they are written into the code, but I would argue they are significantly different. There are comments placed near method declarations, classes, etc. which are often recognized by various tools. These, in my mind, should be classified as 'documentation'. If you are working in VSCode and hover over a method call, you will often see a popup with some text. That generally comes from these kinds of comments. They can be very useful as long as they explain things like what the method does, how the parameters are treated, and special cases. An example of a useless way of doing this is the classic JavaDoc: getFoo: gets the foo pattern. There's no good reason to simply restate the obvious in these types of comments.

The other kind, inline comments written alongside the executable statements, is meant for someone who is reading the code in detail. IMO comments at this level should be rare and reserved for special scenarios. Sometimes I've run into strange things in APIs where you need to call some seemingly unrelated method first to get it to work. Someone (including myself) might come along later and think it's a mistake or some sort of cruft and remove it. In that case I usually put in a warning about how it needs to be there, and often a link to something explaining why. If the code is very hard to understand, that can be a reason as well, but refactoring is a better solution.
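A minimal Python sketch of that kind of warning comment follows. LegacyReportClient and its methods are entirely hypothetical, invented here to stand in for a third-party API with a quirk; no real library is being described.

```python
class LegacyReportClient:
    """Hypothetical stand-in for a third-party API with a hidden quirk."""

    def __init__(self):
        self._locale_loaded = False

    def load_locale(self):
        self._locale_loaded = True

    def render_total(self, amount):
        if not self._locale_loaded:
            raise RuntimeError("locale tables not loaded")
        return f"{amount:.2f}"


def format_invoice_total(client, amount):
    # WARNING: do not remove this call, even though nothing in this
    # function uses a locale directly. render_total() raises unless
    # load_locale() has run first. (In real code, a link to the bug
    # report or vendor note explaining the quirk would go here.)
    client.load_locale()
    return client.render_total(amount)
```

Without the comment, the load_locale() call looks like leftover cruft, which is exactly how it would get deleted.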

JimmyJames supports Canada
  • 30,578
  • 3
  • 59
  • 108
8

The proof is in the pudding

Why are you asking this? Can't you figure it out for yourself?

Mathematics as an example

We can observe mathematical papers as an example. Their content heavily skews towards a written explanation rather than a "result dump" of the conclusion, specifically because it would otherwise require readers to redo the legwork that the paper should be proving has already been done.

I could rephrase your question about mathematics and ask why we bother teaching anything other than 1+1=2, can't these children think for themselves and figure it out? The rest is just working out, isn't it?

What are code comments?

It seems you've fallen into a very common trap. Not every comment is equally useful and relevant. An example of a bad code comment would be:

// Adds one and returns the value
public int AddOneToInput(int input)
{
    // Add 1 to the input value
    var result = input + 1;

    // Return the result
    return result;
}

These are bad comments, because they explain something that was already trivially obvious from the code itself. "Return the result" is not meaningfully more informative than "return result;" already was.

Newcomers to development often write trivial comments. Partly, it's understandable. They aren't intuitively aware of simple syntax yet, so they write down their thoughts in a comment, and then they translate these lines to actual code.
That's perfectly okay, we all have to learn and get familiar with how to express ourselves in a new programming language, but the comments should not be kept around afterwards - definitely not in a professional context where it is assumed that readers understand the programming language already.

In this sense, yes, you are correct that we shouldn't be writing these kinds of handholding comments, because we wouldn't do that in mathematics either. But the comparison to mathematics is unnecessary: it distracts with the ways in which the analogy fails, even though the specific point you wanted to make is not really wrong.

However, let's consider what a good comment would be. First, without comment:

public bool IsEven(int n)
{
    return (n ^ 1) == (n + 1);
}

The method name tells you that we're checking if the number is even or odd, but have you understood how we are doing so?

public bool IsEven(int n)
{
    // XORs the number with 1, which flips its rightmost bit.
    // If the number increased by one, the rightmost bit was
    // initially 0 and therefore the number was even.
    return (n ^ 1) == (n + 1);
}

It's easy to pick at this example and tell me that I could've done n % 2 == 0, which would have required no real comment. I know that.

The goal here was to give you a fixed piece of logic that is not trivial to understand, thus showcasing how a comment can help make a difficult piece of logic more digestible (as opposed to the previous example where the comment added nothing of value).
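If you want to convince yourself that the commented explanation is accurate, one option is to check the XOR version against the plain modulus version. The sketch below does that in Python rather than C#, with function names invented for the purpose. Python integers do not overflow, so it says nothing about edge cases at the limits of a fixed-width int.

```python
def is_even_xor(n):
    # XOR with 1 flips the lowest bit. If that bit was 0 (n is even),
    # flipping it adds 1, so the result equals n + 1.
    return (n ^ 1) == (n + 1)


def is_even_mod(n):
    return n % 2 == 0


def versions_agree(low, high):
    # True if both versions give the same answer for every n in [low, high].
    return all(is_even_xor(n) == is_even_mod(n) for n in range(low, high + 1))
```

Calling versions_agree(-100, 100) compares the two over a small range that includes negative numbers.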

The problem with showcasing the benefit of having comments is that comments become more necessary when the complexity increases, but the more complex an example is, the more time it takes me to explain to you all the intricacies that justify making this example so complex in the first place.

If you won't take my word for it and want to see a real complex situation where comments would've been helpful, start working in a professional environment where you need to deal with deadlines and with code written by others, and where readability by other developers is low on the list of priorities.

I'm sure you can approximate this by browsing GitHub (or similar sites) and trying to read codebases that tackle non-trivial problems.

Flater
  • 58,824
5

Here are two corkscrews. One comes with documentation.

Sure you could disassemble the complex one, or figure it out by trial and error. But you aren't trying to understand the corkscrew. You are trying to open more wine for your project manager.


Ewan
  • 83,178
4

You seem not to appreciate the sheer size of software development compared with maths. Take the proof of Fermat's last theorem: maybe a hundred pages. Take the kind of software I’m working on: easily a million lines of code, 20,000 pages, and there is plenty of software of that size around. How many proofs have you worked through with even 200 lines?

The sheer amount means you have no chance of reading the code for even a medium-sized project and understanding it all. That’s why you need documentation: to give anyone a chance to know what the code does within their lifetime.

gnasher729
  • 49,096
3

The need for documentation starts becoming obvious once you start producing software that's tens of thousands of lines of code, or more, and a team of people working on it.

Code should be well laid out, split into meaningful modules, with appropriate names for things, and comments where useful.

But once a project gets to a certain size, it's very difficult for a newcomer to work out even where to start. At that point a good software design document is a major help.

Once a project gets to a sufficient size and level of formality, you could end up with a library of documents. For example: software requirements, software coding standards, software design, software test plan, software test procedure, software build document, and so on.

Simon B
  • 9,772
2

One core tenet about comments:

If you can express it in code instead of in a comment, do that.

This aligns with the CppCoreGuidelines.

Useful ways to make comments unnecessary:

  • Proper and informative naming of identifiers for classes, functions, variables etc. This cannot be overestimated.
  • Custom types instead of generic types. Examples are Point instead of an array with two or three elements, Foot and Meter instead of float for both (plus it keeps your space probes from crashing (see below)), enums instead of bools. For the latter, compare the signatures of SomeFunction(bool, bool) with SomeFunction(LogRequested_E, ExitOnError_E) with the enums having members like { DONT_LOG, LOG } and { DONT_EXIT, EXIT }. If your coding rules allow that, such parameters can actually be used as bools in if (logRequested) so that the using code stays concise.
  • Establishing invariants in the constructor. Stroustrup mentions that a comment like // Always call init() before first use!! has become obsolete in C++.
  • Avoiding code that is gratuitously hard to understand. If code gets optimized later and therefore harder to understand, leave the original, slow and straight-forward code as a reference in a comment.

There are a number of reasons for that:

  • Comments aim at repeating information already present in code, creating redundancy. This implies that they can be or become wrong. Code maintenance must address the comments as well, creating more work.
  • Custom types and initialization make correctness automatic, or at least enforced by the compiler. For example, directly assigning feet to meters, as in the infamous Mars probe, becomes impossible with custom types.
  • Comments can never convey all information present in the code. The source code, when available, is the ultimate reference, which makes it important that it is intelligible. I have heard that TCP/IP became a success partly because BSD 4.2 had a working implementation whose source code could be inspected (even if it was not in the public domain at the time). But you could see how the system actually worked without re-engineering it.
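The custom-type and enum points above can be sketched in Python as follows. All names here (Logging, OnError, Meters, Feet, describe_run, set_altitude) are invented for illustration. Unlike a C++ compiler, Python only enforces the unit check at run time, so this is a loose analogue rather than the same guarantee.

```python
from dataclasses import dataclass
from enum import Enum


class Logging(Enum):
    DONT_LOG = 0
    LOG = 1


class OnError(Enum):
    DONT_EXIT = 0
    EXIT = 1


@dataclass(frozen=True)
class Meters:
    value: float


@dataclass(frozen=True)
class Feet:
    value: float

    def to_meters(self):
        # 1 foot is defined as exactly 0.3048 meters.
        return Meters(self.value * 0.3048)


def describe_run(logging, on_error):
    # Call sites read describe_run(Logging.LOG, OnError.DONT_EXIT)
    # instead of the opaque describe_run(True, False).
    return f"logging={logging.name}, on_error={on_error.name}"


def set_altitude(altitude):
    # Rejects anything that is not explicitly Meters, so a Feet value
    # cannot slip through unconverted.
    if not isinstance(altitude, Meters):
        raise TypeError("altitude must be given in Meters")
    return altitude.value
```

With these types, set_altitude(Feet(1000)) fails loudly, while set_altitude(Feet(1000).to_meters()) makes the conversion visible at the call site.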

By implication, this guideline also tells you that you should comment on what cannot be expressed in code.

For me, that's often

  • where a function is used (although modern IDEs help with finding usages, and like all comments these can age and become wrong as well);
  • non-obvious interactions with other code, if present;
  • non-obvious and unavoidable restrictions for parameters, when it can be called etc.
  • why a certain approach was chosen over another, any research which informed the decisions (benchmarking, papers);
  • what a maintainer should pay attention to later, if anything;
  • for longer functions or modules: An introduction into the design and the strategies chosen which help a maintainer to get an overall understanding of and mental framework for dealing with your code. While this information is in principle ideally expressed in the code (by proper modularization, naming, clear coding etc.) it is hard to extract from it by a newcomer.
1

You should add comments which will be useful to the readers. The difficult part is defining "useful", and, to some extent, "readers".

Defining the audience can help work out what's useful. First, the readers always include you yourself. Make sure comments are useful for your future self, and you're probably 50% of the way there. Document (not necessarily in the code, but also in the project readme, issue tracker, or other knowledge repositories) what you think you will need to know when re-reading the code a year from now. Then, is it going to be read by other people you know? What do you think will be useful to them? And, finally, is it going to be read by complete strangers, possibly from other cultural backgrounds, fields, etc? This last one is really important. You need to be clear, factual, and concise, or your writing is guaranteed to be misunderstood. And a misunderstood comment is worse than no comment, because readers will have to eventually notice their confusion, correct for it, and by then will be annoyed at having lost time and effort for no gain.

l0b0
  • 11,547
1

Things that are obvious do not need a comment.

However it's not always clear what's obvious: For example for someone reading C language code without knowing the C language, almost nothing is obvious.

Unfortunately there are sometimes stupid "company coding guidelines", so you might find code like

x = x + 1; /* add one to variable x */

(maybe also known as the COBOL mistake)

In contrast, the line if (!((ch+1) & ch) || !*s) would benefit from a comment, as only experienced programmers recognize the pattern immediately.
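For readers who do not recognize the pattern, here is one reading of the (ch+1) & ch part, written as a small Python sketch for non-negative values. The function name is made up, and the !*s half of the condition (a C check for the end of a string) is not modelled.

```python
def is_low_bit_run(ch):
    # (ch + 1) & ch clears the run of trailing 1-bits in ch. The result
    # is zero exactly when ch consisted of nothing but that run, i.e.
    # ch is 0, 1, 3, 7, 15, ... (one less than a power of two).
    return ((ch + 1) & ch) == 0


def low_bit_runs_below(limit):
    # All non-negative values below `limit` that satisfy the test.
    return [n for n in range(limit) if is_low_bit_run(n)]
```

A one-line comment such as "true when ch is one less than a power of two" would save most readers from having to work this out by hand.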

IMHO it does not make any sense to add comments into the code, enabling (or trying to enable) people who do not understand the language (the basics at least) to understand the code. Maybe just add a high-level comment for each routine (following the classic top-down design principle), explaining what it does at an abstract level (as the details should be "obvious" from the actual code). It's named procedural abstraction.

Probably it's preferable to make the code readable (for humans, not for the compiler only) instead of commenting unreadable code. One of my instructors once said (AFAIR): "If you have a line of code you are especially proud of, that line will cause most trouble in the long run; re-write it to make it simple and understandable."

(Today you don't have to write ugly code for performance reasons, as most compilers can do that for you.)

Also, some company guidelines require every function to have a comment explaining who changed that function last, and when he/she did that (possibly with a log of changes at the function level). I think that with current source code management systems such comments are outdated, especially considering that the comments might exceed the actual code in size, effectively making it harder to read as a whole.

Maybe consider the drastic approach, "an exercise in corporate downsizing", found in the Eiffel Style Guidelines by Bertrand Meyer.

The other question is: What is the purpose of the comments?

  • Comments can help the original author to understand his/her own code when revising it after a long time
  • Comments can help programmers having to understand or update foreign code
  • If people different from programmers have to write program or user documentation, they might base their work on what the programmers "had left" for them.
  • Comments add details that cannot be seen from the code directly (such as the algorithms being implemented, or references to literature). This could be important, specifically as there have been books with incorrect algorithms.
  • Some languages also know about formal comments that are part of the language specification. Typically the compiler or other development tools handle such comments in a special way, like generating interface descriptions from the code automatically.
U. Windl
  • 147
0

Schwern’s answer is quite good, explaining that there are 2 kinds of comments/ documentation, but leaves a bit out.

API/interface documentation is formal, done in a prescribed manner and is generally concerned with letting the reader understand how to interact with the API. This is generally not intended for someone implementing the API, but for someone utilizing it. In fact it is quite common for an API to be written/implemented in an entirely different language from the language where it is most commonly used. It is distinct from and generally accessed entirely without reference to the implementing code. Code documentation is read whenever one wants to learn if and how an api can be leveraged to do a task.

Code comments, although frequently discussed as “documenting intent”, are NOT formal and are never stored or read separately from the code. Good code comments make the code easier to read by providing context for what it does and why it does it in a particular way. It’s possible that some syntax may need to be explained, but that is extremely rare. I recently had a bug-fix that could have been done by deleting one line of code, but which (IMO) was best done by making changes in 7-8 files, adding comments and renaming variables and functions so that my bug would not be reimplemented. I made those changes so that anyone reading the code and comments would understand how things worked and would thus never even think to add that line of code. And if someone else (or future me) needs to make some changes, they will understand the process.

As for how to write either kind of documentation…

Formal documentation, as for an API, will have a defined format; choosing to do it, or being required to do it, will lead you to the format. Mainly what people will be looking for isn’t so much “good” as “thorough” and understandable. Designing the API will be harder than documenting it. It’s relatively easy to do: just cover everything that is publicly shared about how it works.

Informal documentation, aka code comments, is another matter entirely. Large swaths of code need nothing more than good naming conventions and standard patterns. The hard part is stepping back from what you know right now and envisioning what you might need six months from now, if you made some kind of mistake or need to tweak how things work. Frequently you won’t find out what comment you need to make until you do come back to the code and need to understand it or figure out how to change it.

jmoreno
  • 11,238