How to make a large codebase easier to understand

Question

Suppose that I am developing a relatively large project. I have already documented all my classes and functions with Doxygen; however, I had the idea of adding "programmer's notes" to each source code file.

The idea behind this is to explain in layman's terms how a specific class works (and not only why, as most comments do). In other words, to give fellow programmers another view of how a class works.

For example:

/*
 * PROGRAMMER'S NOTES:
 *
 * As stated in the documentation, the GamepadManager class 
 * reads joystick input using SDL and 'parses' SDL events to
 * Qt signals.
 *
 * Most of the code here is about goofing around the joystick mappings.
 * We want to avoid having different joystick behaviours between
 * operating systems to have a more integrated user experience, since
 * we don't want team members to have a bad surprise while
 * driving their robots with different laptops.
 *
 * Unfortunately, we cannot use SDL's GamepadAPI because the robots
 * are interested in getting the button/axes numbers, not the "A" or
 * "X" button.
 *
 * To get around this issue, we created an INI file for the most common
 * controllers that maps each joystick button/axis to the "standard" 
 * buttons and axes used by most teams. 
 *
 * We chose to use INI files because we can safely use QSettings
 * to read its values and we don't have to worry about having to use
 * third-party tools to read other formats.
 */

Would this be a good way to make a large project easier for new programmers/contributors to understand how it works? Aside from maintaining a consistent coding style and 'standard' directory organization, are there any 'standards' or recommendations for these cases?

Robert Harvey · Answer 1 · 2015-07-31T22:54:56.953

This is awesome. I wish more software developers took the time and effort to do this. It:

States in plain English what the class does (i.e. its responsibility),
Provides useful supplementary information about the code without repeating verbatim what the code already says,
Outlines some of the design decisions and why they were made, and
Highlights some of the gotchas that might befall the next person reading your code.

Alas, many programmers fall into the camp of "if code is written properly, it shouldn't have to be documented." Not true. There are many implied relationships between code classes, methods, modules and other artifacts that are not obvious from just reading the code itself.

An experienced coder can carefully craft a design having clear, easily-understandable architecture that is obvious without documentation. But how many programs like that have you actually seen?

meriton · Answer 2 · 2015-08-02T02:44:38.083

The key to working with a large codebase is not having to read the entire codebase to make a change. To enable a programmer to quickly find the code he is looking for, code should be organized, and the organization apparent. That is, each logical unit in the code, from the executable, the library, the namespace, down to the individual class should have a clearly apparent responsibility. I would therefore not just document source files, but also the directories they reside in.

Your programmer's notes also give background on design decisions. While this can be valuable information, I would separate it from the statement of responsibility (to enable the reader to choose whether he wants to read about the responsibility of the class or its design rationale), and move it as close to the source it describes as possible, to maximize the chance the documentation is updated when the code is (documentation is only useful if we can trust in its accuracy - outdated documentation can be worse than none!).
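As a rough sketch of that separation (the class `MappingLocator` and its `fileFor` function are hypothetical names made up for illustration, not part of the asker's project), the class comment could state only the responsibility, while the rationale sits on the function it explains:

```cpp
#include <cassert>
#include <string>

/**
 * @brief Finds the mapping file for a given controller.
 *
 * Responsibility only: a reader who just wants to know what this
 * class is for can stop reading here.
 */
class MappingLocator {
public:
    /**
     * @brief Returns the mapping file name for a controller model.
     *
     * Design rationale (kept beside the code it explains, so it is
     * seen whenever this function changes): mappings are INI files
     * because they can be read without third-party parsers.
     */
    static std::string fileFor(const std::string& controllerModel) {
        // Callers are assumed to pass a detected, non-empty model name.
        assert(!controllerModel.empty());
        return controllerModel + ".ini";
    }
};
```

Someone who changes the file format now has the rationale in front of them while editing, which makes it more likely that the comment gets updated too.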

That said, documentation should remain DRY, i.e. not repeat information that could have been expressed in code or was already described elsewhere (phrases like "as the documentation states" are a warning sign). In particular, future maintainers will be just as proficient in the project's programming language as they are in English; paraphrasing the implementation in comments (which I see altogether too often when people are proud of their documentation) has no benefit, and is likely to diverge from the implementation, in particular if the documentation is not near the code it describes.

Finally, the structure of documentation should be standardized across the project so everyone can find it (it's a royal mess if Peter documents in the bug tracker, Sue in the wiki, Alan in the readme, and John in the source code ...).

score 13 · Answer 3 · answered Aug 02 '15 at 09:55

I would not agree that this is a very good approach, mainly because:

When you refactor your project and move methods around, the documentation breaks.
If the documentation is not properly updated, it will cause more confusion than it helps in understanding the code.

If you have unit tests for each method and integration tests for each module, they serve as self-documentation that is more maintainable and easier to understand than code comments.
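To illustrate what tests as documentation might look like, here is a minimal sketch using plain `assert` rather than any particular test framework. `ButtonMapping`, `kUnmapped` and the test function names are hypothetical, invented for this example:

```cpp
#include <cassert>
#include <map>

// Hypothetical stand-in for a joystick mapping table:
// raw button index -> "standard" button index.
class ButtonMapping {
public:
    static const int kUnmapped = -1;  // returned when no mapping exists

    void add(int rawButton, int standardButton) {
        table_[rawButton] = standardButton;
    }

    // Returns the standard button for a raw button, or kUnmapped.
    int toStandard(int rawButton) const {
        std::map<int, int>::const_iterator it = table_.find(rawButton);
        return it == table_.end() ? kUnmapped : it->second;
    }

private:
    std::map<int, int> table_;
};

// Each test name states one behaviour a new contributor can rely on.
void test_mapped_button_is_translated() {
    ButtonMapping mapping;
    mapping.add(3, 0);
    assert(mapping.toStandard(3) == 0);
}

void test_unmapped_button_is_reported_as_unmapped() {
    ButtonMapping mapping;
    assert(mapping.toStandard(7) == ButtonMapping::kUnmapped);
}
```

Unlike a comment, a test like this fails loudly when a refactoring changes the behaviour it describes.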

Yes, having a proper directory structure is definitely going to help.

score 8 · Answer 4 · answered Aug 01 '15 at 08:58

I'm personally a fan of a high-level design document - preferably written BEFORE any code - that gives an overview of the design and a list of classes and resources. A top-down design greatly simplifies things - yours might be "game engine -> hardware -> controllers -> joystick"; thus, a new programmer told "fix the 'a' button on the 'xyz' controller" would at least know where to start looking.

Too many modern languages tend to break code into hundreds of tiny files, so just finding the correct file can be a challenge on even a moderate project.

Niall · Answer 5 · 2015-08-04T06:34:25.793

If the code base is large, I try to provide a design document that maps out the key elements of its design and the implementation. The intention here is not to detail any of the classes used, but to provide a key to the code and the thought that went into the design. It gives an overarching context to the system, its components and the application thereof.

Things to include in the design document are:

Application architecture
Logical code structure
Data flows
Key patterns used and the motivation behind their use
Code source structure
How to build it (this offers insight into implicit dependencies and physical code source structure)

Following on from this, documentation for the classes and functions/methods should be completed as appropriate. In particular, for the public API it should be clear what the following are in each case:

Preconditions
Effects
Invariants
Exception conditions (throws)

score 4 · Answer 6 · answered Aug 02 '15 at 06:51

The most important rule I have found for making it easier for new developers to understand a codebase is this: perfect agreement is expensive.

If new developers must perfectly understand the system they are working on, it prevents all opportunities for on-the-job learning. I think the programmer's notes are an excellent start, but I would go further. Try to write code that, if approached anew, would allow a developer to figure out what they are doing on the fly, rather than requiring them to learn before they do. Little things like asserts for cases you know can never occur, along with comments explaining why the assert is valid, go a long way. So does writing code which fails gracefully rather than segfaulting if you do anything wrong.
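A small sketch of both ideas, assuming a hypothetical helper `normalizeAxis` that receives raw axis readings expected to lie within ±32767 (that range is an assumption of this example, not something stated in the question):

```cpp
#include <cassert>

// Hypothetical helper: converts a raw axis reading to the range [-1, 1].
double normalizeAxis(int raw) {
    // Fail gracefully: a misbehaving driver may report values outside the
    // expected range, so clamp them instead of propagating garbage.
    if (raw < -32767) raw = -32767;
    if (raw > 32767) raw = 32767;

    const double value = raw / 32767.0;

    // Can never fire: raw was clamped to [-32767, 32767] just above, so the
    // quotient is always within [-1, 1]. If this assert trips, someone has
    // changed the clamping without updating the divisor.
    assert(value >= -1.0 && value <= 1.0);
    return value;
}
```

The comment on the assert tells a newcomer why the condition holds, so they can change the function without first having to understand the rest of the system.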

score 3 · Answer 7 · answered Aug 02 '15 at 20:21

I have seen large classes with documentation, and after reading the documentation I haven't got a clue what this class is supposed to be good for and why anyone would use it! And at the same time, I had a need for some functionality and I was absolutely sure there must be a class to handle it, and couldn't find it anywhere - because there was no documentation that led me from what I needed to the class doing it.

So the first thing that I would want in the documentation is just a few sentences on what a class does, and why I would want to use it. The comments in the original question are doing quite well in that respect. After reading these comments, if I had a joystick that doesn't work well because I can't interpret the values it delivers, I would know what code to check.

score 0 · Answer 8 · answered Aug 08 '15 at 14:39

Similar to what @meriton said, break the code down into separate components. Even better, break down the codebase into separate packages (JARs, gems, eggs, whatever) to make it even more clear how the components are separated. If there is a bug, a developer only needs to find the package where the bug is, and (hopefully) only fix it there. Not to mention, it's easier to do unit testing, and you get to take advantage of dependency management.

Another solution: make the codebase smaller. The less code there is, the easier it is to understand. Refactor out unused or duplicated code. Use declarative programming techniques. This takes effort, of course, and often isn't possible or practical. But it's a worthy goal. As Jeff Atwood has written: "The Best Code Is No Code At All".

score -1 · Answer 9 · answered Aug 02 '15 at 11:25

For complex systems it may be worth documenting not just each file, but also their interactions and hierarchy, and how the program is structured and why.

For example, a game engine is usually quite complex, and it's hard to work out what calls what after a hundred layers of abstraction. It may be worth creating a file like "Architecture.txt" to explain how and why the code is structured like this, and why there is that pointless-looking abstraction layer there.

score -7 · Answer 10 · edited Aug 01 '15 at 10:47

This can be partly because it is hard for a single programmer to write it, as each individual understands only their part of the project.

Sometimes you can get this info from the notes of the project manager, but that is all you will get, as they will rarely rewrite their notes in this format.

answered Aug 01 '15 at 05:17 by Ilqar Rasulov · edited Aug 01 '15 at 10:47 by Michael Durrant
