How to analyze and understand the use/application of a "class" in a colossal million-line legacy code base?

Question

I am working on a huge code base (more than a million lines of code with a sophisticated architecture) written in C++ over the span of a couple of decades. The task on which I'm working at this point is understanding the use of a specific class whose functionality is unknown to almost every developer of the team. The reason? Because as I mentioned the code has been in development for decades and it's been through major changes, upgrades etc. etc. so you can imagine it may get messy when you have a million lines of code being developed by hundreds of developers.

I need to analyse and understand the structure and utility of a file called CLASS_inc.hxx.

Here are the details of my challenge:

A class called A_CLASS is declared in the header file CLASS_inc.hxx with all its member functions. The members of this class are called in a couple of different parts of the code using scope resolution CLASS::member_function (it's more complicated than that, but I'm simplifying; a simplified snippet of the code appears below). I found that some of the member functions are completely useless: running grep -rwin member_function in the src directory returned no trace of member_function anywhere, because it is declared but never called in any corner of the code. So I deleted these useless member functions, compiled the code, and ran the Test_Cases (there is a huge test base in the code), and all tests passed without problems. Now here comes the challenging part: the remaining member functions, around 70% of the original ones, are called in other functions in the code, and I have no idea how to understand what they do!

So is there any methodology or tool or strategy in such cases to attack such problems?

For those of you who might suggest "read the documentation", "read the comments in the code", or "try to understand from the names of the member functions or the class": there are absolutely no comments in the code, and the names of the variables, classes and functions don't suggest anything (for instance, one member function is called LADP).

Here is a simplified snippet of the code just to give you an idea, this is our CLASS_inc.hxx

#ifndef CLASS_inc_included
#define CLASS_inc_included
#include "blabla1.hxx"
#include "blabla2.hxx"
namespace CLASS_inc
//    COMMON CLASS : VARIOUS WORK VARIABLES
class A_CLASS : public A_Base
{
 public:
  A_CLASS();
  void constructor();
  ~A_CLASS();
  void destructor()
  {
    delete[] _container_of_double;
    _container_of_double = NULL;
  }
  const double& LADP_get() const
  {
    return _LADP;
  }
  const double& LADC_get() const
  {
    return _LADC;
  }
.
.
.
.
.
 private:
  double* _container_of_double;
  double* _LADP;
  double* _LADC;
.
.
.
};
extern A_CLASS* _CLASS;
.
.
.

The members of the above class are then called from other functions elsewhere in the code, as in the following example:

&CLASS_inc::CLASS().LADP_set()

The above snippets are a very simplified depiction of the code, but the pattern is the same.

I'm working on Ubuntu 20.04 and the code is in C++.

candied_orange · Answer 1 · 2023-05-05T15:47:17.250

One overriding pattern has emerged in every interaction I've ever had with technology: I learn more when it breaks.

So break it.

I mean, this is software. You can't hurt it. Sneak a copy off somewhere that no one will care about and abuse the heck out of it. Make this class produce little tattletale messages that make it easy to track what's going on. Dump debugging output when it gets called. Take a peek at the stack and see what called you. Send back nonsense and watch where the nonsense goes.
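A minimal sketch of such a tattletale, assuming you are free to paste a one-line macro into each function of your sandbox copy. Everything here (`Tattletale`, `TATTLE`, `trace_log`, `mystery_getter`) is a hypothetical name made up for illustration, not something from the code base in the question:

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical helper (not from the original code base): keeps every traced
// event in memory so the order of calls can be inspected afterwards.
inline std::vector<std::string>& trace_log()
{
    static std::vector<std::string> log;
    return log;
}

// RAII "tattletale": reports entry when constructed and exit when destroyed.
class Tattletale
{
public:
    Tattletale(const char* func, const char* file, int line) : func_(func)
    {
        assert(func != nullptr);
        trace_log().push_back(std::string("enter ") + func_);
        std::fprintf(stderr, "[trace] enter %s (%s:%d)\n", func_, file, line);
    }
    ~Tattletale()
    {
        trace_log().push_back(std::string("exit ") + func_);
        std::fprintf(stderr, "[trace] exit  %s\n", func_);
    }
    Tattletale(const Tattletale&) = delete;
    Tattletale& operator=(const Tattletale&) = delete;

private:
    const char* func_;
};

// Drop this as the first statement of any function you want to watch.
#define TATTLE() Tattletale tattle_guard_(__func__, __FILE__, __LINE__)

// Stand-in for an opaque member function such as LADP_get().
double mystery_getter()
{
    TATTLE();
    return 42.0;
}
```

Each call to `mystery_getter()` then leaves an enter/exit pair on stderr and in `trace_log()`. For the "peek at the stack" part, glibc on Linux offers `backtrace()` and `backtrace_symbols_fd()` in `<execinfo.h>`, which could be called from the constructor; that part is left out here to keep the sketch portable.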

Give things better names as you think of them. Break things into smaller things. Write tests that show how much you broke it. If there are no comments to show the authors' thinking, then add comments that show what you’re thinking: “I have no idea what this function does but without it the GUI won’t load.”
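One way to write tests that show how much you broke it is a characterization test: record what the code does today, without judging whether it is right, and assert on that. The sketch below assumes a hypothetical stand-in function `legacy_ladp`; in the real code you would call the actual member function and paste in whatever values it currently returns:

```cpp
#include <cassert>
#include <cmath>

// Hypothetical stand-in for an undocumented legacy function. Its formula is
// invented for this example only.
double legacy_ladp(double x)
{
    return x * 0.5 + 3.0;
}

// Characterization test: the expected values were obtained by running the
// existing code, not derived from any specification. If a refactoring changes
// one of them, the assertion fires and shows exactly what broke.
void characterize_legacy_ladp()
{
    const double tolerance = 1e-12;
    assert(std::fabs(legacy_ladp(0.0) - 3.0) < tolerance);
    assert(std::fabs(legacy_ladp(2.0) - 4.0) < tolerance);
    assert(std::fabs(legacy_ladp(-6.0) - 0.0) < tolerance);
}
```

The comment next to each pinned value is also a good place for notes such as "no idea why this is 3.0, but three callers depend on it".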

Just be aware that what you get out of this mostly happens in your head. Sandboxes are fun to play in but usually don't produce useful artifacts. But it may inform your more typical work.

score 13 · Accepted Answer · answered May 05 '23 at 03:00

First, you need a goal when analyzing a class. If you don't have this, you have no idea when to stop. And with a million-line codebase, you could go on forever. Since we don't often read code for the sake of reading code, presumably you need to make changes to the class. Keep this goal in mind as you trace through the code.

Knowing where in the codebase these functions get called is good. The biggest challenge I've had analyzing a single class is understanding the use cases that are impacted if you make changes.

You haven't specified much about the application, but generally you need to identify where the major use cases of the application begin. For a web application, an HTTP request kicks things off. A GUI application will likely start a use case with some kind of event (application-generated or user-initiated). Think of the locations where use cases begin as one end of a spectrum, and where member functions of this class get called as the other end of the spectrum. Your challenge as a developer is to find the path from one end to the other.

To accomplish this you will need to:

Understand the big picture architecture of the application. Where does data access go? Data validations? User interaction logic? Raw business logic?
Understand the major modules of the application.
Determine where this application interfaces with other systems or subsystems.
Know where, within the architecture, this class resides.
Figure out if this is an algorithm class (it calculates stuff) or a coordinator (it coordinates the actions of some number of other objects).

Once you have a picture in your head of how the code is organized, think of the codebase in terms of use cases. Add logging to this class¹. Execute use cases and look for the log messages. Continue adding log messages further up the call stack until you get a meaningful picture of where this class gets used, and for what purposes. This allows you to understand the technical constraints you have when making changes to the class. For example, does it have ownership over any memory? Does it manage file handles, or allocate other resources? If you wanted to refactor the code to use dependency injection for testing purposes, does each impacted use case have an object that satisfies that dependency (and if not, how much of a pain is it to get one)?
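To make the dependency injection question concrete, here is a rough sketch of what such a refactoring could look like. The names `WorkVariables`, `FakeWorkVariables` and `scaled_ladp` are hypothetical and invented for this example; the real class and its callers will look different:

```cpp
#include <cassert>

// Hypothetical interface extracted from the getters the callers actually use.
class WorkVariables
{
public:
    virtual ~WorkVariables() = default;
    virtual double ladp() const = 0;
};

// Test double that satisfies the dependency without touching global state.
class FakeWorkVariables : public WorkVariables
{
public:
    explicit FakeWorkVariables(double ladp) : ladp_(ladp) {}
    double ladp() const override { return ladp_; }

private:
    double ladp_;
};

// Hypothetical caller: it receives its dependency as a parameter instead of
// reaching for a global pointer, so a test can hand it a fake.
double scaled_ladp(const WorkVariables& vars, double factor)
{
    return vars.ladp() * factor;
}

// Usage sketch: the caller can now be exercised in isolation.
void check_scaled_ladp()
{
    FakeWorkVariables fake(2.0);
    assert(scaled_ladp(fake, 3.0) == 6.0);
}
```

Whether this is worth doing depends on the answer to the question above: every use case that reaches `scaled_ladp` must be able to supply a `WorkVariables` object.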

Once you can see a plan to safely make code changes to this class, you can stop analyzing and start making those code changes. And hopefully verifying those changes with tests.

¹ A quick note about logging: keep it simple here. You don't need a robust enterprise solution. You should be playing around locally or on some machine designated for development. Hard code the silly path to the log file if necessary. Writing a simple logger should be quick and easy so you spend more time looking through code than building out logging infrastructure.
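A throwaway logger in that spirit might be as small as the sketch below. The function name `quick_log` and the path `/tmp/a_class_trace.log` are assumptions chosen for this example; use whatever path suits your development machine:

```cpp
#include <cassert>
#include <fstream>
#include <string>

// Hypothetical throwaway logger: appends one line per call to a file.
// The default path is hard-coded on purpose; this is not production code.
inline void quick_log(const std::string& message,
                      const char* path = "/tmp/a_class_trace.log")
{
    std::ofstream out(path, std::ios::app);
    assert(out.is_open() && "could not open the log file");
    out << message << '\n';
}
```

A call such as `quick_log("LADP_get called");` at the top of a member function is enough to see which use cases reach it. Reopening the file on every call is slow, but it means nothing is lost if the program crashes right after the message.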

score 10 · Answer 3 · answered May 05 '23 at 18:36

If it is available, look at the history of commits related to the class in your version control system (CVS, SVN, git, mercurial...). Commit messages might help. The context of commits (i.e. other commits nearby) may have hints as to what features are implemented (or bugs fixed) by the changes being committed. Also, code parts added or changed together might be related in function or purpose.

score 1 · Answer 4 · answered May 05 '23 at 08:18

Debug it

Set a few breakpoints. Run some unit tests in debug mode. Check two things: the input and output of the functions, and then the stack trace, to see how you got there.

Repeat the test a few times, each time with a broader scope. That is, after understanding when the function was called, you should also check what the caller was doing and then move up to a higher level.
