7

I have a project at work that was written entirely by a scientist who was learning programming while writing it. It is almost 200,000 lines of C++, and almost every variable is a global variable (over 2,000 global variables). I think he found out about local variables about half way through writing the project. The few local variable names are almost always x, xx, i, ii, j, jj, M, or some other single letter name. This program is prone to seg faulting and running valgrind reveals nearly a thousand instances of memory corruption. It is totally reliant on undefined behavior, to the point where it only produces the correct results on CentOS 7, and Ubuntu yields completely different and incorrect results (it's scientific code). It is completely indecipherable by anyone but the original author. We are in a unique position now where the company is going to bet everything on this one piece of software written by someone who has never written production software before. There is an extreme bus factor here, because after working on this codebase for months, I'm baffled by almost every line, and implementing the most basic features takes an incredibly long time. No other developer at the company wants to touch this code. This is by far the worst code I've ever seen. I never saw code this bad even from freshman CS students.

Given this situation, is this an appropriate case to "burn it down, and start from scratch"? What is a good strategy to make this a maintainable code base? This code will need frequent updates as research progresses, meaning freezing the code is not an option.

For clarification, this project is not yet being used in production. It has been entirely proof of concept until now, and will be used in production this summer.

For anyone curious of what this would look like, the functions look something like this.

void doSomething(void) {
  sideEffect1();
  sideEffect2();
  sideEffect3();
  ...
  sideEffect145();
}

void sideEffect1(void) {
  if (globalVar1) {
    return;
  }
  anotherSideEffect1();
  if (globalVar2) {
    globalVar1 = globalVar2 + 1;  
  }
  ... hundreds of lines later
}
Kyle
  • 179

3 Answers3

8

Do you have complete, detailed, and comprehensive set of requirements, preferably as executable tests?

I'm going to guess: No as the writer probably hasn't discovered those yet...

In such a case, there is no guarantee that burning it down and rebuilding it will be even comparable. As you've already noted, you cannot even be sure that the code acts the way you would expect it to due to use of undefined behaviours. Not to mention you are even perplexed at to what the algorithm is itself due to the snarls of logic spread across the code base.

Instead, I would start by developing a battery of tests. Even if you simply grabbed some months worth of inputs/outputs and just check that the data matches.

At the very least this will give you some % chance of knowing if you have borked the code.

Refactor

Start by picking one function, one variable, or one class. See what you can do to improve it.

  • improve its name. Figure out what the function/variable/class does/represents.
  • pull a variable into a tighter scope.
  • find a couple of lines of sequential code, and pull it into its own named function.
  • wrap functions/variables into classes
  • split classes to increase cohesion, and introduce boundaries between concerns.
  • rephrase an undefined behaviour using well defined semantics.
  • etc... just small changes that improve your ability to read the code base and understand it.

Run the tests frequently to find anything obviously broken.


It would be best if you did this buddy programming with that scientist. This way you can improve the code base, train the scientist in how to write code better, and have the scientist input as to what each thing means/does.

It might also help if you institute Pull Requests with a mandatory code-review. This will allow you to push back on obviously bad code, so that things aren't getting worse.

Kain0_0
  • 16,561
3

We are in a unique position now where the company is going to bet everything on this one piece of software written by someone who has never written production software before.

This is a huge risk.

That is why I would focus on the immediate problems (to prevent refactor-rage):

  • Is it an immediate problem that it doesn't run on multiple Operating Systems?
  • Is the program unreliable under all circumstances or can it reliably be used in a specific way and always produce good results when used that way?
  • How much time does it take to adjust the code for the most common changes? What is the minimal change that can be made to improve that?

In other words: don't fix what is not broken and focus on making a minimal viable product.

This focus would help to get the program up and running as soon as possible.

Then, maybe in parallel, start renaming the functions and variables in close collaboration with the original author. Mainly to learn and understand the domain and the solution. Do not change anything except for the names. The goal is understanding.

After the renaming, proper refactoring could be attempted.

Emond
  • 1,258
0

If you burn it down and start rom scratch I can guarantee you will end up with a mess that isn’t working. Your best chance is to hire an excellent professional with a bit of a masochistic streak and pay him to improve this code. Won’t be cheap.

gnasher729
  • 49,096