1

I am currently working on an application built on legacy code that began as proofs of concept (POCs). These POCs became the production code; there were no tests, and the classes have grown quite large over time (they are essentially god objects, and several classes follow this structure). These questions relate to those god objects as legacy code.

The application itself works well enough, but there are bugs that hinder users. Some workarounds are in place, yet there is an air of unreliability in the overall product (bugs creep back in due to coupling). On the development team, there is a fear of adding or modifying functionality because of the large classes, the tight coupling, and the typical problems of inheriting legacy code that implements certain business rules and one-offs.

I do understand that with legacy code there is a need to weigh the costs of maintaining versus refactoring. Most attempts at refactoring would not be worth it, since the use and coupling of POCs as production code would make the refactor a fairly large undertaking. I have been through part of the book Working Effectively with Legacy Code, but I do not know whether this scenario specifically comes up. I also suspect that developing valid tests for these kinds of classes, or even for some specific functionality, could fall under the same cost review as a refactor (correct me if I am wrong in that assumption).

As for new functionality, I do not have a grasp on what should happen. One idea is that new functionality could be separated into whatever structure the team prefers, but there is also an argument for consistency: putting the functionality in with the rest of the methods of some large class. Consistency here means that a class which already performs many uses/CRUD operations for many entities should also receive the new functionality. The consistency argument could have an advantage for a "possible" payoff on the technical debt, by keeping all of the functionality in one place.

These questions lean more towards preference, and management normally has more weight in determining the direction taken. The main takeaway for me would be the outside perspectives. There could be discussions and other approaches that are being missed by the team (myself included) but could help in the long run. We could be doing everything that we can do, and that leaves us at the point of pushing through these growing pains.

Questions

For Maintenance: is there ever an instance where you cannot avoid a refactor that pulls a piece of functionality out of the large class, even if consistency is preferred? If so, what would be a good example? (I do not know whether there is a named "problem" that fits this scenario, like the square–rectangle problem.)

(My possible example: some functionality breaks because the large class was designed to deal with a single in-focus entity, but it is being used for multiple entities that are not always in focus. I cannot change the idea of the entity being in focus, otherwise other pieces of functionality will break.)

For Moving Forward: should the preferred method of adding functionality be up to the team/management? (It could be decided case by case whether to separate new code or to keep consistency.)

Should new functionality always be one or the other (always consistent or always separate), with or without some expected large-scale refactor?

Doc Brown
  • 218,378

3 Answers

4

I'll give a very informal and possibly unusual answer, as someone who came from a similar background about a decade and a half ago, in a situation where the techniques described in Working Effectively With Legacy Code were inapplicable, in large part because the team and management were unwilling to accept them.

I was that kind of outspoken young guy insisting on using profilers, writing tests, etc., preparing lengthy presentations which often had some colleagues rolling their eyes. In retrospect I was also rather dogmatic, since I saw the existing codebase as the worst thing imaginable, foul in ways that Michael Feathers himself probably did not anticipate. He talked about monster methods spanning a couple hundred lines of code. Let's talk instead about countless C functions spanning over 20,000 lines of code with sporadic goto statements, over 400 variables all declared at the top of the function and left uninitialized, and colleagues who got pissed off if you initialized them to make the runtime behavior more deterministic while swimming through uninitialized-variable bugs, because they were afraid of the computational cost of initializing a plain old data type (ignoring the fact that optimizing compilers will generally optimize away redundant initialization).

There was a repeatedly echoed policy of not fixing what isn't broken, but given the nature of the codebase, the fact that each new version of the software often introduced more bugs than it fixed, and the lack of testing, that raised the question of how to effectively tell what wasn't broken.

There's more to this than just software engineering. Your own sanity is at stake if you try to fight too hard against the tide. If you feel like your team isn't cooperating, it can help to relax a little, become a cheerful pragmatist, and take pleasure in more things in life than a codebase built on the soundest engineering principles, like naked ladies and beer.

Anyway, for a start, monoliths or god objects undermine coupling and cohesion in much the same way as a pile of global variables. When your program trips over state management and cannot effectively maintain invariants, the difficulty of correcting those issues is proportional to the scope/visibility of the state involved. Just as a global variable might be accessed by hundreds of functions, so too might a class member if the class has hundreds of methods with access to that persistent state. If the state violates a conceptual invariant and causes bugs, then your list of suspects is proportional to the number of functions that can modify the state, class methods or not. Explaining things this way might convince your colleagues to go easy on the god objects and, as a compromise, start hoisting out of the class any functions that do not need access to its internals. As an extreme example, if the entirety of your codebase's functionality were offered by a single god class, it would be no more capable of effectively maintaining invariants over its private state than the worst C codebase using nothing but global variables for everything. From a practical debugging standpoint, the first priority in making things sane is typically to simplify state management so you can confidently maintain invariants. Getting state management under control precedes even principles like SOLID when a horrific legacy codebase repeatedly finds itself in invalid states.
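As a minimal sketch of that compromise, here is a hypothetical example of hoisting a method out of a god class into a free function that receives only the state it needs. All names here (OrderGodClass, format_invoice_line) are invented for illustration, not taken from any codebase discussed above:

```python
class OrderGodClass:
    """Imagined god object: hundreds of methods would share this mutable state."""
    def __init__(self):
        self.orders = []
        self.tax_rate = 0.07
        # ...imagine hundreds more fields here...

    # Before: this method lives on the class even though it only needs two
    # values, so it can (accidentally) read or write every field above.
    def format_invoice_line_old(self, description, amount):
        return f"{description}: {amount * (1 + self.tax_rate):.2f}"


# After: hoisted to a free function that receives exactly what it needs.
# The list of "suspects" for a bug here shrinks to this function's arguments.
def format_invoice_line(description, amount, tax_rate):
    return f"{description}: {amount * (1 + tax_rate):.2f}"


print(format_invoice_line("Widget", 100.0, 0.07))  # Widget: 107.00
```

The payoff is exactly the "suspects" argument above: a wrong invoice line can now only come from the three arguments passed in, not from any field of the god class.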

As for embracing a test-driven mindset: in my case the codebase was too foul and the team too uncooperative to apply that mentality. You need the entire team really on board, including management, if you want to start creating effective unit and integration tests on a codebase that requires 7,000 global variables to be initialized just right by sporadic functions, with a main entry point that is itself over 80,000 lines of code, just to test a small section of the codebase.

I once tried, over some weekends, to construct a unit test of the licensing system for our software in a standalone project: just the minimal code required to test it and a reasonable interface to test against (the former license interface was monolithic and even included sporadic GUI functions). I ended up having to pull in around 800,000 lines of code, since just about everything was coupled to everything else, just to verify a license code, with an initialization process that took dozens of hours of trial and error to figure out which global states needed to be initialized. In that case, constructing the unit test required far more time, thought, and debugging than it would have taken to simply create a new licensing library from scratch applying the same verification algorithm as the old one (something which could have been done in a few hundred lines of code, not 800k lines of tightly coupled code). I can exchange horror stories all day long about nasty codebases from that former experience.

If you are in such a scenario, maybe the best option is to leave. That's what I eventually did. That said, I've found it useful for sanity's sake to recreate key sections of the software almost from scratch, using the old code only for reference. The goal is to come up with a completely independent, testable library that recreates functionality crucial to your software. It might not be the most productive route, since you are working completely outside the software on code that will not immediately benefit it, but at the end you have an independent library that you can use and confidently demonstrate to your team: the tests show that it works conceptually, and that may convince enough people to replace large sections of code with a library now shown to be effective and reliable. It's a good confidence-boosting exercise if you are swimming in a foul sea of hopeless complexity.
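To illustrate the "independent library" idea, here is a minimal, hypothetical sketch. The license format and checksum rule are invented purely for demonstration (they are not the algorithm from the software described above); the point is that the logic is standalone and testable without pulling in any legacy code or global state:

```python
def verify_license(code: str) -> bool:
    """Accept codes like 'ABCD-0266' where the digit part equals a checksum
    of the letter part (an assumed rule, for demonstration only)."""
    try:
        letters, digits = code.split("-")
    except ValueError:
        return False  # malformed: not exactly one dash
    if not (letters.isalpha() and digits.isdigit()):
        return False
    checksum = sum(ord(c) for c in letters.upper()) % 10000
    return int(digits) == checksum


# Tests live right next to the library: no globals, no 800k lines pulled in.
assert verify_license("ABCD-0266")       # A+B+C+D = 65+66+67+68 = 266
assert not verify_license("ABCD-0001")   # wrong checksum
assert not verify_license("malformed")   # bad format
```

Because the whole thing is a pure function of its input, the "initialization process" that consumed dozens of hours in the original attempt simply does not exist here.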

2

I do not agree with the consistency argument, where you say that

Consistency being that some class which contains many uses/CRUD for many entities should receive the new functionality.

One of the things you should strive for is the Single Responsibility Principle, so classes that contain many functions for many entities should be avoided. Moreover, breaking with the current approach and moving new or updated functionality out of those classes lets you easily keep track of which code is still "old". That brings several benefits: you get a better picture of what needs to be tested and of what remains to be refactored, if that time comes.

The exact way to move out code to a new place depends on the architecture in place, the type of application in general and what the new architecture is going to look like.
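As an assumed illustration of that separation (all names here are invented), new functionality can live in its own small, single-responsibility class that talks to the legacy class through one narrow call, which keeps the boundary between "old" and "new" code visible and testable:

```python
class LegacyCrudGodClass:
    """Stand-in for the existing class that does CRUD for many entities."""
    def fetch_orders(self):
        return [{"id": 1, "total": 10.0}, {"id": 2, "total": 5.5}]


class ReportExporter:
    """New functionality in its own class: it only formats reports.
    It touches the legacy class through a single method call, so the
    new code can be unit-tested against any object with fetch_orders()."""
    def __init__(self, source):
        self.source = source

    def total_revenue(self):
        return sum(order["total"] for order in self.source.fetch_orders())


exporter = ReportExporter(LegacyCrudGodClass())
print(exporter.total_revenue())  # 15.5
```

In a test, `source` can be replaced with a tiny fake, so the new class is verifiable even while the legacy class remains untested.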

1

First define what you are moving towards. Unless you have a clear vision of what that is, you won't be adding value. E.g.:

  • Upgrade from classic ASP to MVC.
  • Switch to a NoSQL database.
  • Move these long-running processes to a worker queue.

If it's just "these objects are too big", I think you are going to have trouble defining the 'new way'.

Ideally you then stop adding any code the old way and put all new code in the new way. Obviously this isn't always possible; you will be tempted/forced to do some bug fixes.

Eventually everything is the 'new way' and you can remove the last bits of the 'old way'. By then you might have a 'new new way', though.
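This incremental migration resembles the strangler-fig approach. A minimal, hypothetical sketch (all names invented) of routing callers through one facade that uses the 'new way' where it exists and falls back to the 'old way' for everything not yet migrated:

```python
class OldPricing:
    """The legacy path, still authoritative for unmigrated SKUs."""
    def price(self, sku):
        return {"A": 10.0, "B": 20.0}.get(sku, 0.0)


class NewPricing:
    """The 'new way': only migrated SKUs are known; returns None otherwise."""
    def price(self, sku):
        return {"A": 9.5}.get(sku)


class PricingFacade:
    """Callers use only this class. As SKUs migrate, NewPricing grows and
    OldPricing shrinks, until the old path can be deleted entirely."""
    def __init__(self):
        self.old, self.new = OldPricing(), NewPricing()

    def price(self, sku):
        migrated = self.new.price(sku)
        return migrated if migrated is not None else self.old.price(sku)


facade = PricingFacade()
print(facade.price("A"))  # 9.5  (handled the new way)
print(facade.price("B"))  # 20.0 (still the old way)
```

The facade is what makes "stop adding code to the old way" enforceable: new features land in NewPricing, and the old class only ever shrinks.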

Ewan
  • 83,178