Code duplication vs. abstraction

Question

I've inherited some research code that already contains a fair amount of duplication: on several occasions, the original author duplicated a file and changed minor things to calculate a variation of the original problem. I'm tasked with making modifications to calculate a different variation. Following the existing style, I could complete the task by duplicating and modifying 4-5 files. But once I started doing this, I felt I was just making an already somewhat disorganized project worse.

To be slightly more specific: say there's some class (this is C++) Foo whose definition makes use of some other class Bar. One possibility is to copy Bar to Bar2 and make minimal changes, and then copy Foo to Foo2 with some minimal changes including changing all instances of Bar to Bar2.

Alternatively, I could make use of templates: change Foo to FooX<BarX>, and then have Foo and Foo2 be aliases for FooX<Bar> and FooX<Bar2> with a handful of redefined functions. To me this feels like the right thing to do, but my worry is mostly that it increases the level of abstraction (there's already some templating going on so this will cause nested templates FooX<BarX<Baz>>) and it's unlikely there will be a third use of FooX.
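A minimal sketch of what the template variant might look like. The members `scale`, `compute` and `offset` are invented for illustration, since the real contents of Foo and Bar are not shown here; the explicit specialization stands in for the "handful of redefined functions".

```cpp
#include <cassert>

// Hypothetical stand-ins: the real Bar and Bar2 are not shown in the question.
struct Bar  { double scale() const { return 1.0; } };
struct Bar2 { double scale() const { return 2.0; } };

// The shared logic is written once; BarX supplies the part that varies.
template <typename BarX>
class FooX {
public:
    double compute(double x) const { return bar_.scale() * x + offset(); }
    // Default behaviour, overridden for Bar2 below.
    double offset() const { return 0.0; }
private:
    BarX bar_;
};

// One "redefined function": an explicit specialization for the Bar2 variant.
template <>
inline double FooX<Bar2>::offset() const { return 1.0; }

// Existing call sites can keep using the old names.
using Foo  = FooX<Bar>;
using Foo2 = FooX<Bar2>;

// Usage: both variants go through the same compute() code path.
inline void check_variants() {
    assert(Foo{}.compute(3.0) == 3.0);   // 1.0 * 3.0 + 0.0
    assert(Foo2{}.compute(3.0) == 7.0);  // 2.0 * 3.0 + 1.0
}
```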

I guess the general question is to what extent one can excuse code duplication by (1) avoiding questionably useful abstractions; (2) following somewhat questionable preexisting patterns.

score 20 · Answer 1 · answered Sep 08 '21 at 02:10

The general rule of thumb is "three strikes and you refactor": you will generally see a useful abstraction emerge by the time you've written your third implementation of something. This doesn't mean that you should never abstract earlier, but it might be harder to come up with the right interface that will fit future implementations.

In your case, if you're just copying most of the code unchanged and only making minor modifications, there's a good argument for abstracting out the part that actually changes, especially since you said this has already been done once before. But there are always other factors to consider such as whether this is just a prototype that you plan to throw away or if other people will be reading/extending this code in the future.
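As a rough illustration of "abstracting out the part that actually changes" without templates, the shared loop below takes the varying step as a parameter. All names here (`accumulate_with`, `sum_of_squares`, `sum_of_cubes`) are hypothetical and only assume that the variations differ in one small step.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Shared part: the loop that every variation would otherwise copy.
// The per-element step is the only thing that differs between variations.
double accumulate_with(const std::vector<double>& xs,
                       const std::function<double(double)>& step) {
    double total = 0.0;
    for (double x : xs) {
        total += step(x);
    }
    return total;
}

// Each variation is now a few lines instead of a duplicated file.
double sum_of_squares(const std::vector<double>& xs) {
    return accumulate_with(xs, [](double x) { return x * x; });
}

double sum_of_cubes(const std::vector<double>& xs) {
    return accumulate_with(xs, [](double x) { return x * x * x; });
}

// Usage: the variations share one implementation of the loop.
inline void check_variations() {
    assert(sum_of_squares({1.0, 2.0, 3.0}) == 14.0);
    assert(sum_of_cubes({1.0, 2.0}) == 9.0);
}
```

Whether a callable parameter or a template parameter fits better depends on how much of the code varies; for a throwaway prototype neither may be worth the effort.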

Doc Brown · Accepted Answer · 2021-09-08T20:47:17.143

Code duplication is not always a severe issue. It becomes one when both duplicates are still in use and hence under maintenance: bug fixes and refactorings then have to be applied in several places, which in turn tends to produce more bugs and inconsistencies.

So the first thing you need to check here is the life cycle of the code at stake:

Are the original files / original calculations no longer in use, and only the duplicates / new variations under maintenance? Then the duplication is OK.
Or is it expected that duplicating the code will also duplicate the maintenance and development effort? Then keeping the code DRY is the most sensible approach.

In my experience, the first case happens more frequently in "research code" than in "business code" (but YMMV).

Note that reducing duplication always requires some refactoring, and you need to make sure you don't break existing functionality along the way. For this, you need automated tests (using refactoring tools is also a good idea, if your IDE provides them). Make sure you have those in place before changing the existing code base.
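One inexpensive form of such a test is to compare the old and the refactored routine on a range of inputs before retiring the old one. The sketch below assumes two hypothetical functions, `legacy_calculation` and `refactored_calculation`, and an arbitrarily chosen tolerance; real research code would need inputs and tolerances suited to the actual calculation.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical legacy routine, left untouched while refactoring.
double legacy_calculation(double x) {
    return 3.0 * x * x + 2.0 * x + 1.0;
}

// Hypothetical refactored version (same polynomial in Horner form).
double refactored_calculation(double x) {
    return (3.0 * x + 2.0) * x + 1.0;
}

// Returns true if both versions agree within tol on steps + 1 evenly
// spaced sample points in [lo, hi].
bool same_behaviour(double lo, double hi, int steps, double tol) {
    for (int i = 0; i <= steps; ++i) {
        const double x = lo + (hi - lo) * i / steps;
        const double diff = legacy_calculation(x) - refactored_calculation(x);
        if (std::fabs(diff) > tol) {
            return false;
        }
    }
    return true;
}

// Run this before and after every refactoring step.
inline void run_regression_check() {
    assert(same_behaviour(-10.0, 10.0, 200, 1e-9));
}
```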

As for your question about "abstractions vs. duplication": there is a quote from Sandi Metz, "duplication is far cheaper than the wrong abstraction". When you avoid duplication by introducing an abstraction, keep in mind that there is always a trade-off involved. If the abstraction later turns out to make the evolving code overly complex, don't hesitate to roll it back and accept some duplication again if that's cheaper.
