1

I often have to write a function which may return an output of two kinds: "short" and "long". As an example, consider the subset sum problem: the input is a set of integers S and an integer C, and the output is the largest sum of a subset of S that is at most C. The short output is just the largest sum (an integer); the long output is the subset itself. I can implement both variants in two separate functions, e.g.:

>>> subset_sum_short([1,2,4,9], 8)
7
>>> subset_sum_long ([1,2,4,9], 8)
[1,2,4]

But this causes a lot of code duplication, since the implementation is very similar. Both functions use dynamic progamming to check all possible sums; the long version also keeps information for backtracking.

To avoid code-duplication, I can implement only the long version, since given the optimal subset, the user can easily compute its sum:

>>> subset_sum_long([1,2,4,9], 8)
7

But it is wasteful for users who only need the sum, since it requires keeping unneeded backtracking information. The backtracking information, in this case, can increase the run-time and the memory requirements of the function by a factor of 2 or even 3.

A third solution is to have a single function with a flag:

>>> subset_sum([1,2,4,9], 8, short=True)
7
>>> subset_sum([1,2,4,9], 8, short=False)
[1,2,4]

This avoids the duplication, but the code becomes cluttered with ifs. Moreover, in the future I would like to allow a third output format, for example, giving each item a name, and returning a list of names instead of their sizes:

>>> subset_sum({"a":1, "b":2, "c":4, "d":9}, 8)
["a","b","c"]

Again, the algorithm is the same, but implementing this in a single function will require even more ifs.

Is there a design pattern that allows to write clean, simple and efficient code, while still giving the user a simple way to choose the required output format?

3 Answers3

2

The Strategy Pattern allows you to swap out implementations of methods together. The implementation differences are hidden behind method names common to each style. Unneeded work can be replaced with pass.

>>>subset_sum([1,2,4,9], 8, style)

Done this way you can sometimes hide the style of output and differences in how it's calculated from the rest of the algorithm. How well it works depends on the algorithm. While this can be made to work it often fails when a new style is introduced requiring that the algorithm be rewritten to use new style methods that didn't exist before.

Can it be applied to this algorithm? Well if you could have made it work by passing in a boolean or enum there is a refactoring to change that code to code that works with this pattern. It's called Replace Conditional with Polymorphism.

candied_orange
  • 119,268
1

Writing similar code twice is not problematic if the duplication is functionally justifiable. That is, these two functions do different things--which is the issue you seem to be hung up on--that's okay! It's still DRY if there's no reasonable room for abstraction. The extra LOC are fully justified by the (apparently very critical) performance advantage of have two distinct functions. This is good (and pythonic) practice, for sure. However, you could maybe look for commonalities between the two functions and refactor those lines as standalone functions, and write them in cython to boot.

mck68w
  • 11
0

Is there a design pattern that allows to write clean, simple and efficient code, while still giving the user a simple way to choose the required output format?

Sure, it's called Model View Controller. The idea is to decouple how you present output from your state. Here's a simple example:

>>>sum_of_set( subset_sum_long([1,2,4,9], 8) )

7

If you're looking at that and thinking that this is the late summing you wanted to avoid you're right, kinda, we could implement this that way. Or we could return something from subset_sum_long() that isn't simply a set. We could return a data structure that has a place to accumulate the sum and let sum_of_set() pluck it off and return it. Completely up to you.

This also solves your third output format issue. You can create a data structure that holds the needed information that allows for all three output formats. Then you just need a way to choose which output you want to express.

All of this avoids duplicating code. It also avoids passing in control flags. But it does complicate the algorithm as more and more things get added to the data structure. Where you decide to draw the line here is up to you. But not having to decide exactly how to present the results until after the data structure is created takes a lot of the pressure off since all it needs is the data. No presentation information.

candied_orange
  • 119,268