50

We are designing coding standards, and are having disagreements as to if it is ever appropriate to break code out into separate functions within a class, when those functions will only ever be called once.

For instance:

f1()
{
   f2();  
   f4();
}


f2()
{
    f3()
    // Logic here
}


f3()
{
   // Logic here
}

f4()
{
   // Logic here
}

versus:

f1()
{
   // Logic here
   // Logic here
   // Logic here
}

Some argue that it is simpler to read when you break up a large function using separate single-use sub functions. However, when reading code for the first time, I find it tedious to follow the logic chains and optimize the system as a whole. Are there any rules typically applied to this sort of function layout?

Please note that unlike other questions, I am asking for the best set of conditions to differentiate allowable and non-allowable uses of single call functions, not just if they are allowed.

David
  • 1,899

8 Answers8

94

The rationale behind splitting functions is not how many times (or how often) they will be called, it's keeping them small and preventing them from doing several different things.

Bob Martin's book Clean Code gives good guidelines on when to split a function:

  • Functions should be small; how small? See the bullet bellow.
  • Functions should do only one thing.

So if the function is several screens long, split it. If the function does several things, split it.

If the function is made of sequential steps aimed to a final result, then no need to split it, even if it's relatively long. But if the function does one thing, then another, then another, and then another, with conditions, logically separated blocks, etc., it should be split. As a result of that logic, functions should be usually small.

If f1() does authentication, f2() parses input into smaller parts, if f3() makes calculations, and f4() logs or persist results, then they obviously should be separated, even when every one of them will be called just once.

That way you can refactor and test them separately, besides the added advantage of being easier to read.

On the other hand if all the function does is:

a=a+1;
a=a/2;
a=a^2
b=0.0001;
c=a*b/c;
return c;

then there is no need to split it, even when the sequence of steps is long.

Tulains Córdova
  • 39,570
  • 13
  • 100
  • 156
40

I think function naming is very important here.

A heavily dissected function can be very self-documenting. If each logical process within a function is split out into its own function, with minimal internal logic, then the behavior of each statement can be reasoned out by the names of the functions and the parameters they take.

Of course, there is a downside: the function names. Like comments, such highly specific functions can often get out of sync with what the function is actually doing. But, at the same time, by giving it a proper function name you make it harder to justify scope creep. It becomes more difficult to make a sub-function do something more than it clearly ought to.

So I would suggest this. If you believe a section of code could be split out, even though nobody else calls it, ask yourself this question: "What name would you give it?"

If answering that question takes you more than 5 seconds, or if the function name you pick is decidedly opaque, there's a good chance that it isn't a separate logical unit within the function. Or at the very least, that you're not sure enough about what that logical unit is really doing to properly split it out.

But there is an additional problem that heavily dissected functions can encounter: bug fixing.

Tracking down logical errors within a 200+line function is hard. But tracking them through 10+ individual functions, while trying to remember the relationships between them? That can be even harder.

However again, semantic self-documentation through names can play a key role. If each function has a logical name, then all you need to do to validate one of the leaf functions (besides unit testing) is to see if it actually does what it says it does. Leaf functions tend to be short and focused. So if every individual leaf function does what says it should, then the only possible problem is that someone passed them the wrong stuff.

So in that case, it can be a boon to bug fixing.

I think it really does come down to the question of whether you can assign a meaningful name to a logical unit. If so, then it can probably be a function.

Nicol Bolas
  • 12,043
  • 4
  • 39
  • 48
12

Any time you feel the need to write a comment to describe what a block of text is doing, you have found an opportunity to extract a method.

Rather than

//find eligible contestants
var eligible = contestants.Where(c=>c.Age >= 18)
eligible = eligible.Where(c=>c.Country == US)

try

var eligible = FindEligible(contestants)
dss539
  • 889
5

DRY - Don't repeat yourself - is just one of several principles that have to be balanced.

Some other that come to mind here are naming. If the logic is convoluted not obvious to the casual reader, extraction into method/function whose name better encapsulated what and why it is doing it it can improve program readability.

Also aiming for less than 5-10 lines of code method/function may come into play, depending on how many lines the //logic becomes.

Also, a function with parameters can act as an api and can name the parameters appropriately to then make the code logic clearer.

Also you may find that over time collections of such functions do reveal dome useful grouping, e.g. admin and then they can be easily gathered under it.

5

The point about splitting functions is all about one thing: simplicity.

A reader of code cannot have more than about seven things in mind simultaneously. Your functions should reflect that.

  • If you build too long functions, they will be unreadable because you have much more than seven things inside your functions.

  • If you build a ton of one-line functions, readers similarly get confused in the tangle of functions. Now you would have to keep more than seven functions in your memory to understand the purpose of each one.

  • Some functions are simple even though they are long. The classic example is when a function contains one big switch statement with many cases. As long as the handling of each case is simple, your functions is not too long.

Both extremes (the Megamoth and the soup of tiny functions) are equally bad, and you have to strike a balance in between. In my experience, a nice function has something around ten lines. Some functions will be one-liners, some will exceed twenty lines. The important thing is that each function serves an easily understandable function while being equally understandable in its implementation.

4

It's all about separation of concerns. (ok, not all about it; this is a simplification).

This is fine:

function initializeUser(name, job, bye) {
    this.username = name;
    this.occupation = job;
    this.farewell = bye;
    this.gender = Gender.unspecified;
    this.species = Species.getSpeciesFromJob(this.occupation);
    ... etc in the same vein.
}

That function is concerned with one single concern: it sets the initial properties of a user from the provided arguments, defaults, extrapolation, etc.

This is not fine:

function initializeUser(name, job, bye) {
    // Connect to internet if not already connected.
    modem.getInstance().ensureConnected();
    // Connect to user database
    userDb = connectToDb(USER_DB);
    // Validate that user does not yet exist.
    if (0 != userDb.db_exec("SELECT COUNT(*) FROM `users` where `name` = %d", name)) {
        throw new sadFace("User exists");
    }
    // Configure properties. Don't try to translate names.
    this.username = removeBadWords(name);
    this.occupation = removeBadWords(translate(job));
    this.farewell = removeBadWords(translate(bye));
    this.gender = Gender.unspecified;
    this.species = Species.getSpeciesFromJob(this.occupation);
    userDb.db_exec("INSERT INTO `users` set `name` = %s", this.username);
    // Disconnect from the DB.
    userDb.disconnect();
}

Separation of concerns suggests this should be handled as multiple concerns; the DB handling, the user-existence validation, the setting of the properties. Each of these is very easily tested as a unit, but testing all of them in a single method makes for a very convoluted unit test set, which would be testing things as different as how it handled the DB going away, how it handles creation of empty and invalid job titles, and how it handles creating a user twice (answer: badly, there's a bug).

Part of the problem is that it goes all over the place in terms of how high-level it is: low level networking and DB stuff has no place here. That's part of the concern separation. The other part is that stuff that should be the concern of something else, is instead the concern of the init function. For example, whether to translate or apply bad language filters, might make more sense as a concern of the fields being set.


Later edit: this all means that in an ideal codebase there should be no methods or functions which are called only once. They should be called at the very least twice: once by the code they're intended for, and once by the unit tests.

Dewi Morgan
  • 243
  • 2
  • 7
2

Pretty much depends on what your // Logic Here is.

If it's a one-liner, then probably you don't need a functional decomposition.

If, on the other hand, it's lines and lines of code, then it is much better to put it into a separate function and name it appropriately (f1,f2,f3 does not pass muster here).

This is all have to do with human brains on average are not very efficient with processing large amounts of data at a glance. In a sense, it does not matter what data: multi-line function, busy intersection, 1000-piece puzzle.

Be a friend of your code maintainer's brain, cut down on pieces in the entry point. Who knows, that code maintainer may even be you a few month later.

1

The "right" answer, according to the prevalent coding dogmas, is to split large functions into small, easy to read, testable and tested functions with self documenting names.

That said, defining "large" in terms of "lines of code" can seem arbitrary, dogmatic, and tedious, which can cause unnecessary disagreements, scruples, and tension. But, fear not! For, if we recognize that the primary purposes behind the lines-of-code limit are readability and testability, we can easily find the appropriate line-limit for any given function! (And start building the foundation for an internally relevant soft-limit.)

Agree as a team to allow the megalithic functions, and to extract lines into smaller, well-named functions at the first sign that the function is either difficult to read as a whole, or when subsets are uncertain in correctness.

... And if everyone is honest during the initial implementation, and if none of them are touting an IQ over 200, the limits of understandability and testability can often be identified before anyone else sees their code.

svidgen
  • 15,252