244

I was trying to find alternatives to the use of global variable in some legacy code. But this question is not about the technical alternatives, I'm mainly concerned about the terminology.

The obvious solution is to pass a parameter into the function instead of using a global. In this legacy codebase that would mean that I have to change all functions in the long call chain between the point where the value will eventually be used and the function that receives the parameter first.

higherlevel(newParam)->level1(newParam)->level2(newParam)->level3(newParam)

where newParam was previously a global variable in my example, but it could have been a previously hardcoded value instead. The point is that now the value of newParam is obtained at higherlevel() and has to "travel" all the way to level3().

I was wondering if there was a name(s) for this kind of situation/pattern where you need to add a parameter to many functions that just "pass" the value unmodified.

Hopefully, using the proper terminology will allow me to find more resources about solutions for redesign and describe this situation to colleagues.

KChaloux
  • 5,824
RubenLaguna
  • 1,962

10 Answers10

228

The data itself is called "tramp data". It is a "code smell", indicating that one piece of code is communicating with another piece of code at a distance, through intermediaries.

  • Increases rigidity of code, especially in the call chain. You are much more constrained in how you refactor any method in the call chain.
  • Distributes knowledge about data/methods/architecture to places that don't care in the least about it. If you need to declare the data that is just passing through, and the declaration requires a new import, you have polluted the name space.

Refactoring to remove global variables is difficult, and tramp data is one method of doing so, and often the cheapest way. It does have its costs.

BobDalgleish
  • 4,749
110

I don't think this, in itself, is an anti-pattern. I think the problem is that you are thinking of the functions as a chain when really you should think of each one as an independent black box (NOTE: recursive methods are a notable exception to this advice.)

For example, let's say I need to calculate the number of days between two calendar dates so I create a function:

int daysBetween(Day a, Day b)

In order to do this, I then create a new function:

int daysSinceEpoch(Day day)

Then my first function becomes simply:

int daysBetween(Day a, Day b)
{
    return daysSinceEpoch(b) - daysSinceEpoch(a);
}

There's nothing anti-pattern about this. The parameters of the daysBetween method are being passed off to another method and are never otherwise referenced in the method but they are still needed for that method to do what it needs to do.

What I would recommend is looking at each function and start with a couple of questions:

  • Does this function have a clear and focused goal or is it a "do some things" method? Usually the name of the function helps here and if there is stuff in it that is not described by the name, that's a red flag.
  • Are there too many parameters? Sometimes a method can legitimately need lots of input but having so many parameters makes it burdensome to use or understand.

If you are looking at a jumble of code without a single purpose bundled into a method, you should start by unraveling that. This can be tedious. Start with the easiest things to pull out and move into a separate method and repeat until you have something coherent.

If you just have too many parameters, consider Method to Object refactoring.

Deduplicator
  • 9,209
JimmyJames supports Canada
  • 30,578
  • 3
  • 59
  • 108
73

BobDalgleish has already noted that this (anti-)pattern is called "tramp data".

In my experience, the most common cause of excessive tramp data is having a bunch of linked state variables that should really be encapsulated in an object or a data structure. Sometimes, it may even be necessary to nest a bunch of objects to properly organize the data.

For a simple example, consider a game that has a customizable player character, with properties like playerName, playerEyeColor and so on. Of course, the player also has a physical position on the game map, and various other properties like, say, current and maximum health level, and so on.

In a first iteration of such a game, it might be a perfectly reasonable choice to make all these properties into global variables — after all, there's only one player, and almost everything in the game somehow involves the player. So your global state might contain variables like:

playerName = "Bob"
playerEyeColor = GREEN
playerXPosition = -8
playerYPosition = 136
playerHealth = 100
playerMaxHealth = 100

But at some point, you might find that you need to change this design, perhaps because you want to add a multiplayer mode into the game. As a first attempt, you could try making all those variables local, and passing them to functions that need them. However, you may then find that a particular action in your game might involve a function call chain like, say:

mainGameLoop()
 -> processInputEvent()
     -> doPlayerAction()
         -> movePlayer()
             -> checkCollision()
                 -> interactWithNPC()
                     -> interactWithShopkeeper()

...and the interactWithShopkeeper() function has the shopkeeper address the player by name, so you now suddenly need to pass playerName as tramp data through all those functions. And, of course, if the shopkeeper thinks that blue-eyed players are naïve, and will charge higher prices for them, then you'd need to pass playerEyeColor through the whole chain of functions, and so on.

The proper solution, in this case, is of course to define a player object that encapsulates the name, eye color, position, health and any other properties of the player character. That way, you only need to pass that single object around to all the functions that somehow involve the player.

Also, several of the functions above could be naturally made into methods of that player object, which would automatically give them access to the player's properties. In a way, this is just syntactic sugar, since calling a method on a object effectively passes the object instance as a hidden parameter to the method anyway, but it does make the code look clearer and more natural if used properly.

Of course, a typical game would have a lot more "global" state than just the player; for example, you'd almost certainly have some kind of a map on which the game takes place, and a list of non-player characters moving on the map, and maybe items placed on it, and so on. You could pass all those around as tramp objects too, but that would again clutter up your method arguments.

Instead, the solution is to have the objects store references to any other objects that they have permanent or temporary relationships with. So, for example, the player object (and probably any NPC objects too) probably should store a reference to the "game world" object, which would have a reference to the current level / map, so that a method like player.moveTo(x, y) does not need to be explicitly given the map as a parameter.

Similarly, if our player character had, say, a pet dog that followed them around, we would naturally group all the state variables describing the dog into a single object, and give the player object a reference to the dog (so that the player can, say, call the dog by name) and vice versa (so that the dog knows where the player is). And, of course, we'd probably want to make the player and the dog objects both subclasses of a more generic "actor" object, so that we can reuse the same code for, say, moving both around the map.

Ps. Even though I've used a game as an example, there are other kinds of programs where such issues come up as well. In my experience, though, the underlying problem tends to always be the same: you have a bunch of separate variables (whether local or global) that really want to be bunched together into one or more interlinked objects. Whether the "tramp data" intruding into your functions consists of "global" option settings or cached database queries or state vectors in a numerical simulation, the solution is invariably to identify the natural context that the data belongs to, and make that into an object (or whatever is the closest equivalent in your chosen language).

40

I am not aware of a specific name for this, but I guess it is worth to mention that the problem you describe is just the problem of finding the best compromise for the scope of such a parameter:

  • as a global variable, the scope is too big when the program reaches a certain size

  • as a purely local parameter, the scope might be too small, when it leads to lots of repetitive parameter lists in call chains

  • so as a trade-off, you can often make such a parameter a member variable in one or more classes, and that is what I would just call proper class design.

Doc Brown
  • 218,378
22

I believe the pattern you're describing is exactly dependency injection. Several commenters have argued that this is a pattern, not an anti-pattern, and I would tend to agree.

I also concur with @JimmyJames's answer, where he claims that it is good programming practice to treat each function as a black box that takes all its inputs as explicit parameters. That is, if you're writing a function that makes a peanut butter and jelly sandwich, you could write it as

Sandwich make_sandwich() {
    PeanutButter pb = get_peanut_butter();
    Jelly j = get_jelly();
    return pb + j;
}
extern PhysicalRefrigerator g_refrigerator;
PeanutButter get_peanut_butter() {
    return g_refrigerator.get("peanut butter");
}
Jelly get_jelly() {
    return g_refrigerator.get("jelly");
}

but it would be better practice to apply dependency injection and write it like this instead:

Sandwich make_sandwich(Refrigerator& r) {
    PeanutButter pb = get_peanut_butter(r);
    Jelly j = get_jelly(r);
    return pb + j;
}
PeanutButter get_peanut_butter(Refrigerator& r) {
    return r.get("peanut butter");
}
Jelly get_jelly(Refrigerator& r) {
    return r.get("jelly");
}

Now you have a function that clearly documents all its dependencies in its function signature, which is great for readability. After all, it is true that in order to make_sandwich you need access to a Refrigerator; so the old function signature was basically disingenuous by not taking the refrigerator as part of its inputs.

As a bonus, if you do your class hierarchy right, avoid slicing, and so on, you can even unit-test the make_sandwich function by passing in a MockRefrigerator! (You might need to unit-test it this way because your unit-test environment might not have access to any PhysicalRefrigerators.)

I do understand that not all uses of dependency injection require plumbing a similarly named parameter many levels down the call stack, so I'm not answering exactly the question you asked... but if you're looking for further reading on this subject, "dependency injection" is definitely a relevant keyword for you.

Deduplicator
  • 9,209
Quuxplusone
  • 568
  • 4
  • 8
15

This is pretty much the textbook definition of coupling, one module having a dependency that deeply affects another, and that creates a ripple effect when changed. The other comments and answers are correct that this is an improvement over the global, because the coupling is now more explicit and easier for the programmer to see, instead of subversive. That doesn't mean it shouldn't be fixed. You should be able to refactor to remove or reduce the coupling, although if it's been in there a while it can be painful.

Karl Bielefeldt
  • 148,830
8

While this answer doesn't directly answer your question, I feel I would be remiss to let it pass without mentioning how to improve on it (since as you say, it may be an anti-pattern). I hope that you and other readers can get value from this additional commentary on how to avoid "tramp data" (as Bob Dalgleish so helpfully named it for us).

I agree with answers that suggest doing something more OO to avoid this problem. However, another way to also help reduce this passing of arguments deeply without just jumping to "just pass a class where you used to pass many arguments!" is to refactor so that some steps of your process occur at the higher level instead of the lower one. For example, here's some before code:

public void PerformReporting(StuffRepository repo, string desiredName) {
   var stuffs = repo.GetStuff(DateTime.Now());
   FilterAndReportStuff(stuffs, desiredName);
}

public void FilterAndReportStuff(IEnumerable<Stuff> stuffs, string desiredName) {
   var filter = CreateStuffFilter(FilterTypes.Name, desiredName);
   ReportStuff(stuffs.Filter(filter));
}

public void ReportStuff(IEnumerable<Stuff> stuffs) {
   stuffs.Report();
}

Note that this becomes even worse the more things that have to be done in ReportStuff. You might have to pass in the instance of the Reporter you want to use. And all sorts of dependencies that have to be handed along, function to nested function.

My suggestion is to pull all that to a higher level, where the knowledge of the steps required lives in a single method instead of being spread across a chain of method calls. Of course it would be more complicated in real code, but this gives you an idea:

public void PerformReporting(StuffRepository repo, string desiredName) {
   var stuffs = repo.GetStuff(DateTime.Now());
   var filter = CreateStuffFilter(FilterTypes.Name, desiredName);
   var filteredStuffs = stuffs.Filter(filter)
   filteredStuffs.Report();
}

Notice that the big difference here is that you don't have to pass the dependencies through a long chain. Even if you flatten to not just one level, but a few levels deep, if those levels also achieve some "flattening" so that the process is seen as a series of steps at that level, you'll have made an improvement.

While this is still procedural and nothing has been turned into an object yet, it is a good step toward deciding what kind of encapsulation you can achieve through turning something into a class. The deeply-chained method calls in the before scenario hide the details of what is actually happening and can make the code very hard to understand. While you can overdo this and end up making the higher-level code know about things it shouldn't, or make a method that does too many things thus violating the single-responsibility principle, in general I have found that flattening things a bit aids in clarity and in making incremental change toward better code.

Note that while you're doing all this, you should consider testability. The chained method calls actually make unit testing harder because you don't have a good entry point and exit point in the assembly for the slice that you want to test. Notice that with this flattening, since your methods no longer take so many dependencies, they're easier to test, not requiring as many mocks!

I recently tried to add unit tests to a class (that I didn't write) that took something like 17 dependencies, all of which had to be mocked! I haven't gotten it all worked out yet, but I split the class into three classes, each dealing with one of the separate nouns it was concerned with, and got the dependency list down to 12 for the worst one and about 8 for the best one.

Testability will force you to write better code. You should be writing unit tests because you will find that it makes you think about your code differently and you'll write better code from the get-go, regardless of how few bugs you might have had before writing the unit tests.

ErikE
  • 1,181
3

You're not literally violating the Law of Demeter, but your problem is similar to that in some ways. Since the point of your question is to find resources, I suggest you read about Law of Demeter and see how much of that advice applies to your situation.

catfood
  • 1,615
1

There are instances where the best (in terms of efficiency, maintainability and ease of implementation) to have certain variables as global rather than the overhead of always passing everything around (say you have 15 or so variables that has to persist). So it makes sense to find programming language that supports scoping better (as C++'s private static variables) to alleviate the potential mess (of namespace and having things tampered). Of course this is just common knowledge.

But, the approach stated by the OP is very useful if one is doing Functional Programming.

kozner
  • 119
1

There is no anti-pattern here at all, because the caller doesn't know about all these levels below and doesn't care.

Someone is calling higherLevel(params) and expects higherLevel to do its job. What higherLevel does with params is none of the callers business. higherLevel handles the problem in the best way it can, in this case by passing params to level1(params). That's absolutely Ok.

You see a call chain - but there is no call chain. There is a function at the top doing its job the best way it can. And there are other functions. Each function could be replaced at any time.

gnasher729
  • 49,096