63

I am working in a .NET/C# shop and I have a coworker who keeps insisting that we should use giant switch statements in our code with lots of "cases" rather than more object-oriented approaches. His argument consistently comes back to the claim that a switch statement compiles to a "CPU jump table" and is therefore the fastest option (even though on other matters our team is told that we don't care about speed).

I honestly don't have an argument against this...because I don't know what the heck he's talking about.
Is he right?
Is he just talking out his ass?
Just trying to learn here.

Adam Lear
  • 32,069

17 Answers

51

He is probably an old C hacker and yes, he is talking out of his ass. .NET is not C++; the .NET compiler keeps getting better, and most clever hacks are counter-productive, if not today then in the next .NET version. Small functions are preferable because .NET JITs each function once, just before it is first used. So if some cases never get hit during the lifetime of the program, no cost is incurred in JIT-compiling them. Anyhow, if speed is not an issue, there should be no optimization. Write for the programmer first, for the compiler second.

Your co-worker will not be easily convinced, so I would prove empirically that better-organized code is actually faster. I would pick one of his worst examples, rewrite it in a better way, and then make sure that your code is faster. Cherry-pick if you must. Then run it a few million times, profile it, and show him. That ought to teach him well.

EDIT

Bill Wagner wrote:

Item 11: Understand the Attraction of Small Functions (Effective C#, Second Edition). Remember that translating your C# code into machine-executable code is a two-step process. The C# compiler generates IL that gets delivered in assemblies. The JIT compiler generates machine code for each method (or group of methods, when inlining is involved), as needed. Small functions make it much easier for the JIT compiler to amortize that cost. Small functions are also more likely to be candidates for inlining. It's not just smallness: simpler control flow matters just as much. Fewer control branches inside functions make it easier for the JIT compiler to enregister variables. It's not just good practice to write clearer code; it's how you create more efficient code at runtime.

EDIT2:

So ... apparently a switch statement is faster and better than a bunch of if/else statements, because a switch can dispatch in logarithmic (or even constant) time while a chain of if/else tests is linear. http://sequence-points.blogspot.com/2007/10/why-is-switch-statement-faster-than-if.html

Well, my favorite approach to replacing a huge switch statement is a dictionary (or sometimes even an array, if I am switching on enums or small ints) that maps values to the functions that get called in response to them. Doing so forces one to remove a lot of nasty shared spaghetti state, but that is a good thing. A large switch statement is usually a maintenance nightmare. So with arrays and dictionaries, the lookup takes constant time and little extra memory is wasted.
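For illustration, a minimal sketch of that dictionary-of-delegates approach; the Command enum and the trivial handlers are invented for the example:

using System;
using System.Collections.Generic;

enum Command { Save, Load, Quit }

static class Dispatcher
{
    // Each value maps to the delegate that handles it; adding a command
    // means adding an entry here rather than growing a switch.
    private static readonly Dictionary<Command, Action> Handlers =
        new Dictionary<Command, Action>
        {
            { Command.Save, () => Console.WriteLine("saving...") },
            { Command.Load, () => Console.WriteLine("loading...") },
            { Command.Quit, () => Console.WriteLine("quitting...") }
        };

    public static void Dispatch(Command command)
    {
        Handlers[command]();   // constant-time lookup, no shared spaghetti state
    }
}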

I am still not convinced that the switch statement is better.

Job
  • 6,459
43

Unless your colleague can provide proof that this alteration gives an actual, measurable benefit at the scale of the whole application, it is inferior to your approach (i.e. polymorphism), which actually does provide such a benefit: maintainability.

Micro-optimisation should only be done after bottlenecks have been pinned down. Premature optimization is the root of all evil.

Speed is quantifiable. There's little useful information in "approach A is faster than approach B". The question is "How much faster?".

Jim G.
  • 8,035
back2dos
  • 30,140
27

Who cares if it's faster?

Unless you're writing real-time software, it's unlikely that the minuscule speedup you might possibly get from doing something in a completely insane manner will make much difference to your client. I wouldn't even battle this one on the speed front; this guy is clearly not going to listen to any argument on the subject.

Maintainability, however, is the aim of the game, and a giant switch statement is not even slightly maintainable. How do you explain the different paths through the code to a new developer? The documentation would have to be as long as the code itself!

Plus, you've then got the complete inability to unit test effectively (too many possible paths, not to mention the probable lack of interfaces etc.), which makes your code even less maintainable.

[As a point of interest: the JITter performs better on smaller methods, so giant switch statements (and the inherently large methods that contain them) will harm your speed in large assemblies, IIRC.]

Ed James
  • 3,499
14

Step away from the switch statement ...

This type of switch statement should be shunned like the plague because it violates the Open/Closed Principle. It forces the team to make changes to existing code when new functionality needs to be added, as opposed to just adding new code.
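To make the contrast concrete, here is a hedged sketch under invented names (IExporter, CsvExporter, PdfExporter, ExportService): new functionality arrives as a new class, while the existing consuming code stays untouched.

using System;
using System.Collections.Generic;

interface IExporter
{
    string Format { get; }
    void Export(string data);
}

class CsvExporter : IExporter
{
    public string Format => "csv";
    public void Export(string data) => Console.WriteLine("csv: " + data);
}

// Adding PDF support later means adding this class -- no existing switch to edit.
class PdfExporter : IExporter
{
    public string Format => "pdf";
    public void Export(string data) => Console.WriteLine("pdf: " + data);
}

class ExportService
{
    private readonly IEnumerable<IExporter> exporters;

    public ExportService(IEnumerable<IExporter> exporters)
    {
        this.exporters = exporters;
    }

    public void Export(string format, string data)
    {
        foreach (var exporter in exporters)
            if (exporter.Format == format)
                exporter.Export(data);
    }
}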

phm271
  • 3
Dakotah North
  • 3,373
11

I don't buy the performance argument; it's all about code maintainability.

BUT: sometimes a giant switch statement is easier to maintain (less code) than a bunch of small classes overriding virtual function(s) of an abstract base class. For example, if you were to implement a CPU emulator, you would not implement the functionality of each instruction in a separate class -- you would just stuff it into a giant switch on the opcode, possibly calling helper functions for more complex instructions.

Rule of thumb: if the switch is somehow performed on the TYPE, you should probably use inheritance and virtual functions. If the switch is performed on a VALUE of a fixed type (e.g., the instruction opcode, as above), it's OK to leave it as it is.
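A hedged sketch of that opcode case (the opcodes and the helper method are invented; a real emulator would have far more instructions):

using System;

class Cpu
{
    private int accumulator;

    // Switching on a VALUE of a fixed type: the instruction set is closed,
    // so a plain switch reads well here.
    public void Execute(byte opcode, byte operand)
    {
        switch (opcode)
        {
            case 0x01: accumulator += operand; break;            // ADD
            case 0x02: accumulator -= operand; break;            // SUB
            case 0x03: accumulator = operand; break;             // LOAD
            case 0x04: ExecuteComplexInstruction(operand); break;
            default: throw new InvalidOperationException("Unknown opcode: " + opcode);
        }
    }

    // Helper for an instruction too large to inline in the switch.
    private void ExecuteComplexInstruction(byte operand)
    {
    }
}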

zvrba
  • 3,480
8

I have survived the nightmare known as the massive finite state machine manipulated by massive switch statements. Even worse, in my case, the FSM spanned three C++ DLLs and it was quite plain the code was written by someone versed in C.

The metrics you need to care about are:

  • Speed of making a change
  • Speed of finding the problem when it happens

I was given the task of adding a new feature to that set of DLLs, and was able to convince management that it would take me just as long to rewrite the three DLLs as one properly object-oriented DLL as it would to monkey-patch and jury-rig the solution into what was already there. The rewrite was a huge success: not only did it support the new functionality, it was also much easier to extend. In fact, a task that would normally take a week to make sure nothing was broken ended up taking a few hours.

So how about execution times? There was no speed increase or decrease. To be fair, our performance was throttled by the system drivers, so if the object-oriented solution was in fact slower, we wouldn't have known it.

What's wrong with massive switch statements for an OO language?

  • Program control flow is taken away from the object where it belongs and placed outside the object
  • Many points of external control translate into many places you need to review
  • It is unclear where state is stored, particularly if the switch is inside a loop
  • The quickest comparison is no comparison at all (you can avoid the need for many comparisons with a good object oriented design)
  • It's more efficient to iterate through your objects and always call the same method on all of the objects than it is to change your code based on the object type or enum that encodes the type.
5

He is correct that the resulting machine code will probably be more efficient. The compiler essentially transforms a switch statement into a set of tests and branches, which will be relatively few instructions. There is a high chance that the code resulting from more abstracted approaches will require more instructions.

HOWEVER: it's almost certainly the case that your particular application doesn't need to worry about this kind of micro-optimisation, or you wouldn't be using .NET in the first place. For anything short of very constrained embedded applications or CPU-intensive work, you should always let the compiler deal with optimisation. Concentrate on writing clean, maintainable code. This is almost always of far greater value than a few tenths of a nanosecond in execution time.

Luke Graham
  • 2,403
5

You can't convince me that:

void action1()
{}

void action2()
{}

void action3()
{}

void action4()
{}

void doAction(int action)
{
    switch(action)
    {
        case 1: action1();break;
        case 2: action2();break;
        case 3: action3();break;
        case 4: action4();break;
    }
}

Is significantly faster than:

struct IAction
{
    virtual ~IAction() {}
    virtual void action() = 0;
};

struct Action1: public IAction
{
    virtual void action()    { }
};

struct Action2: public IAction
{
    virtual void action()    { }
};

struct Action3: public IAction
{
    virtual void action()    { }
};

struct Action4: public IAction
{
    virtual void action()    { }
};

void doAction(IAction& actionObject)
{
    actionObject.action();
}

Additionally the OO version is just more maintainable.

Loki Astari
  • 11,190
4

Normally I hate the phrase "premature optimization", but this reeks of it. It's worth noting that Knuth used that famous quote in the context of pushing for the use of goto statements to speed up code in critical areas. That's the key: critical paths.

He was suggesting to use goto for speeding up code but warning against those programmers who would want to do these types of things based on hunches and superstitions for code that isn't even critical.

To favor switch statements as much as possible, uniformly throughout a codebase (whether or not any heavy load is handled), is the classic example of what Knuth calls the "penny-wise and pound-foolish" programmer who spends all day struggling to maintain their "optimized" code, which turned into a debugging nightmare as a result of trying to save pennies over pounds. Such code is rarely maintainable, let alone efficient in the first place.

Is he right?

He is correct from the very basic efficiency perspective. No compiler to my knowledge can optimize polymorphic code involving objects and dynamic dispatch better than a switch statement. You'll never end up with a LUT or jump table to inlined code from polymorphic code, since such code tends to serve as an optimizer barrier for the compiler (it won't know which function to call until the time at which the dynamic dispatch occurs).

It's more useful not to think of this cost in terms of jump tables but in terms of the optimization barrier. For polymorphism, calling Base.method() doesn't let the compiler know which function will actually end up being called if method is virtual, not sealed, and can be overridden. Since it doesn't know in advance which function will be called, it can't optimize away the call or use any information about the callee when making optimization decisions at compile time.

Optimizers are at their best when they can peer into a function call and make optimizations that either completely flatten the caller and callee, or at least optimize the caller to most efficiently work with the callee. They can't do that if they don't know which function is actually going to be called in advance.
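A hedged C# illustration of that barrier, with invented types (exact behaviour depends on the JIT and runtime version): when the receiver's concrete type is sealed or otherwise known, the JIT may devirtualize and even inline the call; through an open virtual reference it generally cannot.

using System;

class Shape
{
    public virtual double Area() { return 0; }
}

sealed class Circle : Shape
{
    public double Radius;
    public override double Area() { return Math.PI * Radius * Radius; }
}

static class Calls
{
    // True virtual dispatch: the JIT cannot know which override runs here.
    public static double ThroughBase(Shape shape) { return shape.Area(); }

    // Circle is sealed, so the target is knowable; the JIT may devirtualize and inline.
    public static double ThroughSealed(Circle circle) { return circle.Area(); }
}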

Is he just talking out his ass?

Using this cost, which often amounts to pennies, to justify turning this into a coding standard applied uniformly is generally very foolish, especially for places that have an extensibility need. That's the main thing you want to watch out for with genuine premature optimizers: they want to turn minor performance concerns into coding standards applied uniformly throughout a codebase with no regard for maintainability whatsoever.

I take a little offense at the "old C hacker" quote used in the accepted answer, though, since I'm one of those. Not everyone who has been coding for decades, starting from very limited hardware, has turned into a premature optimizer. Yet I've encountered and worked with those too. But those types never measure things like branch misprediction or cache misses; they think they know better, and they base their notions of inefficiency in a complex production codebase on superstitions that don't hold true today and sometimes never held true. People who have genuinely worked in performance-critical fields often understand that effective optimization is effective prioritization, and trying to generalize a maintainability-degrading coding standard in order to save pennies is very ineffective prioritization.

Pennies are important when you have a cheap function that doesn't do much work but is called a billion times in a very tight, performance-critical loop; in that case a billion pennies saved is 10 million dollars. It's not worth shaving pennies off a function that is called twice and whose body alone costs thousands of dollars. It's not wise to spend your time haggling over pennies during the purchase of a car, but it is worth haggling over pennies if you are purchasing a million cans of soda from a manufacturer. The key to effective optimization is to understand these costs in their proper context. Someone who tries to save pennies on every single purchase, and suggests that everyone else haggle over pennies no matter what they're purchasing, isn't a skilled optimizer.

3

One major reason to use classes instead of switch statements is that switch statements tend to lead to one huge file that has lots of logic. This is both a maintenance nightmare and a problem for source management, since you have to check out and edit that one huge file instead of several smaller class files.

Homde
  • 11,114
3

A switch statement in OOP code is a strong indication of missing classes.

Try it both ways and run some simple speed tests; chances are the difference is not significant. If it is, and the code is time-critical, then keep the switch statement.

2

One maintainability advantage of polymorphism that no one has mentioned is that you will be able to structure your code much more nicely using inheritance if you are always switching on the same list of cases, but sometimes several cases are handled the same way and sometimes they aren't.

E.g., if you are switching between Dog, Cat and Elephant, and sometimes Dog and Cat are handled the same way, you can make them both inherit from an abstract class DomesticAnimal and put those functions in the abstract class.
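A minimal sketch of that idea, with invented method names:

using System;

abstract class Animal
{
    public abstract void MakeSound();
    public abstract void Feed();
}

// Dog and Cat share the "same case" for feeding, so it lives in one place.
abstract class DomesticAnimal : Animal
{
    public override void Feed() { Console.WriteLine("Pour food into a bowl."); }
}

class Dog : DomesticAnimal
{
    public override void MakeSound() { Console.WriteLine("Woof"); }
}

class Cat : DomesticAnimal
{
    public override void MakeSound() { Console.WriteLine("Meow"); }
}

class Elephant : Animal
{
    public override void MakeSound() { Console.WriteLine("Trumpet"); }
    public override void Feed() { Console.WriteLine("Deliver a bale of hay."); }
}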

Also, I was surprised that several people used a parser as an example of where you wouldn't use polymorphism. For a tree-like parser this is definitely the wrong approach, but if you have something like assembly, where each line is somewhat independent and starts with an opcode that indicates how the rest of the line should be interpreted, I would totally use polymorphism and a Factory. Each class can implement functions like ExtractConstants or ExtractSymbols. I have used this approach for a toy BASIC interpreter.
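A hedged sketch of that opcode-per-line approach (MovLine, AsmLineFactory and the parsing rules are invented; ExtractConstants/ExtractSymbols come from the description above). Note that the one remaining switch lives inside the factory; everything downstream just calls virtual methods.

using System;
using System.Collections.Generic;
using System.Linq;

abstract class AsmLine
{
    public abstract IEnumerable<string> ExtractConstants();
    public abstract IEnumerable<string> ExtractSymbols();
}

class MovLine : AsmLine
{
    private readonly string[] operands;

    public MovLine(string[] operands) { this.operands = operands; }

    // Treat '#'-prefixed operands as constants, everything else as symbols.
    public override IEnumerable<string> ExtractConstants()
    {
        return operands.Where(o => o.StartsWith("#"));
    }

    public override IEnumerable<string> ExtractSymbols()
    {
        return operands.Where(o => !o.StartsWith("#"));
    }
}

static class AsmLineFactory
{
    // The opcode at the start of the line decides which class interprets the rest.
    public static AsmLine Create(string line)
    {
        var parts = line.Split(' ');
        switch (parts[0].ToUpperInvariant())
        {
            case "MOV": return new MovLine(parts.Skip(1).ToArray());
            // case "ADD": return new AddLine(...); and so on
            default: throw new NotSupportedException("Unknown opcode: " + parts[0]);
        }
    }
}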

jwg
  • 214
2

It sounds like your coworker is very concerned about performance. It might be that in some cases a large case/switch structure performs faster, but hopefully you would settle it with an experiment by doing timing tests on the OO version and the switch/case version. I am guessing the OO version has less code and is easier to follow, understand and maintain. I would argue for the OO version first (as maintenance/readability should initially be more important), and only consider the switch/case version if the OO version has serious performance issues and it can be shown that a switch/case makes a significant improvement.
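A rough sketch of such a timing test using System.Diagnostics.Stopwatch; the two lambdas are placeholders for whichever switch-based and OO implementations you are comparing:

using System;
using System.Diagnostics;

static class SpeedTest
{
    static void Time(string label, Action<int> dispatch, int iterations)
    {
        var sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            dispatch(i % 4);
        sw.Stop();
        Console.WriteLine(label + ": " + sw.ElapsedMilliseconds + " ms");
    }

    static void Main()
    {
        const int iterations = 10000000;

        // Plug the real implementations in here; these no-ops are stand-ins.
        Time("switch version", x => { /* call the switch-based code */ }, iterations);
        Time("OO version", x => { /* call the polymorphic code */ }, iterations);
    }
}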

1

Your colleague is not talking out of his backside, as far as the comment regarding jump tables goes. However, using that to justify writing bad code is where he goes wrong.

The C# compiler converts switch statements with just a few cases into a series of if/else's, so they are no faster than using if/else. The compiler converts larger switch statements into a Dictionary (the jump table your colleague is referring to). Please see this answer to a Stack Overflow question on the topic for more details.

A large switch statement is hard to read and maintain. A dictionary of "cases" and functions is a lot easier to read. As that is what the switch gets turned into, you and your colleague would be well advised to use dictionaries directly.

David Arno
  • 39,599
1

He's not necessarily talking out of his ass. At least in C and C++, switch statements can be optimized to jump tables, while I've never seen that happen with a dynamic dispatch in a function that only has access to a base pointer. At the very least, the latter requires a much smarter optimizer looking at much more surrounding code to figure out exactly which subtype is being used in a virtual function call through a base pointer/reference.

On top of that, dynamic dispatch often serves as an "optimization barrier", meaning the compiler often won't be able to inline the code, optimally allocate registers, minimize stack spills, and all that fancy stuff, since it can't figure out which virtual function is going to be called through the base pointer. I'm not sure you would even want the optimizer to be so smart as to try to optimize away indirect function calls, since that could lead to many branches of code having to be generated separately down a given call stack: a function that calls foo->f() would have to generate totally different machine code from one that calls bar->f() through a base pointer, the function that calls that function would then need two or more versions of its code, and so forth. The amount of machine code generated would be explosive (maybe not so bad with a tracing JIT, which generates code on the fly as it traces through hot execution paths).

However, as many answers have echoed, that's a bad reason to favor a boatload of switch statements even if it's hands-down faster by some marginal amount. Besides, when it comes to micro-efficiencies, things like branching and inlining are usually pretty low priority compared to things like memory access patterns.

That said, I jumped in here with an unusual answer. I want to make a case for the maintainability of switch statements over a polymorphic solution when, and only when, you know for sure that there is only going to be one place that needs to perform the switch.

A prime example is a central event handler. In that case you generally don't have many places handling events, just one (which is why it's "central"). For those cases, you don't benefit from the extensibility that a polymorphic solution provides. A polymorphic solution is beneficial when there are many places that would otherwise contain the analogous switch statement. If you know for sure there's only going to be one, a switch statement with 15 cases can be a whole lot simpler than designing a base class inherited by 15 subtypes with overridden functions, plus a factory to instantiate them, only for it all to be used in one function in the entire system. In those cases, adding a new subtype is a lot more tedious than adding a case to one function. If anything, I'd argue for the maintainability, not the performance, of switch statements in this one peculiar case where you don't benefit from extensibility whatsoever.
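A hedged sketch of that central-handler case, with invented event kinds and handler methods:

enum EventKind { MouseDown, MouseUp, KeyPress, Resize }

class CentralEventHandler
{
    // The only place in the system that dispatches on EventKind, so a switch
    // is simpler than 15 subtypes plus a factory used in exactly one spot.
    public void Handle(EventKind kind)
    {
        switch (kind)
        {
            case EventKind.MouseDown: OnMouseDown(); break;
            case EventKind.MouseUp: OnMouseUp(); break;
            case EventKind.KeyPress: OnKeyPress(); break;
            case EventKind.Resize: OnResize(); break;
        }
    }

    private void OnMouseDown() { }
    private void OnMouseUp() { }
    private void OnKeyPress() { }
    private void OnResize() { }
}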

0

Even if this weren't bad for maintainability, I don't believe it would be better for performance. A virtual function call is simply one extra indirection (the same as the best case for a switch statement), so even in C++ the performance should be roughly equal. In C#, where instance method calls are virtual by default, the switch version should if anything be worse: you have the same call overhead in both versions, plus the switch itself.

0

"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil"

Donald Knuth