4

In something as simple as

int sq(int x) { int y = x*x; return y; }

versus

int sq(int x) { return (x*x); }

the former function requires an extra IL step.

EDIT: The IL from the first example, produced by the current C# compiler from VS 2013 Update 4 (.NET Framework 4.5.1), taken from the optimized release build:

  IL_0000:  ldarg.0
  IL_0001:  ldarg.0
  IL_0002:  mul
  IL_0003:  stloc.0   // These two instructions
  IL_0004:  ldloc.0   // seem to be unnecessary
  IL_0005:  ret

and from the second example:

  IL_0000:  ldarg.0   // push x
  IL_0001:  ldarg.0   // push x again
  IL_0002:  mul       // pop both, push x*x
  IL_0003:  ret       // return the value on top of the stack

Is that extra IL step indicative of data being copied in memory? And is this difference still present when using something presumably better optimized, such as C?

If yes to the first question or both: can I use the latter form, especially when the returned data takes a lot of space, and presumably a lot of time to be moved in memory? Or should I favor the former, because it is more readable? LINQ makes it easy to have whole functions inside a long return() line, but my example would face the same choice were it written in C. Speed or readability?
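
To make "data that takes a lot of space" concrete, here is the kind of thing I have in mind (the type and method are made up for illustration):

struct Matrix4x4   // hypothetical large value type: 16 doubles = 128 bytes
{
    public double m00, m01, m02, m03,
                  m10, m11, m12, m13,
                  m20, m21, m22, m23,
                  m30, m31, m32, m33;
}

static Matrix4x4 Identity()
{
    Matrix4x4 m = new Matrix4x4();        // built in a local...
    m.m00 = m.m11 = m.m22 = m.m33 = 1.0;
    return m;                             // ...then copied out to the caller by value
}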

In general, would you put the last bit of maths between the parentheses in the return line?

Doc Brown
  • 218,378
nvja
  • 167

4 Answers

15

Any self-respecting compiler will compile both examples to the same IL. Further, the first isn't any more readable, since you have a garbage variable that carries no information. It's just noise that decreases readability. If that temporary actually had a good name that provided some useful information to the reader, then the first way would (usually) be better.
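
To illustrate (a made-up example, not from the question):

// 'y' tells the reader nothing; it's pure noise:
int Sq(int x) { int y = x * x; return y; }

// a well-named temporary can actually add information:
int CelsiusToFahrenheit(int celsius)     // hypothetical conversion for illustration
{
    int scaled = celsius * 9 / 5;        // separate the scaling step from the offset
    return scaled + 32;
}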

But in general, favor readability over speed (which is only barely correlated with the length of code).

Telastyn
  • 110,259
13

First of all, micro-optimizations like this rarely make sense. You should focus on readability and worry about these kinds of optimizations only when you've identified the code that you actually need to optimize by profiling.


Now, if you're still interested about performance, looking at IL doesn't make much sense, because it's not IL that's executed.

Instead, what you should do is measure.
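
For example, a crude harness along these lines (the method names are mine, and this is a sketch rather than a rigorous benchmark) already tells you more than the IL does:

using System;
using System.Diagnostics;

// Sketch only: SqWithLocal/SqDirect are my names for the question's two variants.
static class Bench
{
    static int SqWithLocal(int x) { int y = x * x; return y; }
    static int SqDirect(int x)    { return x * x; }

    static void Main()
    {
        const int N = 100000000;
        int sink = 0;                 // consume results so the work isn't optimized away

        var sw = Stopwatch.StartNew();
        for (int i = 0; i < N; i++) sink += SqWithLocal(i);
        sw.Stop();
        Console.WriteLine("local:  {0} ms", sw.ElapsedMilliseconds);

        sw.Restart();
        for (int i = 0; i < N; i++) sink += SqDirect(i);
        sw.Stop();
        Console.WriteLine("direct: {0} ms (sink = {1})", sw.ElapsedMilliseconds, sink);
    }
}

Run it as a Release build without the debugger attached, and expect the two loops to take the same time (the JIT will likely inline both methods, which is fine here, since it does so for both equally).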

Another option would be to look at the generated machine code (usually x86). But the problem with that is that CPUs are very complicated, and it can be very difficult to figure out which version of the code is going to be faster.


Note: looking at the machine code is not as simple as opening the Disassembly window. You need to run in Release mode and make sure the CLR actually does optimize the code (either by unchecking "Suppress JIT optimization on module load", or by starting without the debugger attached (Ctrl+F5) and attaching it later, e.g. by using Debugger.Break() and selecting to debug the code).
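
As a sketch of the attach-later route (Sq stands in for whatever method you're inspecting):

using System;
using System.Diagnostics;

static void Main()
{
    int r = Sq(21);      // Sq is defined elsewhere; call it once so the JIT
                         // compiles (and optimizes) it before you attach

    Debugger.Break();    // started with Ctrl+F5, this pops the attach dialog;
                         // attach, then open the Disassembly window on Sq

    Console.WriteLine(r);
}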

If you do that, you will most likely notice that both versions of the method are going to be inlined, which makes it hard to see which instructions belong to your method (but both versions generate the same machine code for me).

If you want to compare the two versions of the method in isolation, you can use [MethodImpl(MethodImplOptions.NoInlining)]. With that, the code I'm getting for both versions of the method is the same:

push        ebp             ; prologue: set up the stack frame
mov         ebp,esp
mov         eax,edx         ; copy the argument x into eax
imul        eax,edx         ; eax = x * x
pop         ebp             ; epilogue
ret
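
For reference, the two variants I'm comparing look something like this (the names are placeholders of mine):

using System.Runtime.CompilerServices;

// NoInlining keeps each variant as a separate method in the disassembly
[MethodImpl(MethodImplOptions.NoInlining)]
static int SqWithLocal(int x) { int y = x * x; return y; }

[MethodImpl(MethodImplOptions.NoInlining)]
static int SqDirect(int x) { return x * x; }
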
svick
  • 10,137
4

The IL that you're looking at isn't necessarily indicative of the native code that the JIT compiler will generate at runtime (or that ngen will generate if you pre-compile your native binaries).

In this case, though, the IL obviously remains fairly representative of the original code.

EDIT: Comments added after the OP posted the IL have noted that the assembly is a debug build, so optimizations are turned off, and the JIT will likely optimize this away if you compile a release build.

EDIT: And I definitely agree with the comments that debugging the "assign to variable, return the variable" variant is easier, for the simple fact that you can inspect the result of the assignment before the method returns.
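
That is, the temporary gives you a convenient spot for a breakpoint:

int sq(int x)
{
    int y = x * x;   // breakpoint here lets you inspect y before it's returned
    return y;
}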

However, every compiler (C#, VB.NET, FORTRAN, COBOL, F#, etc.) is a different beast, transforming those languages into common IL, which is then transformed by the JIT into native code for the CPU the CLR finds itself running on.

IL is an acronym for "Intermediate Language," after all, not for "Fully Optimized Native Machine Language." I'd imagine that generating fully optimized IL, when the JIT itself is already an optimizing compiler, just isn't worth the investment in most cases.

Also, it might be interesting if you would post the actual IL from each example.

I guess I wouldn't really worry about it.

It's probably a rare occurrence that you're going to outwit the optimizations built into the compiler. Those optimizations are what those guys (and/or gals) are focusing on all the time.

I'd worry more about correct, human-readable code that's easy to debug, and the 80/20 rule. 80% of the performance issues are going to be in 20% of the code. Actually, it's probably more like 95/5.

Profile your code. Run tests and time them, then optimize the parts that are actually problematic. I'd bet you lunch that the problem bits are often in places you don't expect, and a lot of code you thought was going to be a performance issue just won't be, because it runs rarely or isn't in a super-critical path. And I'll bet the JIT optimizes away whatever you're worried about with this example, anyway. ;-)

Craig Tullis
  • 1,985
1

I'm going to directly challenge the claim that the first example is more readable. In this code, it is very clear what is happening: you take an int x and return x*x.

int sq(int x) {
    return (x*x);
}

Adding an extra statement (y = x*x; return y) adds nothing to readability, and in fact makes me wonder what I might be missing (e.g., were there lines you removed?), because you are assigning a new variable.


However, in the general "readability vs. speed" argument, I would suggest readability trumps all other concerns. In most cases, programmer time is vastly more expensive than computation or storage. CodeGolf.StackExchange is a prime example of this: it is filled with a lot of very fast, very clever solutions to problems, but the time it takes to actually understand how some of them work is quite long.

"Bottlenecks occur in surprising places, so don't try to second guess and put in a speed hack until you have proven that's where the bottleneck is." — Rob Pike

Optimisations in code to improve bytecode performance should only be considered after every other part of the code has been profiled to identify potential bottlenecks. If your bytecode optimisation makes a naive O(n!) operation a little faster for each n, finding an O(n^2) solution in code will grant much better results.
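
As a toy illustration of that point (my own example, not tied to the question's code): no amount of shaving bytecode off the loop below competes with replacing the loop entirely.

// micro-optimizing the body of this O(n) loop buys very little...
static long SumNaive(long n)
{
    long total = 0;
    for (long i = 1; i <= n; i++) total += i;
    return total;
}

// ...compared to swapping in the O(1) closed form
static long SumClosedForm(long n)
{
    return n * (n + 1) / 2;
}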

Even then, when you start investigating bytecode operations to squeeze out extra performance, the cost of adding an extra core to a server, or buying a little extra server time, might be minuscule compared to the time required for the code changes and investigation a bytecode-optimal solution demands. The reason I say this is that if a bytecode-optimal solution were trivially easy, it would already be incorporated into the compiler/interpreter.