101

I've been learning some C++, and I often have to return large objects from functions, where the objects are created within the function. I know there are solutions such as passing by reference, returning a pointer, and returning a reference, but I've also read that C++ compilers (and the C++ standard) allow for return value optimization (RVO), which avoids copying these large objects through memory, saving all of that time and memory.

Now, I feel that the syntax is much clearer when the object is explicitly returned by value, and the compiler will generally employ the RVO and make the process more efficient. Is it bad practice to rely on this optimization? It makes the code clearer and more readable for the user, which is extremely important, but should I be wary of assuming the compiler will catch the RVO opportunity?

Is this a micro-optimization, or something I should keep in mind when designing my code?

Robert Harvey
Matt

14 Answers

130

Employ the principle of least astonishment.

Is it you and only ever you who is going to use this code, and are you sure the same you in 3 years is not going to be surprised by what you do?

Then go ahead.

In all other cases, use the standard way; otherwise, you and your colleagues are going to run into hard to find bugs.

For example, my colleague was complaining about my code causing errors. Turns out, he had turned off short-circuit Boolean evaluation in his compiler settings. I nearly slapped him.

Robert Harvey
Pieter B
82

For this particular case, definitely just return by value.

  • RVO and NRVO are well-known and robust optimizations that really ought to be made by any decent compiler, even in C++03 mode.

  • Move semantics ensure that objects are moved out of functions if (N)RVO didn't take place. That's only useful if your object uses dynamic data internally (like std::vector does), but that should really be the case if it is that big -- overflowing the stack is a risk with big automatic objects.

  • C++17 makes RVO mandatory: copy elision is guaranteed when a prvalue is returned. So don't worry, it won't disappear on you, and it will only establish itself more completely as compilers catch up.

And in the end, forcing an additional dynamic allocation just to return a pointer, and forcing your result type to be default-constructible just so you can pass it as an output parameter, are both ugly and non-idiomatic solutions to a problem you will probably never have.

Just write code that makes sense and thank the compiler writers for correctly optimizing code that makes sense.

Quentin
63

Now, I feel that the syntax is much clearer when the object is explicitly returned by value, and the compiler will generally employ the RVO and make the process more efficient. Is it bad practice to rely on this optimization? It makes the code clearer and more readable for the user, which is extremely important, but should I be wary of assuming the compiler will catch the RVO opportunity?

This isn't some little-known, cutesy micro-optimization that you read about on some small, little-trafficked blog and then get to feel clever and superior about using.

After C++11, RVO is the standard way to write this kind of code. It is common, expected, taught, mentioned in talks, mentioned in blogs, mentioned in the standard, and will be reported as a compiler bug if not implemented. In C++17, the language goes one step further and mandates copy elision in certain scenarios.

You should absolutely rely on this optimization.

On top of that, return-by-value leads to code that is massively easier to read and manage than return-by-reference code. Value semantics is a powerful thing that can itself open up more optimization opportunities.

Barry
18

The correctness of the code you write should never depend on an optimization. It should produce the correct result when executed on the abstract machine that the C++ specification defines.

However, what you are talking about is really an efficiency question. Your code runs better when optimized by an RVO-capable compiler. That's fine, for all the reasons pointed out in the other answers.

However, if you require this optimization (for example, if the copy constructor would actually cause your code to fail), you are now at the whim of the compiler.

I think the best example of this in my own practice is tail call optimization:

   int sillyAdd(int a, int b)
   {
      if (b == 0)
          return a;
      return sillyAdd(a + 1, b - 1);
   }

It's a silly example, but it shows a tail call: a function calling itself recursively right at the end. The C++ abstract machine will show that this code operates properly, though it may cause some confusion as to why I bothered writing such an addition routine in the first place. However, in practical implementations of C++, we have a stack, and it has limited space. Done pedantically, this function would have to push at least b + 1 stack frames onto the stack as it does its addition. If I want to calculate sillyAdd(5, 7), this is not a big deal. If I want to calculate sillyAdd(0, 1000000000), I could be in real trouble of causing a stack overflow (and not the good kind).

However, we can see that once we reach that last return line, we're really done with everything in the current stack frame. We don't really need to keep it around. Tail call optimization lets you "reuse" the existing stack frame for the next function. In this way, we only need 1 stack frame, rather than b+1. (We still have to do all those silly additions and subtractions, but they don't take more space.) In effect, the optimization turns the code into:

   int sillyAdd(int a, int b)
   {
      begin:
      if (b == 0)
          return a;
      // return sillyAdd(a + 1, b - 1);
      a = a + 1;
      b = b - 1;
      goto begin;  
   }

In some languages, tail call optimization is explicitly required by the specification. C++ is not one of those. I cannot rely on C++ compilers to recognize this tail call optimization opportunity, unless I go case-by-case. With my version of Visual Studio, the release version does the tail call optimization, but the debug version does not (by design).

Thus it would be bad for me to depend on being able to calculate sillyAdd(0, 1000000000).

ojdo
Cort Ammon
8

In practice, C++ programs depend on certain compiler optimizations.

Look notably into the standard headers of your standard container implementations. With GCC, you can ask for the preprocessed form (g++ -C -E), or the GIMPLE internal representation (g++ -fdump-tree-gimple, or GIMPLE SSA with -fdump-tree-ssa), of most source files (technically, translation units) using containers. You'll be surprised by the amount of optimization that is done (with g++ -O2). So the implementors of containers rely on these optimizations. Most of the time, the implementor of a C++ standard library knows what optimizations will happen and writes the container implementation with those in mind; sometimes they will even write the optimization pass in the compiler to deal with features required by the standard C++ library.

In practice, it is the compiler optimizations which make C++ and its standard containers efficient enough. So you can rely on them.

And likewise for the RVO case mentioned in your question.

The C++ standard was co-designed (notably by experimenting with optimizations while proposing new features) to work well with the possible optimizations.

For instance, consider the program below:

#include <algorithm>
#include <vector>

extern "C" bool all_positive(const std::vector<int>& v) {
  return std::all_of(v.begin(), v.end(), [](int x){ return x > 0; });
}

Compile it with g++ -O3 -fverbose-asm -S. You'll find that the generated function doesn't contain any CALL machine instruction. So most of the C++ steps (construction of a lambda closure, its repeated application, getting the begin and end iterators, etc.) have been optimized away. The machine code contains only a loop (which does not appear explicitly in the source code). Without such optimizations, C++11 would not have been successful.

Addendum

(added December 31st, 2017)

See CppCon 2017: Matt Godbolt “What Has My Compiler Done for Me Lately? Unbolting the Compiler's Lid” talk.

4

Whenever you use a compiler, the understanding is that it will produce machine- or byte-code for you. It does not guarantee anything about what that generated code is like, except that it will implement the source code according to the specification of the language. Note that this guarantee is the same regardless of the level of optimization used, and so, in general, there is no reason to regard one output as more 'right' than the other.

Furthermore, in those cases, like RVO, where it is specified in the language, it would seem to be pointless to go out of your way to avoid using it, especially if it makes the source code simpler.

A lot of effort is put into making compilers produce efficient output, and clearly the intent is for those capabilities to be used.

There may be reasons for using unoptimized code (for debugging, for example), but the case mentioned in this question does not appear to be one (and if your code fails only when optimized, and it is not a consequence of some peculiarity of the device you are running it on, then there is a bug somewhere, and it is unlikely to be in the compiler.)

sdenham
  • 253
3

I think others covered the specific angle about C++ and RVO well. Here is a more general answer:

When it comes to correctness, you should not rely on compiler optimizations, or compiler-specific behavior in general. Fortunately, you don't seem to be doing this.

When it comes to performance, you have to rely on compiler-specific behavior in general, and compiler optimizations in particular. A standard-compliant compiler is free to compile your code in any way it wants to, as long as the compiled code behaves according to the language specification. And I'm not aware of any specification for a mainstream language that specifies how fast each operation has to be.

svick
  • 10,137
  • 1
  • 39
  • 53
2

No.

That's what I do all the time. If I need to access an arbitrary 16-bit value in memory, I do this:

#include <cstdint>  // uint16_t
#include <cstring>  // memcpy

void *ptr = get_pointer();
uint16_t u16;
memcpy(&u16, ptr, sizeof(u16)); // ntohs omitted for simplicity

...and rely on the compiler doing whatever it can to optimize that piece of code. The code works on ARM, i386, AMD64, and practically every architecture out there. In theory, a non-optimizing compiler could actually call memcpy, resulting in terrible performance, but that is no problem for me, since I use compiler optimizations.

Consider the alternative:

void *ptr = get_pointer();
uint16_t *u16ptr = (uint16_t *)ptr; // cast required in C++; still problematic
uint16_t u16;
u16 = *u16ptr;  // ntohs omitted for simplicity

This alternative code fails to work on machines that require proper alignment, if get_pointer() returns a non-aligned pointer. Also, there may be aliasing issues in the alternative.

The difference between -O0 and -O2 when using the memcpy trick is great: 3.2 Gbps of IP checksum performance versus 67 Gbps. Over an order of magnitude!

Sometimes you may need to help the compiler. For example, instead of relying on the compiler to unroll loops, you can do it yourself, either by implementing the famous Duff's device or in some cleaner way.

The drawback of relying on compiler optimizations is that if you run gdb to debug your code, you may find that a lot has been optimized away. So, you may need to recompile with -O0, meaning performance will totally suck when debugging. I think this is a drawback worth accepting, considering the benefits of optimizing compilers.

Whatever you do, make sure your way is not actually undefined behaviour. Directly dereferencing some arbitrary block of memory as a 16-bit integer is undefined behaviour due to aliasing and alignment issues, which is exactly what the memcpy version avoids.

juhist
1

Compiler optimizations should only affect performance, not results. Relying upon compiler optimizations to meet non-functional requirements is not only reasonable, it is frequently the reason one compiler is picked over another.

Flags that determine how particular operations are performed (indexing or overflow behaviour, for example) are frequently lumped in with compiler optimizations, but shouldn't be. They explicitly affect the results of calculations.

If a compiler optimization causes different results, that is a bug -- a bug in the compiler. Relying upon a bug in the compiler is, in the long term, a mistake -- what happens when it gets fixed?

Using compiler flags that change how calculations work should be well documented, but used as needed.

jmoreno
1

All attempts at writing efficient code in anything but assembly rely very, very heavily on compiler optimizations, starting with the most basic, like efficient register allocation to avoid superfluous stack spills all over the place, and at least reasonably good, if not excellent, instruction selection. Otherwise we'd be back in the 80s, when we had to put register hints all over the place and use the minimum number of variables in a function to help archaic C compilers, or even earlier, when goto was a useful branching optimization.

If we didn't feel like we could rely on our optimizer's ability to optimize our code, we'd all still be coding performance-critical execution paths in assembly.

It's really a matter of how reliably you feel the optimization can be made, which is best sorted out by profiling, looking into the capabilities of the compilers you have, and possibly even disassembling if there's a hotspot where the compiler seems to have failed to make an obvious optimization.

RVO is something that has been around for ages, and, at least excluding very complex cases, is something compilers have been reliably applying well for ages. It's definitely not worth working around a problem that doesn't exist.

Err on the Side of Relying on the Optimizer, Not Fearing It

On the contrary, I'd say err on the side of relying too much on compiler optimizations rather than too little, and this suggestion comes from a guy who works in very performance-critical fields where efficiency, maintainability, and perceived quality among customers are all one giant blur. I'd rather have you rely too confidently on your optimizer and hit some obscure edge cases where you relied too much than rely too little and just code out of superstitious fear for the rest of your life. That will at least have you reaching for a profiler and investigating properly when things don't execute as quickly as they should, gaining valuable knowledge, not superstitions, along the way.

You're doing well to lean on the optimizer. Keep it up. Don't become like the guy who starts explicitly requesting to inline every function called in a loop, before even profiling, out of a misguided fear of the optimizer's shortcomings.

Profiling

Profiling is really the roundabout but ultimate answer to your question. The problem beginners eager to write efficient code often struggle with is not what to optimize, it's what not to optimize because they develop all kinds of misguided hunches about inefficiencies that, while humanly intuitive, are computationally wrong. Developing experience with a profiler will start really giving you a proper appreciation of not only your compilers' optimization capabilities which you can confidently lean on, but also the capabilities (as well as the limitations) of your hardware. There's arguably even more value in profiling in learning what wasn't worth optimizing than learning what was.

-1

Software can be written in C++ on very different platforms and for lots of different purposes.

It completely depends on the purpose of the software: should it be easy to maintain, expand, patch, refactor, etc., or are other things more important, like performance, cost, compatibility with some specific hardware, or the time it takes to develop?

-2

I think the boring answer to this is: 'it depends'.

Is it bad practice to write code that relies on a compiler optimization that is likely to be turned off, where that reliance is not documented, and where the code in question is not unit tested so that you'd know if it broke? Probably.

Is it bad practice to write code that relies on a compiler optimization that is not likely to be turned off, that is documented, and that is unit tested? Maybe not.

-6

Unless there is more you are not telling us, this is bad practice, but not for the reason you suggest.

Possibly unlike other languages you have used before, returning an object by value in C++ yields a copy of the object. If you then modify that copy, you are modifying a different object. That is, if I have Obj a; a.x = 1; and Obj b = a;, then after b.x += 2; b.f();, a.x still equals 1, not 3.

So no, using an object as a value instead of as a reference or pointer does not provide the same functionality and you could end up with bugs in your software.

Perhaps you know this and it does not negatively affect your specific use case. However, based on the wording in your question, it appears that you might not be aware of the distinction; wording such as "create an object in the function."

"Create an object in the function" sounds like new Obj;, whereas "return the object by value" sounds like Obj a; return a;

Obj a; and Obj* a = new Obj; are very, very different things; the former can result in memory corruption if not properly used and understood, and the latter can result in memory leaks if not properly used and understood.

Aaron
-7

Pieter B is absolutely correct in recommending least astonishment.

To answer your specific question, what this (most likely) means in C++ is that you should return a std::unique_ptr to the constructed object.

The reason is that this is clearer for a C++ developer as to what's going on.

Although your approach would most probably work, by returning by value you're effectively signalling that the object is a small value type when, in fact, it isn't. On top of that, you're throwing away any possibility of interface abstraction. That may be OK for your current purposes, but abstraction is often very useful when dealing with matrices.

I appreciate that if you've come from other languages, all the sigils can be confusing initially. But be careful not to assume that, by not using them, you make your code clearer. In practice, the opposite is likely to be true.

Alex