33

Why not have the compiler take a program like this:

function a(b) { return b^2 };
function c(b) { return a(b) + 5 };

and convert it into a program like this:

function c(b) { return b^2 + 5 };

thereby eliminating the computer's need to remember a return address for the call to a(b)?

I suppose the increased hard disk space and RAM needed to store the program and support its compilation (respectively) is the reason why we use call stacks. Is that correct?

moonman239

6 Answers

76

This is called "inlining" and many compilers do this as an optimization strategy in cases where it makes sense.

In your particular example, this optimization would save both space and execution time. But if the function was called in multiple places in the program (not uncommon!), it would increase code size, so the strategy becomes more dubious. (And of course if a function called itself directly or indirectly it would be impossible to inline, since then the code would become infinite in size.)
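The code-size tradeoff can be sketched concretely. In this hedged illustration (all function names invented), a small function called from three places gets its body duplicated at every call site when inlined:

```javascript
// A small function called from three places.
function square(b) { return b * b; }

function quad(b)       { return square(b) + 5; }
function hypotSq(x, y) { return square(x) + square(y); }
function area(r)       { return 3.14159 * square(r); }

// After inlining, the compiler would substitute square's body at
// every call site -- three copies of `b * b` instead of one:
function quadInlined(b)       { return (b * b) + 5; }
function hypotSqInlined(x, y) { return (x * x) + (y * y); }
function areaInlined(r)       { return 3.14159 * (r * r); }
```

The inlined versions compute the same results; only the amount of emitted code changes, and it grows with the number of call sites.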

And obviously it is only possible for "private" functions. Functions which are exposed for external callers cannot be optimized away, at least not in languages with dynamic linking.

mcottle
JacquesB
51

There are two parts to your question: Why have multiple functions at all (instead of replacing function calls with their definition) and why implement those functions with call stacks instead of statically allocating their data somewhere else?

The first reason is recursion. Not just the "oh let's make a new function call for every single item in this list" kind, also the modest kind where you have maybe two calls of a function active at the same time, with many other functions in between them. You need to put local variables on a stack to support this, and you can't inline recursive functions in general.
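As a hedged sketch of the modest kind of recursion (the function and data are invented for illustration), note that each active call below needs its own private copies of its locals, which is exactly what a stack frame provides:

```javascript
// Depth of a nested array, e.g. depth([[1], [[2]]]) is 3.
// Each active invocation needs its own `max` and `d`; while the
// recursive call runs, the caller's locals must be preserved in a
// separate frame -- they cannot share one static location.
function depth(node) {
  if (!Array.isArray(node)) return 0;
  let max = 0;                 // local to THIS invocation
  for (const child of node) {
    const d = depth(child);    // a second frame is now live
    if (d > max) max = d;
  }
  return 1 + max;
}
```

If `max` were a single static variable shared by all invocations, the inner call would clobber the outer call's partial result.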

Then there's a problem for libraries: You don't know which functions will be called from where and how often, so a "library" could never really be compiled, only shipped to all clients in some convenient high-level format that is then inlined into the application. Aside from other problems with this, you completely lose dynamic linking with all its advantages.

Additionally, there are many reasons to not inline functions even when you could:

  1. It's not necessarily faster. Setting up and tearing down the stack frame takes maybe a dozen single-cycle instructions; for many large or looping functions that's not even 0.1% of their execution time.
  2. It may be slower. Duplication of code has costs, e.g., it will put more pressure on the instruction cache.
  3. Some functions are very large and called from many places; inlining them everywhere would increase the binary size far beyond what's reasonable.
  4. Compilers often have a hard time with very large functions. All else being equal, compiling a function of size 2*N takes more than twice as long as compiling a function of size N.
17

Stacks allow us to elegantly bypass the limits imposed by the finite number of registers.

Imagine having exactly 26 global registers, "a" through "z" (or even only the 7 byte-sized registers of the 8080 chip), and every function you write in this app shares this flat list.

A naive start would be to allocate the first few registers to the first function and, knowing that it took only 3, start the second function at "d"... You run out quickly.

Instead, if you have a metaphorical tape, like a Turing machine's, each function could begin a call to another function by saving all the variables it's using and winding the tape forward; the callee can then muddle with as many registers as it wants. When the callee is finished, it returns control to the parent function, which knows where to grab the callee's output as needed, and then plays the tape backwards to restore its state.

Your basic call frame is just that: it is created and dropped by standardized machine-code sequences the compiler emits around the transitions from one function to another. (It's been a long time since I had to remember my C stack frames, but you can read up on the various conventions for who saves and drops what at X86_calling_conventions.)
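The tape metaphor can be sketched directly (a hedged toy model, not real machine code: the "registers" are two shared bindings and the "tape" is an explicit array of saved frames):

```javascript
const tape = [];        // the metaphorical tape: a stack of saved frames

let a = 0, b = 0;       // our two shared "registers"

function callSquare(x) {
  tape.push({ a, b });  // wind the tape forward: save caller's registers
  a = x;                // callee is now free to clobber them
  b = a * a;
  const result = b;
  ({ a, b } = tape.pop());  // play the tape backwards: restore the caller
  return result;
}
```

Whatever values the caller had in `a` and `b` survive the call untouched, which is the whole point of the frame.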

(Recursion is awesome, but if you'd ever had to juggle registers without a stack, you'd really appreciate stacks.)


I suppose the increased hard disk space and RAM needed to store the program and support its compilation (respectively) is the reason why we use call stacks. Is that correct?

While we can inline more these days ("more speed" is always good; "fewer KB of assembly" means very little in a world of video streams), the main limitation is the compiler's ability to flatten across certain types of code patterns.

For example, polymorphic objects -- if you don't know the one and only type of object you'll be handed, you can't flatten; you have to look at the object's vtable of features and call through that pointer... trivial to do at runtime, impossible to inline at compile time.

A modern toolchain can happily inline a polymorphically-defined function when it has flattened enough of the caller(s) to know exactly which flavor of object it is:

class Base {
    public: virtual void act() = 0;
};
class Child1: public Base {
    public: void act() override {};
};
void ActOn(Base* something) {
    something->act();
}
void InlineMe() {
    Child1 thingamabob;
    ActOn(&thingamabob);
}

In the above, the compiler can choose to keep right on statically inlining, from InlineMe through whatever's inside act(), with no need to touch any vtables at runtime.

But any uncertainty about the flavor of object will leave it as a call to a discrete function, even if some other invocations of the same function are inlined.

xander
11

Cases that approach cannot handle:

function fib(a) { if(a>2) return fib(a-1)+fib(a-2); else return 1; }

function many(a) { for (var i = 1; i <= a; i++) { b(i); } }

There are languages and platforms with limited or no call stacks. PIC microprocessors have a hardware stack limited to between 2 and 32 entries. This creates design constraints.

COBOL bans recursion: https://stackoverflow.com/questions/27806812/in-cobol-is-it-possible-to-recursively-call-a-paragraph

Imposing a ban on recursion does mean that you can represent the entire callgraph of the program statically as a DAG. Your compiler could then emit one copy of a function for each place from which it's called with a fixed jump instead of a return. No stack required, just more program space, potentially quite a lot for complex systems. But for small embedded systems this means you can guarantee not to have a stack overflow at runtime, which would be bad news for your nuclear reactor / jet turbine / car throttle control etc.
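A hedged sketch of that per-call-site duplication (all names invented; a real compiler would do this at the machine-code level with fixed jump targets rather than JavaScript functions, and the "register" would be a static memory cell):

```javascript
// With no recursion, the callee's output can live in fixed storage,
// and each call site gets its own specialized copy of the function
// whose "return" is a statically known continuation.
let scaleResult = 0;              // statically allocated output slot

function scale_fromMain(x) {      // copy #1: control returns into main
  scaleResult = x * 2;
}
function scale_fromHelper(x) {    // copy #2: control returns into helper
  scaleResult = x * 2;
}
function helper(x) { scale_fromHelper(x); return scaleResult + 1; }
function main(x)   { scale_fromMain(x);   return scaleResult + helper(x); }
```

Because every copy has exactly one caller, no return address ever needs to be remembered at runtime.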

pjc50
8

You want function inlining, and most (optimizing) compilers are doing that.

Notice that inlining requires the called function to be known (and is effective only if that called function is not too big), since conceptually it substitutes the call with a rewriting of the called function. So you generally cannot inline an unknown function, e.g., one reached through a function pointer (and that includes functions from dynamically linked shared libraries), perhaps visible as a virtual method in some vtable; some compilers can nevertheless optimize through devirtualization techniques. And it is not always possible to inline recursive functions (though some clever compilers might use partial evaluation and in some cases be able to inline them).

Notice also that inlining, even when it is easily possible, is not always effective: you (actually your compiler) could increase the code size so much that CPU caches (or the branch predictor) would work less efficiently, making your program run slower.

I am focusing a bit on functional programming style, since you tagged your question as such.

Notice that you don't need to have any call stack (at least in the machine sense of the "call stack" expression). You could use only the heap.

So, take a look at continuations and read more about continuation passing style (CPS) and CPS transformation (intuitively, you could use continuation closures as reified "call frames" allocated in the heap, and they are sort-of mimicking a call stack; then you need an efficient garbage collector).
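As an illustrative sketch of CPS (not taken from the references below), here is factorial with an explicit continuation argument; each continuation closure plays the role of a reified call frame, heap-allocated by the language runtime:

```javascript
// Continuation-passing style: instead of returning, every step
// passes its result to a continuation `k`. The chain of closures
// k(n * r) is the "call stack", living on the heap.
function factCPS(n, k) {
  if (n <= 1) return k(1);
  return factCPS(n - 1, (r) => k(n * r));
}
```

Calling it as `factCPS(5, (x) => x)` yields 120; with proper tail calls, no machine call stack would grow at all.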

Andrew Appel wrote the book Compiling with Continuations and an old paper, "Garbage collection can be faster than stack allocation". See also A. Kennedy's ICFP 2007 paper "Compiling with Continuations, Continued".

I also recommend reading Queinnec's Lisp In Small Pieces book, which has several chapters related to continuation & compilation.

Notice also that some languages (e.g. Brainfuck) or abstract machines (e.g. OISC, RAM) don't have any calling facilities but are still Turing-complete, so you don't (in theory) even need any function call mechanism, even if it is extremely convenient. BTW, some old instruction set architectures (e.g. IBM/370) don't even have a hardware call stack or a call instruction that pushes a return address (the IBM/370 had only a Branch and Link machine instruction).

Finally, if your entire program (including all the needed libraries) has no recursion, you could store the return address (and the "local" variables, which actually become static) of each function in static locations. Old Fortran77 compilers did that in the early 1980s (so the compiled programs did not use any call stack at that time).
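That Fortran-77-style scheme can be sketched as follows (a hedged toy model with invented names: each non-recursive function's locals live in one fixed record instead of a fresh stack frame):

```javascript
// Static "activation record" for avg(): one fixed set of locals,
// safe only because avg() is never active twice at once.
const avgLocals = { sum: 0, i: 0 };

function avg(xs) {
  avgLocals.sum = 0;
  for (avgLocals.i = 0; avgLocals.i < xs.length; avgLocals.i++) {
    avgLocals.sum += xs[avgLocals.i];
  }
  return avgLocals.sum / xs.length;
}
```

The moment avg() could call itself (directly or through another function), the shared `avgLocals` record would be clobbered, which is exactly why this scheme forbids recursion.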

8

Inlining (replacing function calls with equivalent functionality) works well as an optimization strategy for small simple functions. The overhead of a function call can be effectively traded off for a small penalty in added program size (or in some cases, no penalty at all).

However, large functions which in turn call other functions could lead to an enormous explosion in program size if everything was inlined.

The whole point of callable functions is to facilitate efficient re-use, not just by the programmer, but by the machine itself, and that includes properties like reasonable memory or on-disk footprint.

For what it's worth: you can have callable functions without a call stack. For example: IBM System/360. When programming in languages such as FORTRAN on that hardware, the program counter (return address) would be saved into a small section of memory reserved just ahead of the function entry point. This allows for reusable functions, but does not allow for recursion or multi-threaded code (an attempt at a recursive or re-entrant call would result in a previously saved return address being overwritten).

As explained by other answers, stacks are good things. They facilitate recursion and multi-threaded calls. While any algorithm coded to use recursion could be coded without relying on recursion, the result may be more complex, more difficult to maintain, and may be less efficient. I'm not sure a stack-less architecture could support multi-threading at all.

Zenilogix