Most architectures I've seen rely on a call stack to save and restore context around function calls. It's such a common paradigm that push and pop operations are built into most processors. Are there systems that work without a stack? If so, how do they work, and what are they used for?
7 Answers
A (somewhat) popular alternative to a call stack is continuations.
The Parrot VM is continuation-based, for example. It is completely stackless: data is kept in registers (like Dalvik or the LuaVM, Parrot is register-based), and control-flow is represented with continuations (unlike Dalvik or the LuaVM, which have a call stack).
Another popular data structure, typically used by Smalltalk and Lisp VMs, is the spaghetti stack: a tree of frames in which each frame points to its caller, so many logical stacks can share a common tail.
As @rwong pointed out, continuation-passing style is an alternative to a call stack. Programs written in (or transformed to) continuation-passing style never return, so there is no need for a stack.
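To make that concrete, here is a minimal sketch in C (the names and the accumulator-passing formulation are mine, not taken from any particular implementation). Each function receives an extra continuation argument and invokes it instead of returning; if the compiler eliminates tail calls (many do at higher optimization levels, though the C standard does not require it), the stack never needs to grow:

#include <stdio.h>

typedef void (*cont)(int);            /* a continuation consuming a result */

static void print_result(int r)
{
    printf("%d\n", r);
}

/* factorial in continuation-passing style: every call is a tail call,
   and no function ever returns a value to its caller */
static void fact_cps(int i, int acc, cont k)
{
    if (i == 0)
        k(acc);                       /* deliver the result to the continuation */
    else
        fact_cps(i - 1, i * acc, k);  /* "recurse" without needing to come back */
}

int main(void)
{
    fact_cps(3, 1, print_result);     /* prints 6 */
    return 0;
}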
Answering your question from a different perspective: it is possible to have call-stack semantics without a dedicated stack region, by allocating the activation frames on the heap. Some Lisp and Scheme implementations do this.
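As a rough illustration of heap-allocated frames (hypothetical code, not how any particular Scheme implementation works), an activation record can simply be a heap object linked to its caller's. C's own control transfer still uses the machine stack here; the point is only that the frames, locals, and caller chain need not live in a contiguous stack region:

#include <stdio.h>
#include <stdlib.h>

/* one activation "frame", allocated on the heap;
   a real implementation would garbage-collect these */
struct frame {
    struct frame *caller;   /* dynamic link, replacing stack adjacency */
    int i;                  /* this activation's parameter/local */
};

static int fact(struct frame *caller, int i)
{
    struct frame *f = malloc(sizeof *f);
    f->caller = caller;
    f->i = i;
    int r = (f->i == 0) ? 1 : f->i * fact(f, f->i - 1);
    free(f);                /* a GC'd system would just drop the reference */
    return r;
}

int main(void)
{
    printf("%d\n", fact(NULL, 3));   /* prints 6 */
    return 0;
}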
- 104,619
In the olden days, processors didn't have stack instructions, and programming languages didn't support recursion. Over time, more and more languages chose to support recursion, and hardware followed suit with stack frame allocation capabilities. This support has varied greatly over the years across processors: some adopted stack frame and/or stack pointer registers; some adopted instructions that allocate a stack frame in a single instruction.
As processors gained single-level, then multi-level caches, one critical advantage of the stack became cache locality. The top of the stack is almost always in the cache. Whenever you can do something that has a high cache hit rate, you're on the right track with modern processors. The cache applied to the stack means that local variables, parameters, etc. are almost always in the cache, where they enjoy the highest level of performance.
In short, use of the stack evolved in both hardware and software. There are other models (for example, dataflow computing was tried for an extended period); however, the locality of the stack makes it work really well. Furthermore, procedural code is exactly what processors want for performance: one instruction telling them what to do after another. When instructions fall out of linear order, the processor slows down tremendously, at least so far, since we haven't figured out how to make random access as fast as sequential access. (Btw, there are similar issues at every memory level, from cache to main memory to disk...)
Between the demonstrated performance of sequential access instructions and the beneficial caching behavior of the call stack, we have, at least at present, a winning performance model.
(We might throw mutability of data structures into the mix as well...)
This doesn't mean that other programming models can't work, especially when they can be translated into the sequential instructions and call-stack model of today's hardware. But there is a distinct advantage for models that match where the hardware is. However, things don't always stay the same, so we could see changes in the future as different memory and transistor technologies allow for more parallelism. It is always a back-and-forth between programming languages and hardware capabilities, so, we'll see!
- 34,819
TL;DR
- Call stack as a function call mechanism:
- Is typically simulated by hardware but is not fundamental to the construction of hardware
- Is fundamental to imperative programming
- Is not fundamental to functional programming
- Stack as an abstraction of "last-in, first-out" (LIFO) is fundamental to computer science, algorithms, and even some non-technical domains.
- Some examples of program organization that do not use call stacks:
- Continuation-passing style (CPS)
- State machine - a giant loop, with everything inlined. (Purported to be inspired by the Saab Gripen firmware architecture, and attributed to a communication by Henry Spencer and reproduced by John Carmack.) (Note #1)
- Dataflow architecture - a network of actors connected by FIFO queues, sometimes called channels. (A toy sketch follows this list.)
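To give a flavor of the dataflow style, here is a toy sketch in C (single-threaded, with a hand-rolled FIFO; real dataflow systems run many concurrent actors). The two actors never call each other; a scheduler loop steps each one, and all communication goes through the channel:

#include <stdio.h>

#define QSIZE 4

/* a tiny FIFO "channel" connecting two actors */
struct queue { int buf[QSIZE]; int head, tail, count; };

static int q_put(struct queue *q, int v)
{
    if (q->count == QSIZE) return 0;           /* channel full: stall */
    q->buf[q->tail] = v;
    q->tail = (q->tail + 1) % QSIZE;
    q->count++;
    return 1;
}

static int q_get(struct queue *q, int *v)
{
    if (q->count == 0) return 0;               /* channel empty: stall */
    *v = q->buf[q->head];
    q->head = (q->head + 1) % QSIZE;
    q->count--;
    return 1;
}

/* producer actor: emits 1..5 into its output channel */
static void producer_step(struct queue *out)
{
    static int n = 1;
    if (n <= 5 && q_put(out, n)) n++;
}

/* consumer actor: prints the square of whatever arrives */
static void consumer_step(struct queue *in)
{
    int v;
    if (q_get(in, &v)) printf("%d\n", v * v);
}

int main(void)
{
    struct queue ch = {0};
    for (int i = 0; i < 10; i++) {             /* the "scheduler" */
        producer_step(&ch);
        consumer_step(&ch);
    }
    return 0;
}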
The rest of this answer is a random collection of thoughts and anecdotes, and therefore somewhat disorganized.
The stack that you have described (as a function call mechanism) is specific to imperative programming.
Below imperative programming, you will find machine code. Machine code can emulate the call stack by executing a small sequence of instructions.
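For example (a hypothetical sketch in C standing in for machine code), on a machine with no push/pop instructions, a compiler can dedicate one register to serve as the stack pointer and emit a couple of ordinary loads and stores per call:

#include <stdio.h>

/* the "stack" is just ordinary memory plus one dedicated register */
static int mem[1024];
static int sp = 1024;                /* stack grows downward */

/* what a CALL would compile to: store the return address, adjust sp */
static void push(int return_addr) { mem[--sp] = return_addr; }

/* what a RET would compile to: reload the saved address, adjust sp */
static int pop(void) { return mem[sp++]; }

int main(void)
{
    push(42);                        /* "call": save where to resume */
    printf("resume at %d\n", pop()); /* "return": prints 42 */
    return 0;
}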
Below machine code, you will find the hardware responsible for executing software. While a modern microprocessor is too complex to describe here, one can imagine a very simple design that is slow but still capable of executing the same machine code. Such a simple design would make use of the basic elements of digital logic:
- Combinational logic, i.e. a network of logic gates (and, or, not, ...). Note that "combinational logic" excludes feedback loops.
- Memory, i.e. flip-flops, latches, registers, SRAM, DRAM, etc.
- A state machine that consists of some combinational logic and some memory, just enough so that it can implement a "controller" that manages the rest of the hardware. (A toy software model of this follows the list.)
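As a software caricature of that last point (a hypothetical toy, far simpler than a real control unit): the "memory" is a single state register, the "combinational logic" is a pure next-state function, and the machine simply steps once per clock tick:

#include <stdio.h>

enum state { FETCH, DECODE, EXECUTE };

/* "combinational logic": maps the current state to the next state */
static enum state next_state(enum state s)
{
    switch (s) {
    case FETCH:  return DECODE;
    case DECODE: return EXECUTE;
    default:     return FETCH;       /* EXECUTE wraps back around */
    }
}

int main(void)
{
    enum state s = FETCH;            /* the "memory": a state register */
    for (int cycle = 0; cycle < 6; cycle++) {
        printf("cycle %d: state %d\n", cycle, (int)s);
        s = next_state(s);           /* one clock tick */
    }
    return 0;
}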
The following discussions contain plenty of examples of alternative ways of structuring imperative programs.
- http://number-none.com/blow/john_carmack_on_inlined_code.html
- https://news.ycombinator.com/item?id=8374345
The structure of such a program will look like this:
int main(void)
{
    do
    {
        // validate inputs for task 1
        // execute task 1, inlined,
        // must complete in a deterministically short amount of time
        // and limited to a statically allocated amount of memory
        // ...
        // validate inputs for task 2
        // execute task 2, inlined
        // ...
        // validate inputs for task N
        // execute task N, inlined
    }
    while (1);
    // if this line is reached, tell the programmers to prepare
    // themselves to appear before an accident investigation board.
    return 0;
}
This style would be appropriate for microcontrollers, i.e. for situations where the software is a companion to the functions of the hardware.
- 17,140
You've got some good answers so far; let me give you an impractical but highly educational example of how you could design a language without the notion of stacks or "control flow" at all. Here's a program that computes factorials:
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = f(3)
We put this program in a string, and we evaluate the program by textual substitution. So when we are evaluating f(3), we do a search and replace with 3 for i, like this:
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = if 3 == 0 then 1 else 3 * f(3 - 1)
Great. Now we perform another textual substitution: we see that the condition of the "if" is false and do another string replace, producing the program:
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = 3 * f(3 - 1)
Now we do another string replace on all sub-expressions involving constants:
function f(i) => if i == 0 then 1 else i * f(i - 1)
let x = 3 * f(2)
And you see how this goes; I won't labour the point further. We could keep on doing a series of string substitutions until we got down to let x = 6 and we'd be done.
We use the stack traditionally for local variables and continuation information; remember, a stack doesn't tell you where you came from, it tells you where you're going next with that return value in hand.
In the string substitution model of programming, there are no "local variables" on a stack; the formal parameters are replaced by argument values when the function is applied, rather than being looked up in a table on the stack. And there is no "going somewhere next", because evaluating the program is simply applying simple string-substitution rules to produce a different but equivalent program.
Now, of course, actually doing string substitutions is probably not the way to go. But programming languages that support "equational reasoning" (such as Haskell) are logically using this technique.
- 46,558
No, not necessarily.
Read Appel's old paper Garbage Collection Can Be Faster than Stack Allocation. It uses continuation-passing style and shows a stackless implementation.
Notice also that old computer architectures (e.g. IBM/360) did not have any hardware stack register. But the OS and compiler reserved a register for the stack pointer by convention (related to calling conventions) so that they could have a software call stack.
In principle, a whole-program C compiler and optimizer could detect the case (somewhat common for embedded systems) where the call graph is statically known and free of recursion (and function pointers). In such a system, each function could keep its return address in a fixed static location (which is how Fortran 77 worked on 1970s-era computers).
These days, processors also have call stacks (and call and return machine instructions) that play well with CPU caches.
- 32,862
Since the publication of Parnas's On the Criteria to Be Used in Decomposing Systems into Modules in 1972, it has been reasonably well accepted that information hiding in software is a good thing. This followed a long debate throughout the '60s on structural decomposition and modular programming.
Modularity
Black-box relationships between modules implemented by different groups in any multi-threaded system require a mechanism to permit reentrancy and a means to track the dynamic call graph of the system. Controlled flow of execution has to pass both into and out of multiple modules.
Dynamic scoping
As soon as lexical scoping is insufficient to track dynamic behavior, some runtime bookkeeping is required to make up the difference.
Given that any thread (by definition) has only a single current instruction pointer, a LIFO stack is appropriate for tracking each invocation.
Exceptions
So, while the continuation model does not explicitly maintain a stack data structure, the nested calling of modules still has to be tracked somewhere!
Even declarative languages either maintain evaluation history, or conversely flatten the execution plan for performance reasons and maintain progress in some other way.
The endless loop structure identified by rwong is common in high-reliability applications with static scheduling that disallows many common programming structures but demands that the entire application be considered a white box with no significant information hiding.
Multiple concurrent endless loops do not require any structure to hold return addresses as they do not call functions, making the question moot. If they communicate using shared variables, then these can easily degenerate into legacy Fortran-style return address analogues.
All the old mainframes (IBM System/360) had no notion of a stack at all. On the 360, for example, parameters were constructed in a fixed location in memory, and a subroutine was called with R1 pointing to the parameter block and R14 containing the return address. The called routine, if it wanted to call another subroutine, had to store R14 in a known location before making that call.
This is much more reliable than a stack because everything can be stored in fixed memory locations established at compile time and it can be 100% guaranteed that processes will never run out of stack. There is none of the "Allocate 1MB and cross your fingers" that we have to do nowadays.
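Here is a hypothetical C rendering of that convention (the "registers" are modeled as globals, and the C compiler underneath still uses a real stack, of course; this only models the bookkeeping). Parameters live in a fixed block, R14 holds the return point, and a routine that makes a nested call must first save R14 to its own fixed slot. That fixed save slot is also exactly why plain recursion was impossible:

#include <stdio.h>

struct params { int a, b; };

static struct params add_params;       /* fixed parameter block for add() */
static void (*r14)(void);              /* the "return address" register */
static void (*middle_saved_r14)(void); /* middle()'s fixed R14 save slot */
static int result;

static void add(void)                  /* leaf routine: returns via R14 */
{
    result = add_params.a + add_params.b;
    r14();
}

static void middle_done(void);

static void middle(void)
{
    middle_saved_r14 = r14;            /* save caller's return point first */
    add_params.a = 2;
    add_params.b = 3;
    r14 = middle_done;                 /* set our return point, then "call" */
    add();
}

static void middle_done(void)
{
    middle_saved_r14();                /* return to middle()'s caller */
}

static void all_done(void)
{
    printf("%d\n", result);            /* prints 5 */
}

int main(void)
{
    r14 = all_done;
    middle();
    return 0;
}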
Recursive subroutine calls were allowed in PL/I by specifying the keyword RECURSIVE. That meant the memory used by the subroutine was dynamically rather than statically allocated. But recursive calls were as rare then as they are now.
Stackless operation also makes massive multi-threading much easier, which is why attempts are often made to make modern languages stackless. There is no reason at all, for example, why a C++ compiler could not have its back end modified to use dynamically allocated memory rather than a stack.
- 1,296