A good friend asked me how functions work under the covers, so I wrote up this quick explanation. Corners are cut for brevity, but I hope it will convey the general idea.
The three assembly language programs below implement the following (super simple) recursive function written in scheme:
(define (function x) (if (> x 5) x (function (+ 1 x))))
Naive, literal version
In assembly language, a label is a placeholder for the offset into the code of the instruction that follows that label. Our function starts at the label _function and the way the code continues down the page represents exactly how it will be loaded into memory.
For example, the code of the first section of our function is laid out like this in the binary the assembler generates:
$ otool -t foo 0000000100000f50 55 48 89 e5 48 83 ec 10 89 7d f8 83 7d f8 05 0f 0000000100000f60 8e 0b 00 00 00 8b 45 f8 89 45 fc e9 10 [ ... ]
And the label _function means, relative to these addresses, 0000000100000f50. Given this sort of hex dump, it’s tedious but not difficult to disassemble the binary manually by looking up the op codes in a table like this. It is, however, more convenient to have the machine do it for us:
$ otool -t -v foo _function: 0000000100000f50 pushq %rbp 0000000100000f51 movq %rsp, %rbp 0000000100000f54 subq $16, %rsp 0000000100000f58 movl %edi, -8(%rbp) 0000000100000f5b cmpl $5, -8(%rbp) 0000000100000f5f jle 0x100000f70 0000000100000f65 movl -8(%rbp), %eax 0000000100000f68 movl %eax, -4(%rbp) 0000000100000f6b jmpq 0x100000f80 [ ... ]
Which reveals, as promised, that the binary layout reflects the assembly language source.
The first few lines following the _function label are standard bookkeeping involving the call stack. The %rbp is the “base pointer” and the %rsp is the “stack pointer”. This ritual of pushing the base pointer onto the call stack is how the computer knows where to return from a function. The stack is also how functions communicate arguments and return values, about which more here.
After that we perform a comparison operation against the constant 5, and then “jump if less than” (jle) to our _recurse label, which calls the function again, copies its return value and then falls through to the _return label, which is boilerplate stack cleanup code (the inverse of what we did on function entry). Otherwise, we carry on to lines 12-14 of _function, which copy the return value into place and then jump (jmp is goto in assembly language) to the same _return label to perform the aforementioned boilerplate.
Under UNIX calling conventions the exit code of a process is whatever main returns, which in this case is what our recursive function returned to it. Bash stores the exit code of the last command in the variable $?, thus allowing us to quickly test our code:
$ cc -o foo function.s $ ./foo ; echo $? 6
Yep, six is greater than five.
A few words on optimization
A second version, below, implements the same function using Tail Call Optimization, a clever performance hack where the compiler figures out that a function is isomorphic to a loop and rewrites it as one. Rather than incurring the overhead of pushing and popping the stack every time we want to increment our counter (and thus the danger that we might run out of stack space), we use what amounts to a while loop implemented on the %eax register. (Note to former Apple II/Commodore 64 owners: %eax is the “extended accumulator”, a 32-bit cousin of the 6502’s accumulator register).
_function: pushq %rbp movq %rsp, %rbp movl %edi, -4(%rbp) movl -4(%rbp), %eax _while: addl $1, %eax cmpl $5, %eax jle _while popq %rbp ret .globl _main _main: pushq %rbp movq %rsp, %rbp subq $16, %rsp movl $0, %eax movl %eax, %edi callq _function addq $16, %rsp popq %rbp ret
The third and final version combines TCO with inlining, which is a technique where the compiler (in this case, me) determines that the function can be included literally at the point at which it’s called (the “call site”). Here we have no stack overhead at all, just the same single-register while loop embedded in the _main function.
.globl _main _main: pushq %rbp movq %rsp, %rbp movl $0, %eax _while: addl $1, %eax cmpl $5, %eax jle _while popq %rbp ret
To do much better than this, one would need semantic analysis to determine whether the function is used and if that use changes anything in the output of the program.
This entry is part of Jack Rusher’s archive, originally published September 2nd, 2013, in Berlin.