Functions, how do they work?

A good friend asked me how functions work under the covers, so I wrote up this quick explanation. Corners are cut for brevity, but I hope it will convey the general idea.

The three assembly language programs below implement the following (super simple) recursive function written in scheme:

(define (function x)
  (if (> x 5)
      x
      (function (+ 1 x))))

Naive, literal version

In assembly language, a label is a placeholder for the offset into the code of the instruction that follows that label. Our function starts at the label _function and the way the code continues down the page represents exactly how it will be loaded into memory.

For example, the code of the first section of our function is laid out like this in the binary the assembler generates:

$ otool -t foo
0000000100000f50 55 48 89 e5 48 83 ec 10 89 7d f8 83 7d f8 05 0f
0000000100000f60 8e 0b 00 00 00 8b 45 f8 89 45 fc e9 10
[ ... ]

And the label _function means, relative to these addresses, 0000000100000f50. Given this sort of hex dump, it's tedious but not difficult to disassemble the binary manually by looking up the op codes in a table like this. It is, however, more convenient to have the machine do it for us:

$ otool -t -v foo
_function:
0000000100000f50  pushq	%rbp
0000000100000f51	movq	%rsp, %rbp
0000000100000f54	subq	$16, %rsp
0000000100000f58	movl	%edi, -8(%rbp)
0000000100000f5b	cmpl	$5, -8(%rbp)
0000000100000f5f	jle	0x100000f70
0000000100000f65	movl	-8(%rbp), %eax
0000000100000f68	movl	%eax, -4(%rbp)
0000000100000f6b	jmpq	0x100000f80
[ ... ]

Which reveals, as promised, that the binary layout reflects the assembly language source.

The first few lines following the _function label are standard bookkeeping involving the call stack. The %rbp is the "base pointer" and the %rsp is the "stack pointer". This ritual of pushing the base pointer onto the call stack is how the computer knows where to return from a function. The stack is also how functions communicate arguments and return values, about which more here.

After that we perform a comparison operation against the constant 5, and then "jump if less than" (jle) to our _recurse label, which calls the function again, copies its return value and then falls through to the _return label, which is boilerplate stack cleanup code (the inverse of what we did on function entry). Otherwise, we carry on to lines 12-14 of _function, which copy the return value into place and then jump (jmp is goto in assembly language) to the same _return label to perform the aforementioned boilerplate.

Under UNIX calling conventions the exit code of a process is whatever main returns, which in this case is what our recursive function returned to it. Bash stores the exit code of the last command in the variable $?, thus allowing us to quickly test our code:

$ cc -o foo function.s
$ ./foo ; echo $?
6

Yep, six is greater than five.

A few words on optimization

A second version, below, implements the same function using Tail Call Optimization, a clever performance hack where the compiler figures out that a function is isomorphic to a loop and rewrites it as one. Rather than incurring the overhead of pushing and popping the stack every time we want to increment our counter (and thus the danger that we might run out of stack space), we use what amounts to a while loop implemented on the %eax register. (Note to former Apple II/Commodore 64 owners: %eax is the "extended accumulator", a 32-bit cousin of the 6502's accumulator register).

_function:
  pushq	%rbp
	movq	%rsp, %rbp
	movl	%edi, -4(%rbp)
	movl	-4(%rbp), %eax
_while:
	addl	$1, %eax
	cmpl	$5, %eax
	jle	_while
	popq	%rbp
	ret

	.globl	_main
_main:
	pushq	%rbp
	movq	%rsp, %rbp
	subq	$16, %rsp
	movl	$0, %eax
	movl	%eax, %edi
	callq	_function
	addq	$16, %rsp
	popq	%rbp
	ret

The third and final version combines TCO with inlining, which is a technique where the compiler (in this case, me) determines that the function can be included literally at the point at which it's called (the "call site"). Here we have no stack overhead at all, just the same single-register while loop embedded in the _main function.

  .globl	_main
_main:
	pushq	%rbp
	movq	%rsp, %rbp
	movl	$0, %eax
_while:
	addl	$1, %eax
	cmpl	$5, %eax
	jle	_while
	popq	%rbp
	ret

To do much better than this, one would need semantic analysis to determine whether the function is used and if that use changes anything in the output of the program.

This entry is part of my journal, published September 2, 2013, in Berlin.