Friday, November 11, 2022

Curly Braces #6: Recursion and tail-call optimization


Over the years, I’ve come to appreciate the power of recursion, yet I have two minor gripes with it.

First, there is one part of every recursive call that jumps out at me as inelegant: the base case, also known as the escape clause.

Let’s look at the classic factorial example in Java.

int factorial(int n) {
        if (n <= 1) // Base case
            return 1; 
        return n * factorial(n - 1); // Recursive case
    }

Here’s the point: factorial is recursive except when it reaches the base case. Yes, I’m nitpicking—and mostly joking.

My second gripe is a more serious one: In practice, I rarely witness the use of recursion. The recursion use cases that stand out to me include XML parsing as well as code that handles hierarchical data in general. For instance, I once came across a neat piece of code that walked a TreeMap using a two-method recursive implementation. The fact that it involved two methods that repeatedly called each other blew developers’ minds. (The code also wasn’t commented.)

This implementation was beautiful. I still see, in my head, these two methods of equal size accomplishing an astonishing amount of work with so little code. This was true recursive power.

Of course, there are other valid uses of recursion in Java, such as in searching, sorting (see the Towers of Hanoi problem), and machine learning (due to the hierarchical structure of applicable datasets). But the truth is, due to stack concerns and the perceived complexity of recursion, Java developers generally write loop-centered algorithms instead.

Introducing the tail call


If the use of recursion is not common in the Java community, why write about it? Because Java is more than a language. It’s a platform based on a virtual machine, and the JVM supports other languages beyond Java. Some of those languages, such as Groovy, rely heavily on recursion.

Many functional language interpreters contain an optimization for recursion to save stack space based on tail calls. A tail call is when a method or function call is the last instruction inside of another method or function. (For simplicity, I’ll refer to all calls as function calls.)

Consider the function foo(), as defined as follows:

int foo(int a) {
  a = a + 1;
  return func(a);
}

The call to function func() is the last statement in function foo(); therefore, it’s a tail call. If foo() executed any instructions after the return call to func(), then func() would no longer be referred to as a tail call. Here’s an example where the recursive function is also a tail call.

int fact(int i, int acc) {
      if ( i == 0 ) 
        return acc;
      else
        return fact( i – 1, acc * i);
    }
    
    int factorial(int n) {
      if ( n == 0 )
        return 1;
      else
        return fact( n – 1, 1 );
    }

If you compile this code with a C compiler and debug it, you can examine the assembly code as it executes (see Figure 1). The red arrow points to the last two instructions for the function fact(), which are the call to fact() itself and retq for the return statement. This fits the definition of a tail call. (I generated this view using NetBeans; while I was debugging, I chose the Disassembly submenu option under Window > Debugging.)

Oracle Java, Java Career, Java Prep, Java Certification, Oracle Java Guides, Java Jobs, Java Prep, Java Preparation

Figure 1. Function fact() is a recursive tail call, as shown by the assembly code.

I chose to illustrate this example in C because it was easier to show the associated assembly code, but the same holds true for Java. In fact, you can copy this code into a Java class verbatim, and it will compile and work the same way.

Now, it’s time to optimize these functions, and there can be a significant benefit.

Tail-call optimization


Tail-call optimization (TCO) is very relevant for recursive calls, and the topic has become important in functional programming. You see, with any tail call—not just a recursive one—the function call itself can be optimized away and turned into what is effectively a goto. The benefit is that the optimization eliminates all the work required to set up the stack before the function call (the prolog) and then restore the stack afterwards (the epilog).

Consider the following code:

int func_a(int data) {
  data = do_this(data);
  return do_that(data);
}

The function do_that() is a tail call, and its nonoptimized assembly code might look something like the following:

... ! executing inside func_a()
push EIP ! push current instruction pointer on stack
push data ! push variable 'data' on the stack
jmp do_this ! call do_this() by jumping to its address
... ! executing inside do_this()
push EIP ! push current instruction pointer on stack
push data ! push variable 'data' on the stack
jmp do_that ! call do_that() by jumping to its address
... ! executing inside do_that()
pop data ! prepare to return value of 'data'
pop EIP ! return to do_this()
pop data ! prepare to return value of 'data'
pop EIP ! return to func_a()
pop data ! prepare to return value of 'data'
pop EIP ! return to func_a() caller
...

Notice the multiple pop instructions for both data and EIP (extended instruction pointer) register occurring in succession. The EIP register returns the value of data and restores the instruction pointer. One set of these epilogs and its associated prolog (where data and the EIP register are saved on the stack) could be eliminated and replaced by a simple jump (JMP) assembly instruction. This optimization can be done because the function do_that() will execute the epilog code for the calling function, func_a().

This optimization is safe because func_a() doesn’t perform any meaningful instructions after do_that() returns, because it’s a tail call. (Recall that if something were done with the return value after the call to do_that() returns, this optimization cannot be performed because do_that() would no longer be a tail call.)

Here’s the result of TCO; the crossed-out lines indicate instructions that were removed.

Oracle Java, Java Career, Java Prep, Java Certification, Oracle Java Guides, Java Jobs, Java Prep, Java Preparation

Shown slightly differently, this is the code before TCO.

Oracle Java, Java Career, Java Prep, Java Certification, Oracle Java Guides, Java Jobs, Java Prep, Java Preparation

Here it is again after TCO.

Oracle Java, Java Career, Java Prep, Java Certification, Oracle Java Guides, Java Jobs, Java Prep, Java Preparation

Comparing the shaded row in both, you can see how much less machine code there is to run in the optimized version.

Reducing the amount of code is good, but it’s not the biggest reason for preferring the optimized version. The real benefits appear when you combine TCO with recursion.

If you review the recursive factorial example shown earlier, you can see that when it is combined with this optimization, all the prolog and epilog assembly code that would be needed repeatedly can be removed. Such optimization effectively turns the following recursive pseudocode, which I’ve taken from a Wikipedia example

call factorial(3)
  call fact(3,1)
    call fact(2,3)
      call fact(1,6)
        call fact(0,6)
        return 6
      return 6
    return 6
  return 6
return 6

into this iterative pseudocode

call factorial(3)
  call fact(3,1)
  update variables with (2,3)
  update variables with (1,6)
  update variables with (0,6)
  return 6
return 6

In other words, TCO replaces the recursive code with a loop, which goes back to the observation that many Java developers would prefer to use loops instead of recursion. This optimization reduces the function call prolog and epilog assembly code for numerous recursive calls, and it also greatly reduces pressure on the stack.

Without this optimization, calculating the factorial of very large numbers (or doing other large recursive operations) would likely blow the stack. Reducing the algorithms to a loop in the compiled code removes that risk. This is why most functional language interpreters perform TCO behind the scenes.

Java and TCO


I spoke to Brian Goetz, the Java language architect at Oracle, about TCO back in 2014. At the time, he told me that it’s on a list of things to be added to the JVM. I caught up with him recently to discuss the topic, and he said that while progress has been made, and more may occur in the future, TCO hasn’t been fully implemented due to some security concerns.

Mark Reinhold, chief architect of the Java Platform Group at Oracle, recently was quoted saying the following:

It’s not clear to me that we would ever want to expose tail calls and continuations in the platform for general use but there may be ways we could use them inside the platform, in some fairly interesting ways.

It isn’t a big deal to most developers writing Java code that the JVM doesn’t fully perform TCO, because Java programmers tend to avoid recursion through the use of loops. For instance, here’s an iterative implementation of factorial in Java.

int factorial(int n) {
  int result = 1;
  for (int t=n; t > 1; t--)
      result *= t;
  return result;
}

However, the lack of TCO does become an issue when you run functional languages (such as Groovy) within the JVM. Then, recursive code that ran fine with that language’s native interpreter may blow the stack when it’s executed in the JVM.

It’s important to note that this isn’t a bug in the JVM. Still, for now, it’s best to do TCO yourself, if you can, by avoiding deeply recursive functions when you’re coding in a functional language on the JVM.

Source: oracle.com

Related Posts

0 comments:

Post a Comment