Possibly Pythonic Tail Call Optimization (TCO/TRE)

Terry Reedy tjreedy at udel.edu
Tue Jul 14 00:46:37 CEST 2015

On 7/13/2015 7:22 AM, Jussi Piitulainen wrote:
> Ian Burnette writes:

A post I did not receive, but want to comment on.

>> I've recently been playing around with Clojure, and I really like the
>> way they've overcome the JVM not supporting TRE. The way Clojure
>> solves this problem is to have a built in "recur" function that
>> signals to the Clojure compiler to convert the code to a loop in the
>> JVM.In python we could do something similar

Currently, the only means of affecting compilation of Python code are 
future imports and global and nonlocal declarations. (Coding cookie 
comments affect decoding of python code encoded as bytes, which happens 
before interpretation of the code.)  Functions are called while running, 
after normal compilation.  Python-coded functions are created during 
runtime by executing def statements (although code object are created 
during compilation).

A tre(code) function could be written today that converts a 
tail-recursive body to a while body. It would be used as an alternate 
'decorator' that works on the function text rather than the function 
object.  Example:

def dsum(n, a=0): return dsum(n-1, a+n) if n > 0 else a

In pseudocode (minus error checks and raises) that would work for the 
simple (and normal) case of exactly one tail_recursive call:

def tre(tail_rec_code):
   split tail_rec_code into initial_part (header, docstring,
     and initial comments and code before the alternation)
     and recursive_alternation
   if recursive_alternation is if-else expression,
     convert to if-else statement
   if tail-recursion is in else branch,
     invert if condition and switch branch bodies
   replace 'if' with 'while'
   replace tail call with multiple assignment to parameters
     (from signature)
   modified_code = inital_part + modified_body
   return exec(modified_code, globals())

In the replacement steps
     if n > 1: return fac(n-1, acc*n)
would become
     while n > 1: n, acc = n-1, acc*n

If the operations were done with strings, tre would work today and 
indefinitely with any interpreter with standard string operations, exec 
and globals (though in practice I would use re also).  If operations 
were done with ASTs, compile and the ast module (which is subject to 
change) would be required.

 >> by having a recur(*args,
>> **kwargs) function that removes its parent from the stack and then
>> runs the parent function again with the given arguments.

Ian here somewhat confusingly half switches from loop conversion, which 
saves both space and time, to general tail call elimination, which saves 
space possibly at the cost of more time and less useful or even useless 
tracebacks.  Which is to say, he switches from a tre-only implementation 
to a general tco implementation but uses syntax limited to tre. As Jussi 
notes, a general method should not be so limited.

If one only wants tre, for which traceback compression is often a plus 
rather than a minus, then for automatic conversion, I think the time and 
space efficient loop-conversion should be used.

If the recursion iterates through a collection, which is nearly always 
the case, then a human might convert to a for loop instead of a while 
loop.  The idiomatic example below contains the input check usually left 
out of toy recursion examples.  (Doing the check just once would require 
nesting the recursion within an outer function.)

def fact(n):
     if n < 0 or not isinstance(n, int):
         raise ValueError('non-negative integer required')
     val = 1
     for i in range(2, n+1):
         val *= i
     return val

For loops separate iterating through a collection, which in this case is 
a range of integers, which has a known builtin solution, from processing 
the items produced by the iteration. Recursion and the equivalent while 
loops mix item processing with explicit next-item calculation, which is 
generally collection-class specific.  To write a generic function like 
sum(iterable, start) with recursion, one must explicitly do everything 
that for loop do, which is to call iter, next, and handle StopIteration.

Stack frames, and hence tre and tco, are implementation issues, not 
language issues.  Moreover, these are machine implementation issues, as 
an intelligent human executing Python code should do 'the right thing' 

Guido himself has noted that it would be trivial for CPython to optimize 
all tail calls.  But he also noted the cost of sometimes useless 
trackbacks and distraction from using iterables with for loops.

Optimizing specific tail calls is tricker.  For one thing, calls to a 
recursion-replacement function, such as recur, add a stack frame that 
must also be popped. A CPython bytecode manimpuation solution was given 
years ago as a recipe on the ActiveState Python Cookbook site.  MacroPy 
at pypi.python.org "provides a mechanism for user-defined functions 
(macros) to perform transformations on the abstract syntax tree(AST) of 
Python code at module import time".  Among many included examples, it 
has a tco decorator.

Terry Jan Reedy

More information about the Python-list mailing list