A Continuations Compromise in Python

Hi all, It was suggested I post this idea to the python-ideas list. I apologize in advance for its length; I tried to be thorough :) Recently there's been some debate over Tail-Call Optimization (TCO) and its use in Python. Some advocates, coming from languages that support this optimization, say that it makes recursive code far more efficient, and indeed allows some things to happen that could not otherwise occur without running out of stack space. The argument against, made most forcefully by the BDFL himself, is that looping behavior is in general easier to understand than recursion, and that implicitly optimizing code behind the scenes may confuse newcomers to the language. This would especially affect stack traces. Furthermore, Python is already an expressive language and there appear to be few use cases that truly demand TCO. Many of these arguments are strong; however, their strength can be used against them. Python is a teaching language, and proudly so, which is one of the reasons to keep it clean and simple. At the same time, recursion is an important part of computer science, software development and programming. Indeed, many people's first recursive functions nowadays are written in Python! Python is the perfect language to introduce these different kinds of constructs as the 'functional' paradigm is further absorbed by other mainstream languages in industry and academia. Beyond its potential to teach, Python's first-class support of recursion, through an explicit mechanism to indicate 'continuation' style functions, is also undervalued in the use cases it carries out more simply than iterative implementations (never mind the performance benefit). The Actor model and message passing, state machine logic, and green thread schedulers are all potential uses. Similarly, many multi-threaded/multi-processor programs simplify their logic when passing continuations.
This proposal is to extend the use of the keyword 'continue' as a way to explicitly invoke continuation style function calls, and through that mechanism, tail-call optimized recursion. The Zen of Python states that explicit is better than implicit, and this is the biggest hang-up with many proposed 'optimizations' that change system behavior. In many TCO proposals, the compiler is responsible for identifying recursive calls that can be optimized. In this proposal, continuations are achieved explicitly, and it is up to the programmer to ensure that they are proper tail calls, keeping all of the choice and power in the programmer's hands. This is best shown with a few examples. Currently, a 'continuation' style function might be written like this:

    def func(continuationFunc, *args):
        return continuationFunc(*args)

The function is passed in as a first class object, then applied to some arguments, and the result is immediately returned. This is an important point about true tail calls: their values must be returned immediately. The proposed syntax extension would change the above into the following similar function:

    def func(continuationFunc, *args):
        continue continuationFunc(*args)

In this case, instead of the 'return' keyword, the 'continue' keyword is used to indicate the programmer wishes to transfer full control to the continuation function. Tail-call purity would be enforced, meaning that a non-tail call such as

    def func(continuationFunc, *args):
        continue 1 + continuationFunc(*args)

would either cause a syntax error or throw a runtime exception. This serves as a learning opportunity for those new to programming, just as many of Python's uniquely elegant implementations of programming constructs have, while also providing a powerful and expressive way to solve certain kinds of recursive problems. The double use of the 'continue' keyword is no accident. The current use of the 'continue' keyword makes it a prime candidate for extension, for a few reasons.
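[Editor's note: the proposed syntax is not executable today, but its intended semantics can be sketched with a trampoline in current Python. The `trampoline` and `countdown` names below are illustrative, not part of the proposal.]

```python
def trampoline(func, *args):
    """Drive a chain of tail calls in a constant-size stack.

    Each step either returns a final value, or a (callable, args...)
    tuple standing in for what 'continue func(*args)' would do.
    """
    result = func(*args)
    while isinstance(result, tuple) and result and callable(result[0]):
        func, args = result[0], result[1:]
        result = func(*args)
    return result

def countdown(n):
    if n == 0:
        return "done"
    # Under the proposal this line would read: continue countdown(n - 1)
    return (countdown, n - 1)

print(trampoline(countdown, 100000))  # "done", with no RecursionError
```

The point of the proposal is that the `while` loop above would live in the interpreter, triggered by the `continue` keyword, rather than in user code. (A real implementation would need a distinct marker type so that tuple-valued results aren't misread as pending calls.)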
First there are theoretical reasons. For one, the use of the English word 'continue' in the above constructs is intuitive and would not take much explaining to a newcomer versus 'return'. More importantly, since it has been pointed out that many recursive structures can be implemented as imperative loops, it seems appropriate that the same keyword would mean equivalent/parallel things in the two styles. Using continue inside a loop (bounce back up to the top of the loop and start the next iteration) is synonymous with the proposed recursive use: bounce back up the stack and immediately call the 'next' function. Indeed, trampoline style schedulers would use the continue keyword in its current form to implement continuations! These parallels between recursive and imperative loops help in understanding the proposed new use. There are also some practical concerns. 'continue' as currently used is not that popular a construct. When one needs to 'continue', one is glad it's there, but thanks to Python's expressive generator expressions, for-each style loops and inner functions, many use cases for 'continue' are more elegantly expressed in other ways. This means that, either way, a newcomer's introduction to the uses of 'continue' isn't going to happen until he or she has a firm grasp of other parts of the language. Moreover, there's no rule saying that a student has to learn about all the uses of a keyword upon first being introduced to it. A student would only inadvertently need to learn about continuation passing via 'continue' if they misunderstood the 'continue' statement while learning about loops, and somehow incurred the interpreter's wrath for not defining their continuation properly. There's also no hard and fast rule that a student would learn about complex loop control before they'd be interested in continuation passing, so it works both ways. 'continue' also currently has a very well defined, strict syntax.
This is very important for any potential extension, as it guarantees that this proposal is completely backwards compatible. The uses of the keyword in the examples are currently syntax errors, meaning they can't appear in any currently running code. The reuse of a keyword already in the language also guarantees there will be no naming collisions with older code. Using 'continue' in this way seems very 'Pythonic'. It flows naturally from the meaning of the keywords, and the extended syntax is still rather simple: the operand must be a single callable. There are no unique rules or corner cases. It also mimics other return-control flow additions to the language, most notably the 'yield' construct, used for generators and co-routines. Yield is overloaded, in a sense, to have two different yet similar purposes. Overloading 'continue' in this way seems to flow naturally from the evolution of the language. Likewise, users are still free to intermix yields, returns and continues in functions and still have very well defined, easy to understand behavior. Some have voiced concerns. One instance is the appearance of a 'continue' in a try/except/finally block:

    try:
        continue func(*args)
    except:
        # handle the error

This appears to be an implementation problem, but only until you expand the same block into an equivalent, old-style 'return code' error handling scheme:

    err = continue func(*args)
    if err:
        # handle the error
    else:
        return err

In this form, it becomes apparent that this is an illegal use of continue, as 'func' is not a tail call! Checking the result of a tail call and controlling logic with it doesn't make any sense, and as such, we can without remorse or regret say 'no tail calls in try/except/finally blocks'. Tail calls are function calls where there is absolutely no more processing to be done in the current stack frame.
'Clean up' logic or error handling in except and finally blocks indicates there is more processing to be done, and as such, there simply is no well defined 'tail call' in these circumstances. They ought to remain normal stack returns. 'Continue' can certainly propagate exceptions as well as return values, but those would be caught at whichever stack level initiated the tail-call loop. Another concern is the fact that stack traces disappear with these types of constructs. The counter-argument is that the constructs they replace (in some instances) are loops, which themselves only have limited 'stack' trace information. If you crash in a loop, you don't necessarily get the iteration of the loop you were on, or any information from previous iterations. That is all lost. Likewise, a naive implementation of the proposed 'continue' would also lose stack trace information; however, it would do so in a way one can expect. Just as one should be familiar with the behavior of a catch-all except block that swallows every exception and reports nothing, one should be familiar with the changes to the stack that 'continue' would produce. Thankfully, unlike some behind-the-scenes optimization, the 'continue' keyword in this proposal clearly shows where tail calls are made. In a more advanced implementation, the clear use of 'continue' would allow the interpreter to annotate the stack trace, so that some information could easily be provided about current and historical program flow. There are some alternative designs for this problem that have been proposed. SJBrown has proposed a similar syntax, using the two keywords 'continue as' instead of simply 'continue'. This would make it even clearer to users unfamiliar with continuations that behavior is expected to be different here, and the English still flows elegantly. It does, however, reuse the 'as' keyword in an unfamiliar way. Other proposals followed the same logic, but introduced new keywords such as 'recurse' or 'tail'.
These avoid any potential ambiguity with the 'continue' keyword, but are not backwards compatible. Furthermore, this proposal considers the actual English of the word 'continue' to be a strength in this case. Continuations and tail calls are different, sometimes clearer, ways to implement advanced control flow. They give the programmer a lot of power in a small, expressive and elegant package. They are too frequently hidden behind the scenes, though, as the programmer relies on compiler-identified 'optimization' spots. Introducing explicit continuations to Python would give both students learning programming and advanced users an expressive, yet very easy to learn and familiar, construct. Extending the 'continue' keyword is the best candidate for this change, as it's backwards compatible, parallels other similar extensions like yield, mimics the current use of the keyword, and is, in this author's opinion, 'Pythonic' ;)

First off, I think that you misrepresent Python by your characterization of it as a teaching language. To be sure, many CS programs _do_ teach Python as part of their curricula and use it to teach core topics. But Python is much more than that -- it is a serious language for serious applications. (If you're not sure about this, just ask Google, which has built large parts of its infrastructure on Python). More importantly, while I appreciate your idea from a theoretical perspective, I believe that your proposal would garner more attention if you would provide some real-world examples. If the expanded role for _continue_ were already implemented, what real problems would be easier to solve, or result in more elegant source code? I, for one, would like to see some "before" and "after" examples using the syntax on typical applications. On Sat, May 2, 2009 at 4:27 PM, John Graham <john.a.graham@gmail.com> wrote:
-- Gerald Britton

On Sat, May 02, 2009, Gerald Britton wrote:
I'm reasonably sure John Graham had no intention of characterizing Python as "only" a teaching language; it surely is the case that Python *is* intended as a teaching language: http://www.python.org/doc/essays/cp4e.html -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith

Thanks Gerald. I did not mean to imply Python was not an industrial strength language at all. In fact, I wasn't introduced to it at school, but at work. I just know, like you pointed out, that Python's simple, elegant syntax is there for a reason: namely, because it's easy to teach. One of the main criticisms of TCO, and continuations that might take advantage of it, was that it added too much complexity in its 'hidden' form, which would confuse newcomers to the language. This is an important criticism, which I hope using an explicit keyword would ameliorate. Here's an example and an anecdote (since the full example code would take too much space...), as has been asked for. Both show that the general pattern (which is now solved with trampoline functions) exists in multiple contexts, and can be simplified via the use of TCO, which can be made explicit and correct via the introduction of a new control keyword ('continue') to invoke it. This was a state machine implementation proposed by someone playing with the syntax:

    def state1(self):
        # do stuff on self
        if cond:
            continue self.state2()
        else:
            continue self.state1()

    def state2(self):
        ...

The response from someone critical of the idea was to just use a trampoline and this style:

    def state1(self):
        # do stuff on self
        if cond:
            return state2
        else:
            return state1

For the second example, I can only offer an anecdote. I got interested in this style after coding up an 'active object' in C++, which is more or less all the stuff you have to 'bolt on' to a language without the Actor model to simulate actors. In Python, you would get a little help from class decorators, but (and here's why I leave out the code, because it'd just be too much) for the most part, I had to involve parts from threads/processes, locks/shared memory, message queues, and method wrappers. All just to get message passing. Basically speaking, the threads were used as a main loop (again, trampolining) across anything that came into the threadsafe message queue.
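[Editor's note: the critic's trampoline version can be completed into something runnable today; the driver loop, the counter, and the three-step stopping condition are illustrative assumptions.]

```python
class Machine:
    def __init__(self):
        self.count = 0

    def state1(self):
        # 'do stuff on self', then name the next state
        self.count += 1
        if self.count >= 3:
            return self.state2   # proposal: continue self.state2()
        return self.state1       # proposal: continue self.state1()

    def state2(self):
        return None              # terminal state

    def run(self):
        # The trampoline: keep calling whatever state comes back
        state = self.state1
        while state is not None:
            state = state()

m = Machine()
m.run()
print(m.count)  # 3
```

Each state method returns the next state instead of calling it, so the stack never grows no matter how many transitions occur.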
With 'continue' implying a continuation plus TCO, actors boil down to little more than:

    class Actor:
        def method1(self, messageBack):
            continue messageBack()

The pattern that continue eliminates, basically, is the constant referral to 'just use a trampoline function'. To me, language constructs exist to codify certain patterns, similar to the way list comprehensions captured a lot of what was previously done in for loops. Insofar as complexity is concerned, I feel like I'd have an easier time explaining to someone what a continuation is versus a normal return, rather than what a trampoline function is. All of the criticisms that have been levied against TCO can also be levied against trampolines, as they're just as likely not to keep stack debug information. And while I've tried to stay away from the efficiency use case, it can't be ignored. Recursive data structures, mutually recursive functions and the messages I described above would all benefit from the efficiency of the optimized tail call. Guido implied one of the main criticisms of the tail call optimization was that it was done without the user's knowledge, behind the scenes, by the compiler. I think many have always acknowledged that TCO was, in fact, an optimization; it was just the complexity that went with a 'hidden' optimization. Making it explicit using the 'continue' keyword acknowledges that criticism. So, I suppose if you wanted more examples of where 'continue' would be better than the current implementation, they are all the same examples of where hidden TCO is applied in other languages. The use case is there; it's just that in Python, we can make sure the user is aware, not to mention that we can enforce correctness. If a recursive call isn't tail recursive in another language that implements TCO, it just gets silently converted to a normal call which might blow the stack.
Differentiating between continue and return in Python will enforce proper tail calls, not silently 'default' to something the user didn't want. On Sat, May 2, 2009 at 4:57 PM, Gerald Britton <gerald.britton@gmail.com> wrote:

I have a question about the implementation of "yield from." I recall hearing some talk about optimizing the stack in "yield from," but I didn't catch all of the details. I take it that there will be some steps taken to ensure that yield-froms yield from their innermost yielder without having to touch base at all the objects in between whoever is asking for the value and whoever is giving it. That being the case, will this example implementation of a linked list still blow the stack for lists bigger than 1,000 items or not?

    class LinkedList:
        def __iter__(self):
            yield self.value
            if self.link:
                yield from self.link

If it does still blow up the stack, then it's no big deal, but if it won't blow the stack anymore, then it seems like there's very little difference between the kind of recursive programming invited by "yield from" and the kind of recursive programming invited by "continue object". If you hate reading TCO code (and I take this to be the #1 objection to adding TCO to Python, though there are also more technical reasons), you're still going to get it, only using "yield from" instead of "continue". So in that case, "continue" and "yield from" should be thought of as a pair of stack optimizers, one for functions and one for generators. -- Carl
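[Editor's note: Carl's sketch, filled out so it runs; the constructor and the list-building code are assumptions, and `yield from` eventually landed in Python 3.3.]

```python
class LinkedList:
    def __init__(self, value, link=None):
        self.value = value
        self.link = link

    def __iter__(self):
        yield self.value
        if self.link:
            yield from self.link

# Build a 500-node list. Iterating it nests 500 generator frames,
# so a long enough list can still exceed the recursion limit.
head = None
for v in range(500):
    head = LinkedList(v, head)
print(sum(head))  # 0 + 1 + ... + 499 = 124750
```

As Greg confirms below in the thread, the delegation chain is still traversed (at C speed), so a sufficiently long list can still blow the stack.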

On 3 May 2009, at 07:44, Carl Johnson wrote:
I haven't kept up with recent threads about yield-from, but in its early stages at least the patch did still 'touch base' with all objects in between, albeit at C speed rather than at Python speed. However, I'm pretty sure it would be possible to short-circuit this, though a trace of the call stack still needs to be kept if the yield-from is followed by more processing in the generator function (IOW, if it's not a tail yield-from).
So 'continue f(x)' could be spelt 'return from f(x)'? But it's not the same anyway, because a yield-from can be a tail call or not, so the parallel would be more like this:

    functions:
        return f(x)        # normal call
        continue f(x)      # optimized tail call

    generators:
        yield from gen     # normal yield-from
        continue from gen  # optimized tail yield-from

In fact, 'continue from gen' would make sense even if it is not a tail yield-from. It would yield control to gen and never come back. OK, I have to stop writing whatever comes to my mind now :) -- Arnaud

Carl Johnson wrote:
Not in my current implementation. It still makes a nested sequence of calls to get to the innermost iterator -- it's just that, at least in the case of generators, they're C calls rather than Python ones, so they're much faster. You can still blow the stack, though. -- Greg

On Sat, May 2, 2009 at 5:55 PM, John Graham <john.a.graham@gmail.com> wrote:
No, it's not about codifying. It's about having a *significantly better* solution by modifying the language than working within the language. List comprehensions are significantly better than a full for-loop. Adding a keyword is not significantly better than returning your next function; it's actually worse. It's a non-solution to a non-problem. If you actually *had* a problem you could do it with trampolines. They do exactly what's needed, they just don't put a bow on it. Ho hum. -- Adam Olsen, aka Rhamphoryncus

On Sun, May 3, 2009 at 2:56 PM, John Graham <john.a.graham@gmail.com> wrote:
It's fairly subjective, but lists vs genexps make a good example:

    data = []
    for i in range(50):
        data.append(i**2)
    use_result(data)

vs

    use_result(i**2 for i in range(50))

Of course a listcomp would be a more direct comparison, but I think the argument is stronger because it's *not* a direct translation. You'd have had to write a generator function, which is enough clumsier that nobody would even have considered it before genexps were added. In contrast, we have the before with trampolines:

    def a():
        return (b, arg1, arg2)

and the after:

    def a():
        continue b(arg1, arg2)

Sure, it's prettier to look like a function call, but you're disguising the fact that you're not calling it as a function right now. You're also disguising that you're returning from this function. If explaining the trampoline to others is the issue, why not call it a "pop and call" operation? Pop off the old stack frame, then do your call. Or call it "return and call", if they don't understand stack frames. -- Adam Olsen, aka Rhamphoryncus
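[Editor's note: the tuple-returning style Adam describes also handles mutual recursion, the case a hand-rewritten loop cannot easily absorb. A runnable sketch, with illustrative `is_even`/`is_odd` functions and a generic driver:]

```python
def is_even(n):
    if n == 0:
        return True
    return (is_odd, n - 1)    # proposal: continue is_odd(n - 1)

def is_odd(n):
    if n == 0:
        return False
    return (is_even, n - 1)   # proposal: continue is_even(n - 1)

def bounce(thunk):
    # The trampoline: keep calling while the result names another call
    while isinstance(thunk, tuple):
        func, *args = thunk
        thunk = func(*args)
    return thunk

print(bounce((is_even, 100001)))  # False: 100001 is odd
```

A naive mutually recursive version would need a recursion depth of 100001; the trampoline runs it in constant stack space.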

On Sat, May 2, 2009 at 2:27 PM, John Graham <john.a.graham@gmail.com> wrote:
Premature optimization, irrelevant.
and indeed allows some things to happen that could not otherwise occur without running out of stack space.
Use a trampoline. It's the same thing you propose, only it uses existing syntax. It's got a few rough edges, but they're quite manageable, and in no way justify the vast bikeshed this issue has garnered. -- Adam Olsen, aka Rhamphoryncus

On Sun, 3 May 2009 09:22:17 am Adam Olsen wrote:
It's hardly "premature" to notice that recursive code in Python is significantly slower and less efficient than in other languages. This is a known problem, or at least a known issue, since some people consider the solution (tail-recursion optimization) worse than the problem.
Up until a month or so ago, I'd never heard of the term "trampoline" (apart from the thing you jump up and down on), and I still don't know what it means. Checking Wikipedia, I see that in computing, trampoline has *eleven* definitions. Which one do you mean?
For the record, I find the OP's idea dubious. I don't like having yet another way of spelling "exit this function and return this result": return, yield, and now (proposed) continue. So a *very* tentative -0 on the suggestion. Maybe +0. Just tossing a wild idea out there... is it conceivable to build a decorator that optimizes a tail-call recursive function? Then it becomes a matter of explicit programmer choice whether or not to do so, with no new syntax. You would have the choice of:

* writing a function as tail-recursion because it is the most expressive and simple algorithm, and living with the inefficiency (which, as Adam points out, may not be a problem in practice);
* manually re-writing it as a for-loop, exchanging simplicity for runtime efficiency; or
* decorating the recursive function, giving up some debugging information for efficiency, but keeping the simplicity of the tail-recursive form.

I don't know whether such a thing is even possible, let alone how to go about building it, but if it did exist, I'd use it. -- Steven D'Aprano
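[Editor's note: such a decorator is implementable without new syntax, provided the function signals its tail call explicitly. A sketch under that assumption; the `tail_recursive` and `.call` names are invented here.]

```python
import functools

def tail_recursive(func):
    """Run a tail-recursive function in a loop instead of on the stack."""
    class _TailCall:
        def __init__(self, *args, **kwargs):
            self.args, self.kwargs = args, kwargs

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        while isinstance(result, _TailCall):
            result = func(*result.args, **result.kwargs)
        return result

    wrapper.call = _TailCall   # how the body marks its tail call
    return wrapper

@tail_recursive
def factorial(n, acc=1):
    if n <= 1:
        return acc
    return factorial.call(n - 1, acc * n)  # would blow the stack as a plain call

print(factorial(10))  # 3628800
factorial(5000)       # well past the default recursion limit, no error
```

This only handles self-recursion, which matches the use case above; mutually recursive functions would still need a more general trampoline.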

Steven D'Aprano wrote:
It's also by no means certain that TCO would provide the kind of speed benefit that people imagine. A lot of the overhead of making a Python function call is concerned with packing up the arguments and unpacking them again, which you still need to do even if you're reusing the stack frame. -- Greg

Greg Ewing writes:
I thought that was a good part of the appeal of TCO, though, that the compiler can often arrange to do data manipulations in such a way that the stack frame (or even data-in-register) is just there, ready to go when control is transferred. Ie, the packing/unpacking that is purely related to function calling is avoided. Is this a difficulty in implementing for Python, or did I misunderstand the concept? Ie. that's my fuzzy recollection of a head-splitting conversation with a Schemer that started, "what does call-with-current-continuation *do*?" and part I of his lecture finished with "well, why don't we start with something a little easier like tail call optimization?" Incomprehension-is-my-middle-name-ly y'rs,

2009/5/3 Stephen J. Turnbull <stephen@xemacs.org>:
This can be done with recursive tail calls, but I guess that in Python, in general, the compiler is not able to know whether a tail call is actually calling the function currently being executed, as the compiler only knows it calls a function with a given name (which is only bound to an object at runtime). However, I think that the interpreter would be able to decide this at run time - and even to short-circuit the packing/unpacking sequence of parameters. -- Arnaud

Le Sun, 3 May 2009 18:42:20 +0100, Arnaud Delobelle <arnodel@googlemail.com> s'exprima ainsi:
The explicit keyword solution proposed is a big help in this case. The interpreter can then (try to) perform the optimization as requested, and possibly fail and protest that the requirement is not met, instead of collecting all the needed information and running complicated algorithms _just to know_ whether the optimization is even possible, before doing anything. Denis ------ la vita e estrany

Stephen J. Turnbull wrote:
Possibly that could be done in the case of self-recursion, but it would be trickier when the function being tail-called is a different one. In that case you also have the problem that the called function can have a different stack size, number of local variables, list of constants, etc. You can't just re-use the same stack frame for a different function without making various adjustments. -- Greg

Steven D'Aprano <steve@...> writes:
Is there some evidence that this "known issue" has been popping up in real-world production Python code (I am not talking about Lisp, C, or any other language), rather than academic discussions? Premature optimization is trying to optimize something which is not a significant contributor to execution time. Regards Antoine.

I think it's about more than optimization. It's also about being able to code a solution using recursion without worrying about the stack. The execution time may or may not change, but the Python source may be easier to read using recursion instead of iteration. At the same time, I feel that some of the hoops one has to jump through (like passing state in function arguments) can adversely affect the readability and elegance of a recursive solution that is trying to take advantage of TCO. In general, I don't like coding source a certain way because "I know" that the compiler will do something "special" if I do. What is "special" today may be not so special (or even worse) tomorrow. I suppose I would rather do the work of translating my recursive solution into an iterative one than try to code my recursive function just right so that the compiler will do TCO for me, if the stack is a concern. With this approach, I can wind up with faster execution and lower amortized memory costs as well. On Sun, May 3, 2009 at 8:30 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
-- Gerald Britton

Gerald Britton wrote:
Stack space is only an issue if you're using recursion to traverse a linear data structure. If the structure is tree-shaped, you'll run out of memory for the tree long before it's deep enough to cause stack problems. I've yet to see a Python function operating on a linear structure that was easier to follow using recursion instead of iteration. -- Greg
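[Editor's note: Greg's claim is easy to quantify. A full binary tree deep enough to exhaust CPython's default recursion limit of 1000 would need 2**1000 - 1 nodes, a number with about 300 digits:]

```python
# Nodes in a full binary tree 1000 levels deep (the default
# CPython recursion limit) -- far beyond any conceivable memory.
nodes = 2 ** 1000 - 1
print(len(str(nodes)))  # 302 digits
```

So for balanced trees, memory is exhausted long before the stack; only degenerate (effectively linear) shapes hit the recursion limit.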

On Sun, 3 May 2009 10:30:55 pm Antoine Pitrou wrote:
People spend time trying to speed up general purpose code in the standard library, to save 1% or 2% on benchmarks. I'm sure I don't need to google for examples -- you've been around the Python-Dev list long enough to see plenty of examples, and I for one haven't seen anyone objecting to improving Python's general execution speed. In this case, the speed-up could (in principle) be by up to a factor of five or so, not a mere couple of percent. Recursion in Python is quite slow compared to iteration. Here's a real (albeit trivial) example: the Towers of Hanoi.

    def move(n, a, b, c):
        # Version using tail recursion.
        "Move n disks from needle a to b using c as temporary storage."
        if n > 0:
            for t in move(n-1, a, c, b):
                yield t
            yield (a, b)
            for t in move(n-1, c, b, a):
                yield t

    def move2(n, a, b, c):
        # Non-tail recursive version.
        while n > 0:
            for t in move2(n-1, a, c, b):
                yield t
            yield (a, b)
            n -= 1
            a, c = c, a

And in use:
Okay, this toy program isn't exactly a mission-critical application, but it does demonstrate a genuine performance penalty when using recursion. In this case, two recursive calls (one of which is a tail-call) takes nearly 60% more time than a single recursive call in a while loop. But more importantly, execution time is not the only resource that needs to be optimized. Programmer effort is an important resource that many people wish to optimize. Removing tail-call recursion is a simple algorithm quite well suited to be done by the machine, but tedious and sometimes tricky for the average programmer to do correctly. Another is stack space -- hence the relatively low default recursion limit. Anyone who has ever seen "maximum recursion depth exceeded" in real-world code has a real problem which (if the function is tail-recursive) could be fixed by a hypothetical optimizer. -- Steven D'Aprano

Steven D'Aprano <steve@...> writes:
People spend time trying to speed up general purpose code in the standard library, to save 1% or 2% on benchmarks.
Actually, it's quite rare that these patches (those which only yield a 1 or 2% improvement) are accepted, except when they don't make the implementation more complex. Lots of other patches with more significant speedups on micro-benchmarks are refused, too. You can find some of them in the bug tracker.
and I for one haven't seen anyone objecting to improving Python's general execution speed.
Not in the absolute, sure. If someone produces a patch speeding up recursive calls it will be considered with the same criteria as any patch claiming to improve performance (see above). This doesn't mean it will be accepted for sure.
In this case, two recursive calls (one of which is a tail-call) takes nearly 60% more time than a single recursive call in a while loop.
Why do you think recursion has anything to do with it, rather than simply the fact that there are twice as many function calls? Besides, I object to the claim that solving the Towers of Hanoï problem is a real-world example ;) That said, it's true that recursive calls are probably costlier than non-recursive ones, due to the fact that only one frame object is cached for each code object. But removing this limitation shouldn't make the common (non-recursive) case slower, and it shouldn't increase memory consumption too much, so it's not as easy as it seems. Regards Antoine.

On Mon, 4 May 2009 01:16:10 am Antoine Pitrou wrote:
But surely that's the point of removing tail-recursion? Each function call has overhead, and by reducing the number of function calls, you reduce the amount of overhead. Perhaps we're not discussing the same thing? After reading your reply, there does seem to be some confusion (at least in my head) as to what precisely we're talking about. I've seen tail-call optimization described as programmatically converting a tail-call recursive function into an equivalent iterative function. I've also seen it referring to a process of minimizing the depth of the function call stack while still making the same number of function calls.
Besides, I object to the claim that solving the Towers of Hanoï problem is a real-world example ;)
Well, I did admit it was a toy :)
I don't think anyone really believes it is easy. Even the people on various blog sites saying it's "easy" are actually saying that a solution which is either fragile, or slow, or both, is easy, and that shouldn't surprise anyone. -- Steven D'Aprano

Hello again, Steven D'Aprano <steve@...> writes:
Well, it's not that obvious. Tail call optimization does not totally remove function calls, it replaces them with a slightly lighter kind of control transfer. It would not save the cost of setting up a new frame object (the function you transfer control to probably has differing constants and local variables), of passing parameters around (Python's rich function call conventions cost some CPU time) and various other bookkeeping. So, TCO would make the cost of the tail call slightly lighter (how much exactly is unknown), but it won't *suppress* it like e.g. inlining would do.
The former sentence is the formulation in theoretical (algorithmic) terms. The latter sentence is how I believe it gets implemented in practice. Basically, at the assembler level, you replace a "CALL + RETURN" sequence with a single "JUMP" instruction. But that JUMP instruction still must do all the work of setting up parameters, initializing a new frame etc. It's not anything like a lightweight intra-function JUMP.
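[Editor's note: the CALL/RETURN pair Antoine describes is visible in CPython's bytecode. A sketch using the `dis` module; exact opcode names vary between Python versions, so `f` and `g` here are illustrative.]

```python
import dis

def f(x):
    return g(x)   # a tail call: a call immediately followed by a return

def g(x):
    return x

ops = [ins.opname for ins in dis.get_instructions(f)]
print(ops)  # ends with a CALL-style opcode then RETURN_VALUE
```

TCO would fuse that trailing call/return pair into a single transfer of control, but as Antoine notes, the argument passing and frame setup the CALL opcode performs would remain.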
Well, in the sentence you were responding to I was talking about optimizing recursive calls in Python *without* introducing TCO. ;) Regards Antoine.

On Mon, 2009-05-04 at 00:34 +1000, Steven D'Aprano wrote:
This doesn't demonstrate where the issue is. Is it function calls? The tail recursion call makes (eyeball count) twice as many python function calls (not calls into generators) per call to move(2)? Or perhaps something else? Sure, the TCO one is slower, but is the handing of result from generator to generator really the issue? If it is rather something else, it may be that the nested yielding while costing, is not costing disproportionately - and is still a fraction of the cost. python -m timeit 'from foo import do_move, do_move2' 'do_move()' 65535 10 loops, best of 3: 164 msec per loop python -m timeit 'from foo import do_move, do_move2' 'do_move2()' 32768 10 loops, best of 3: 108 msec per loop Half as many function calls, 2/3rds the time. *Much* better measurement than these sketches is needed to say where the issue is. -Rob

On Mon, 4 May 2009 12:34:53 am Steven D'Aprano wrote about removing tail-recursion:
In this case, the speed-up could (in principle) be by up to a factor of five or so, not a mere couple of percent.
Sorry, that "factor of five" is probably bogus. I got that for some comparisons between recursion and iteration, but the speed difference is probably not relevant to the sort of tail call optimizations we're discussing. I will *not* defend my suggestion of 5x faster, however, on the basis of my Towers of Hanoi test, I think 2x faster is conceivable. I found Guido's recent blog post of tail call optimization: http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html Worthwhile reading. Also read the commenters: they make some interesting points, such as that tail call optimization is a general technique applicable to more than just recursion. Guido's largest objection to TCO is that it ruins nice stack traces when you get an exception. I must admit I've never understood this argument. Perhaps I'm missing something, but I've never considered the stack trace you get in recursive functions useful. Here's an example:
To me, all those identical "line 3, in spam" lines are just noise. They get in the way of a nice stack trace! What is Guido seeing that I'm not? Hopefully he isn't counting them by hand to see how deep he got into the recursion! I wish there was a way to tell Python to just throw that white noise away and give me the sort of stack trace I get from a loop function. (It's not so bad when there are only ten lines, but when there are 1000, you might very well fill your xterm's buffer and lose valuable history.)
Which of course illustrates the point that Guido's recommendation to re-write the recursion as an iterative loop by hand will have the same effect on the stack trace as iteration already does. I haven't heard anyone argue that stack traces in a while or for loop should show you the entire history of the loop, so I wonder why recursive calls should be treated as sacrosanct? (I have nothing to say about Guido's other arguments against TCO at this point, but my silence should not be interpreted as agreement.) -- Steven D'Aprano

Your comment on the iterative solution also erasing the stack trace is a good one. As for the specific problem, I've always thought '...' notation could capture recursive stack traces more easily:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in spam
  ...
  File "<stdin>", line 2, in spam
ValueError: Nobody expects the Spanish Inquisition!

although I believe that particular syntax is used in doctest (would it be ambiguous in this case?). The same could be used in a 'continue' statement, indicating the first one or two functions and the last one or two, similar to sequences described in math (1, 2, ..., n-1, n). Anywho, I'm no expert on stack traces, so if someone else has a better idea I'm all ears. I'd also like to agree with your statement that TCO is not just about execution speed. In fact, to clear up this whole argument, let's go ahead and claim that TCO code will never be faster than the iterative version. TCO will prevent things from blowing the stack. And despite many of the cases presented, there are some things that CAN'T be done iteratively, simply, as the commenters to the post you linked to pointed out. Mutual recursion is one, as the iterative solution to that begins to grow in size and use more complex looping logic. Actual continuation style, where you're making tail calls to OTHER functions, can only be represented by a trampoline -- which, as I've stated before, is hardly a simpler solution. The complexity of the iterative solution only goes up from there. I want to clarify that the original suggestion was never to implement TCO 'implicitly'. I don't want to have to 'write my function just right' to get TCO to work. This is why the keyword was suggested, as it'd be an explicit way to tell the interpreter 'this is a tail call'. If it's not, then throw an exception.
Otherwise, there's no optimization going on behind the scenes that I'm not aware of, which is the case in languages that just optimize tail calls behind the scenes. I agree completely with Guido that this can confuse newcomers. So to clarify (and don't get me wrong, this is a very interesting conversation :), the proposal is to add explicit and unambiguous support for the elimination of tail calls (via the use of a keyword), rather than an implicit, confusing optimization behind the scenes, and to do so not to increase runtime performance (per se) but instead to allow techniques which currently blow the stack to work. On Sun, May 3, 2009 at 10:29 AM, Steven D'Aprano <steve@pearwood.info> wrote:
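The mutual-recursion point above can be made concrete; a sketch (the even/odd pair is invented for illustration) showing the directly recursive version, which is limited by the stack, next to the trampolined version, which runs arbitrarily deep in constant stack space:

```python
import sys

# Direct mutual recursion: blows the stack for large n.
def is_even(n):
    return True if n == 0 else is_odd(n - 1)

def is_odd(n):
    return False if n == 0 else is_even(n - 1)

# Trampolined style: each function returns a thunk instead of calling.
def is_even_t(n):
    return True if n == 0 else (lambda: is_odd_t(n - 1))

def is_odd_t(n):
    return False if n == 0 else (lambda: is_even_t(n - 1))

def trampoline(value):
    # The driver loop: keep invoking thunks until a plain value appears.
    while callable(value):
        value = value()
    return value

deep = sys.getrecursionlimit() * 2
print(trampoline(is_even_t(deep)))  # True, with no RecursionError
```

Note the size of the driver loop: this is the boilerplate the thread keeps arguing about, since every caller of `is_even_t` must remember to go through `trampoline`.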

Back to the bike shed argument: the use of 'continue' is incidental. I personally like it, but I can totally see the arguments brought against it. My main thrust is for explicit TCO in Python, to allow for different models of computation in an explicit, Pythonic way. I believe others have suggested making 'yield from' require tail calls, or introducing 'yield as'. Both have their ups and downs. I'd suggest we debate the original idea, a TCO keyword in Python, separately from what that keyword ought to be. It's no use talking about keywords if the original idea is bunk, and I believe the second argument can be won far more pragmatically than the first, which requires some careful thought on all of our parts. On Sun, May 3, 2009 at 11:06 AM, Steven D'Aprano <steve@pearwood.info> wrote:

2009/5/3 Steven D'Aprano <steve@pearwood.info>:
Guido's largest objection to TCO is that it ruins nice stack traces when you get an exception. I must admit I've never understood this argument.
Tail call optimization is more general than tail recursion optimization, and it can indeed eliminate useful context from the call stack if the caller and callee are different functions. -- Marcin Kowalczyk qrczak@knm.org.pl http://qrnik.knm.org.pl/~qrczak/

Arnaud Delobelle wrote:
Right, but if you have more than one tail call, you lose the history of _which_ recursive call was taken. If you are saying, "Only a single tail-recursive call is allowed per function," then you are eliminating solution by cases. Think:

def recurse(source):
    if branching(source):
        if goleft(source):
            return recurse(source[0])
        else:
            return recurse(source[1:])
    return source

--Scott David Daniels Scott.Daniels@Acm.Org
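What TCO would do to this function can be written out by hand: both tail calls collapse into one loop, and it is exactly this rewrite that erases which branch was taken at each level. A sketch, with `branching` and `goleft` as invented placeholder predicates:

```python
# Hypothetical predicates, purely for illustration.
def branching(source):
    return isinstance(source, list) and len(source) > 1

def goleft(source):
    return isinstance(source[0], list)

def recurse(source):
    # Loop form of the recursive function above: each iteration replaces
    # one tail call, so no per-branch frame survives for a traceback.
    while branching(source):
        if goleft(source):
            source = source[0]
        else:
            source = source[1:]
    return source

print(recurse([[1, 2], 3]))  # [2]
```

The loop and the recursion compute the same result; the difference is only in what the interpreter could show you afterwards.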

On 5/3/09, Antoine Pitrou <solipsis@pitrou.net> wrote:
Steven D'Aprano <steve@...> writes:
Is there some evidence that this "known issue" has been popping up in real-world production Python code
(1) The overhead of function calls (which are more common in a recursive style) is a real problem. That said, it isn't clear that Tail Call Optimization would make enough of a dent on the cost of calling a function. (2) The direct efficiency losses (particularly in space) show up every time an exception is thrown for excessive recursion depth. You could argue that this isn't common in production code, but that is because people learn to distort their code to avoid that limit. The counter-argument is that sometimes excessive depth is caused by a bug, and finding out sooner is a good thing. Guido (and Greg, earlier in this thread) have said that hitting the recursion limit is *usually* a bug. If that were true (and it might be), then tail call optimization *in Python* would usually just mean "When I write an infinite loop, take out the breakpoints and debugging information." -jJ

On Sun, May 3, 2009 at 3:02 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Just tossing a wild idea out there... is it conceivable to build a decorator that optimizes a tail-call recursive function?
Yes, check http://code.activestate.com/recipes/496691/. The caveat is that for "small" values it actually pessimizes rather than optimizes the function ("Note also that these decorators are not optimizing and for small argument values they are actually far slower.") George
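The recipe's trick can be sketched roughly as follows (a simplified reconstruction of the exception-based approach, not the recipe's exact code): the wrapper inspects the frame stack, and when it detects that it is being re-entered from inside the wrapped function's body, it unwinds with a private exception and re-runs as a loop.

```python
import sys

class _TailCall(Exception):
    def __init__(self, args, kwargs):
        self.call_args = args
        self.call_kwargs = kwargs

def tail_recursive(func):
    def wrapped(*args, **kwargs):
        f = sys._getframe()
        # Re-entered through func's body? Unwind to the outer wrapper.
        if (f.f_back is not None and f.f_back.f_back is not None
                and f.f_back.f_back.f_code is f.f_code):
            raise _TailCall(args, kwargs)
        while True:
            try:
                return func(*args, **kwargs)
            except _TailCall as tc:
                args, kwargs = tc.call_args, tc.call_kwargs
    return wrapped

@tail_recursive
def countdown(n, acc=0):
    return acc if n == 0 else countdown(n - 1, acc + n)

print(countdown(100000))  # 5000050000, well past the recursion limit
```

George's caveat applies here too: raising and catching an exception per call makes small cases slower than plain recursion, and a bare `except:` inside the decorated function would swallow the control-flow exception.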

On Sun, May 3, 2009 at 1:02 AM, Steven D'Aprano <steve@pearwood.info> wrote:
If you need a tail recursive algorithm, you need it to have O(1) space consumption. Constant overhead from each call is minor in comparison. We wouldn't even have this discussion if constant overhead were given by itself, but we would have it if O(1) space consumption were given by itself.
* Used in some LISP implementations, a trampoline is a loop that iteratively invokes thunk-returning functions. A single trampoline is sufficient to express all control transfers of a program; a program so expressed is trampolined or in "trampolined style"; converting a program to trampolined style is trampolining. Trampolined functions can be used to implement tail recursive function calls in stack-oriented languages.[1] For example, something like this (but obviously with arguments and the ability to finish looping):

def a():
    return b

def b():
    return a

def trampoline(func):
    while True:
        func = func()
Yes, there's a lot of weird bytecode hacks out there.
One unfortunate point of confusion is that giving up debugging information is only an issue if you want to do it *by default*, applying it to the entire language. In contrast, any infinite loop should use O(1) space, and therefore cannot retain any debugging information. -- Adam Olsen, aka Rhamphoryncus

Adam Olsen wrote:
On Sun, May 3, 2009 at 1:02 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I was pretty much in the same position also.
Aha. This is a restatement of the decades-old fact that a while loop and goto are sufficient to implement any flow chart.
For example, something like this (but obviously with arguments
Or a global state structure.
and the ability to finish looping):
Not sure what you mean, unless a way to exit the mainloop.
So the modern version is functions returning the next function to call instead of code-chunks setting the next label to jump to. Of course, that means that one can match the worst spaghetti-code mess with this mechanism also ;-). Thanks for a clear explanation. tjr

Adam Olsen wrote:
Every once in a while (usually with actor and message passing algorithms) I get the urge to have something like this; I didn't realize it had a formal name:

def a(n):
    trampoline b(n-1)
    print "never gets printed"

def b(n):
    if cond(n):
        return n
    else:
        trampoline a(n * 2)
    print "never gets printed"

So a trampolined call bails out of the current call frame, and whatever the trampolined call returns, it returns.

>>> a(5) == b(4)
True
>>>

The same thing would happen with generators and try/except handlers:

def a(n):
    try:
        yield n
        trampoline b(n-1)
    except Exception, err:
        print "a() error:", err

def b(n):
    yield n
    raise RuntimeError, "snort"

>>> for i in a(2):
...     print i
...
2
1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in b
RuntimeError: snort
>>>

Joel
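The hypothetical `trampoline` keyword above can be simulated today by returning (function, argument) thunks and driving them from a loop, which also makes the `a(5) == b(4)` claim checkable. A sketch; the `cond` predicate is invented here so the chain terminates:

```python
def cond(n):
    # Invented stopping predicate for the sketch.
    return n > 100

def a(n):
    return (b, n - 1)      # stands in for: trampoline b(n - 1)

def b(n):
    if cond(n):
        return n
    return (a, n * 2)      # stands in for: trampoline a(n * 2)

def run(thunk):
    # Drive (function, arg) pairs until a plain value comes back.
    while isinstance(thunk, tuple):
        func, arg = thunk
        thunk = func(arg)
    return thunk

print(run(a(5)) == run(b(4)))  # True: a(5) transfers control to b(4)
```

With this encoding, `a(5)` immediately hands control to `b(4)` and both chains bounce between `a` and `b` until `cond` fires, using constant stack space throughout.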

First off, I think that you misrepresent Python by your characterization of it as a teaching language. To be sure, many CS programs _do_ teach Python as part of their curricula and use it to teach core topics. But Python is much more than that -- it is a serious language for serious applications. (If you're not sure about this, just ask Google, which has built large parts of its infrastructure on Python). More importantly, while I appreciate your idea from a theoretical perspective, I believe that your proposal would garner more attention if you would provide some real-world examples. If the expanded role for _continue_ were already implemented, what real problems would be easier to solve, or result in more elegant source code? I, for one, would like to see some "before" and "after" examples using the syntax on typical applications. On Sat, May 2, 2009 at 4:27 PM, John Graham <john.a.graham@gmail.com> wrote:
-- Gerald Britton

On Sat, May 02, 2009, Gerald Britton wrote:
I'm reasonably sure John Graham had no intention of characterizing Python as "only" a teaching language; it surely is the case that Python *is* intended as a teaching language: http://www.python.org/doc/essays/cp4e.html -- Aahz (aahz@pythoncraft.com) <*> http://www.pythoncraft.com/ "Typing is cheap. Thinking is expensive." --Roy Smith

Thanks Gerald. I did not mean to imply Python was not an industrial-strength language at all. In fact, I wasn't introduced to it at school, but at work. I just know, as you pointed out, that Python's simple, elegant syntax is there for a reason. Namely, because it's easy to teach. One of the main criticisms of TCO, and continuations that might take advantage of it, was that it added too much complexity in its 'hidden' form, which would confuse newcomers to the language. This is an important criticism, which I hope using an explicit keyword would ameliorate. Here's an example and an anecdote (since the full example code would take too much space...) that have been asked for. Both show that the general pattern (which is now solved with trampoline functions) exists in multiple contexts, and can be simplified via the use of TCO, which can be made explicit and correct via the introduction of a new control keyword ('continue') to invoke it. This was a state machine implementation proposed by someone playing with the syntax:

def state1(self):
    # do stuff on self
    if cond:
        continue self.state2()
    else:
        continue self.state1()

def state2(self):
    ...

The response from someone critical of the idea was to just use a trampoline and this style:

def state1(self):
    # do stuff on self
    if cond:
        return state2
    else:
        return state1

For a second example, I can only offer an anecdote. I got interested in this style after coding up an 'active object' in C++, which is more or less all the stuff you have to 'bolt on' to a language without the Actor model to simulate actors. In Python, you would get a little help from class decorators, but (and here's why I leave out the code -- it'd just be too much) for the most part, I had to involve parts from threads/processes, locks/shared memory, message queues, and method wrappers. All just to get message passing. Basically speaking, the threads were used as a main loop -- again, trampolining -- across anything that came into the thread-safe message queue.
With 'continue' implying a continuation + TCO, actors boil down to little more than:

class Actor:
    def method1(self, messageBack):
        continue messageBack()

The pattern that 'continue' eliminates, basically, is the constant refrain of 'just use a trampoline function'. To me, language constructs exist to codify certain patterns, similar to the way list comprehensions captured a lot of what was previously done in for loops. Insofar as complexity is concerned, I feel I'd have an easier time explaining to someone what a continuation is versus a normal return, rather than what a trampoline function is. All of the criticisms that have been levied against TCO can also be levied against trampolines, as they are just as likely not to keep stack debug information. And while I've tried to stay away from the efficiency use case, that can't be ignored. Recursive data structures, mutually recursive functions and the message passing I described above would all benefit from the efficiency of the optimized tail call. Guido implied one of the main criticisms of tail call optimization was that it was done without the user's knowledge, behind the scenes, by the compiler. I think many have always acknowledged that TCO was, in fact, an optimization; it was just the complexity that went with a 'hidden' optimization. Making it explicit using the 'continue' keyword acknowledges that criticism. So, I suppose if you wanted more examples of where 'continue' would be better than the current implementation, they are all the same examples of where hidden TCO is applied in other languages. The use case is there; it's just that in Python, we can make sure the user is aware -- not to mention, we can enforce correctness. If a recursive call isn't tail recursive in another language that implements TCO, it just gets silently converted to a normal call, which might blow the stack.
Differentiating between continue and return in Python will enforce proper tail calls, not silently 'default' to something the user didn't want. On Sat, May 2, 2009 at 4:57 PM, Gerald Britton <gerald.britton@gmail.com> wrote:
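The trampolined state-machine style the critics keep recommending runs as ordinary Python today; a runnable sketch, with the states and transition rule invented for illustration:

```python
class Machine:
    def __init__(self, n):
        self.n = n
        self.history = []

    def state1(self):
        self.history.append("state1")
        self.n -= 1
        # Return the next state instead of tail-calling it.
        return self.state2 if self.n % 2 == 0 else self.state1

    def state2(self):
        self.history.append("state2")
        # Returning None stops the machine.
        return None if self.n == 0 else self.state1

def run(state):
    # The trampoline: call each state until one returns None.
    while state is not None:
        state = state()

m = Machine(3)
run(m.state1)
print(m.history)  # ['state1', 'state2', 'state1', 'state1', 'state2']
```

This is the shape of the "just use a trampoline" answer: it works, at the cost of routing every transition through an external driver loop that the states themselves never mention.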

I have a question about the implementation of "yield from." I recall hearing some talk about optimizing the stack in "yield from," but I didn't catch all of the details. I take it that there will be some steps taken to ensure that yield-froms yield from their innermost yielder without having to touch base at all the objects in between whoever is asking for the value and whoever is giving it. That being the case, will this example implementation of a linked list still blow the stack for lists bigger than 1,000 items or not?

class LinkedList:
    def __iter__(self):
        yield self.value
        if self.link:
            yield from self.link

If it does still blow up the stack, then it's no big deal, but if it won't blow the stack anymore, then it seems like there's very little difference between the kind of recursive programming invited by "yield from" and the kind of recursive programming invited by "continue object". If you hate reading TCO code (and I take this to be the #1 objection to adding TCO to Python, though there are also more technical reasons), you're still going to get it, only using "yield from" instead of "continue". So in that case, "continue" and "yield from" should be thought of as a pair of stack optimizers, one for functions and one for generators. -- Carl
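The class above runs as written on Python 3.3+, where PEP 380 added `yield from`; a sketch with a small constructor added so it can be exercised:

```python
class LinkedList:
    def __init__(self, value, link=None):
        self.value = value
        self.link = link

    def __iter__(self):
        yield self.value
        if self.link:
            yield from self.link

# Build a 5-element list: 4 -> 3 -> 2 -> 1 -> 0
head = None
for v in range(5):
    head = LinkedList(v, head)

print(list(head))  # [4, 3, 2, 1, 0]
```

As discussed elsewhere in the thread, each element still adds a nested (C-level) call during iteration in CPython, so a sufficiently long list can still hit the recursion limit.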

On 3 May 2009, at 07:44, Carl Johnson wrote:
I haven't kept up with recent threads about yield-from, but in its early stages at least the patch did still 'touch base' with all objects in between, albeit at C speed rather than at Python speed. However, I'm pretty sure it would be possible to short-circuit this, but a trace of the call stack still needs to be kept if the yield-from is followed by more processing in the generator function (IOW, if it's not a tail yield-from).
So 'continue f(x)' could be spelt 'return from f(x)'? But it's not the same anyway, because a yield-from can be a tail call or not, so the parallel would be more something like this:

functions:
    return f(x)        # normal call
    continue f(x)      # optimized tail call

generators:
    yield from gen     # normal yield-from
    continue from gen  # optimized tail yield-from

In fact, 'continue from gen' would make sense even if it is not a tail yield-from. It would yield control to gen and never come back. OK, I have to stop writing whatever comes to my mind now :) -- Arnaud

Carl Johnson wrote:
Not in my current implementation. It still makes a nested sequence of calls to get to the innermost iterator -- it's just that, at least in the case of generators, they're C calls rather than Python ones, so they're much faster. You can still blow the stack, though. -- Greg

On Sat, May 2, 2009 at 5:55 PM, John Graham <john.a.graham@gmail.com> wrote:
No, it's not about codifying. It's about having a *significantly better* solution by modifying the language than working within the language. List comprehensions are significantly better than a full for-loop. Adding a keyword is not significantly better than returning your next function; it's actually worse. It's a non-solution to a non-problem. If you actually *had* a problem you could do it with trampolines. They do exactly what's needed, they just don't put a bow on it. Ho hum. -- Adam Olsen, aka Rhamphoryncus

On Sun, May 3, 2009 at 2:56 PM, John Graham <john.a.graham@gmail.com> wrote:
It's fairly subjective, but lists vs genexps make a good example:

data = []
for i in range(50):
    data.append(i**2)
use_result(data)

vs

use_result(i**2 for i in range(50))

Of course a listcomp would be a more direct comparison, but I think the argument is stronger because it's *not* a direct translation. You'd have to write a generator function, which is enough clumsier that nobody would even consider it before genexps were added. In contrast, we have the before with trampolines:

def a():
    return (b, arg1, arg2)

and the after:

def a():
    continue b(arg1, arg2)

Sure, it's prettier to look like a function call, but you're disguising the fact that you're not calling it as a function right now. You're also disguising that you're returning from this function. If explaining the trampoline to others is the issue, why not call it a "pop and call" operation? Pop off the old stack frame, then do your call. Or call it "return and call", if they don't understand stack frames. -- Adam Olsen, aka Rhamphoryncus

On Sat, May 2, 2009 at 2:27 PM, John Graham <john.a.graham@gmail.com> wrote:
Premature optimization, irrelevant.
and indeed allows some things to happen that could not otherwise occur without running out of stack space.
Use a trampoline. It's the same thing you propose, only it uses existing syntax. It's got a few rough edges, but they're quite manageable, and in no way justify the vast bikeshed this issue has garnered. -- Adam Olsen, aka Rhamphoryncus

On Sun, 3 May 2009 09:22:17 am Adam Olsen wrote:
It's hardly "premature" to notice that recursive code in Python is significantly slower and less efficient than in other languages. This is a known problem, or at least issue, since some people consider that the solution (tail-recursion optimization) is worse that the problem.
Up until a month or so ago, I'd never heard of the term "trampoline" (apart from the thing you jump up and down on), and I still don't know what it means. Checking Wikipedia, I see that in computing, trampoline has *eleven* definitions. Which one do you mean?
For the record, I find the OP's idea dubious. I don't like having yet another way of spelling "exit this function and return this result": return, yield, and now (proposed) continue. So a *very* tentative -0 on the suggestion. Maybe +0. Just tossing a wild idea out there... is it conceivable to build a decorator that optimizes a tail-call recursive function? Then it becomes a matter of explicit programmer choice whether or not to do so, with no new syntax. You would have the choice of:

* write a function with tail recursion because it is the most expressive and simple algorithm, and live with the inefficiency (which, as Adam points out, may not be a problem in practice);
* manually re-write it as a for-loop, exchanging simplicity for runtime efficiency; or
* decorate the recursive function, giving up some debugging information for efficiency, but keeping the simplicity of the tail-recursive form.

I don't know whether such a thing is even possible, let alone how to go about doing such a thing, but if it did exist, I'd use it. -- Steven D'Aprano

Steven D'Aprano wrote:
It's also by no means certain that TCO would provide the kind of speed benefit that people imagine. A lot of the overhead of making a Python function call is concerned with packing up the arguments and unpacking them again, which you still need to do even if you're reusing the stack frame. -- Greg

Greg Ewing writes:
I thought that was a good part of the appeal of TCO, though, that the compiler can often arrange to do data manipulations in such a way that the stack frame (or even data-in-register) is just there, ready to go when control is transferred. Ie, the packing/unpacking that is purely related to function calling is avoided. Is this a difficulty in implementing for Python, or did I misunderstand the concept? Ie. that's my fuzzy recollection of a head-splitting conversation with a Schemer that started, "what does call-with-current-continuation *do*?" and part I of his lecture finished with "well, why don't we start with something a little easier like tail call optimization?" Incomprehension-is-my-middle-name-ly y'rs,

2009/5/3 Stephen J. Turnbull <stephen@xemacs.org>:
This can be done with recursive tail calls, but I guess that in Python, in general, the compiler is not able to know whether a tail call is actually calling the function currently being executed, as the compiler only knows it calls a function with a given name (which is only bound to an object at runtime). However, I think that the interpreter would be able to decide this at run time - and even to short-circuit the packing/unpacking sequence of parameters. -- Arnaud

Le Sun, 3 May 2009 18:42:20 +0100, Arnaud Delobelle <arnodel@googlemail.com> s'exprima ainsi:
The explicit keyword solution proposed is a big help in this case. The interpreter can then (try to) proceed with the optimization as required -- and possibly fail and protest against a wrong requirement -- instead of collecting all the needed information and running complicated algorithms _just to know_ whether the optimization is even possible, before doing any processing. Denis ------ la vita e estrany

Stephen J. Turnbull wrote:
Possibly that could be done in the case of self-recursion, but it would be trickier when the function being tail-called is a different one. In that case you also have the problem that the called function can have a different stack size, number of local variables, list of constants, etc. You can't just re-use the same stack frame for a different function without making various adjustments. -- Greg

Steven D'Aprano <steve@...> writes:
Is there some evidence that this "known issue" has been popping up in real-world production Python code (I am not talking about Lisp, C, or any other language), rather than academic discussions? Premature optimization is trying to optimize something which is not a significant contributor to execution time. Regards Antoine.

I think its about more than optimization. It's also about being able to code a solution using recursion without worrying about the stack. The execution time may or may not change but the Python source may be easier to read using recursion instead of iteration. At the same time, I feel that some of the hoops one has to jump through (like passing state in function arguments) can adversely affect the readability and elegance of a recursive solution that is trying to take advantage of TCO. In general, I don't like coding source a certain way because "I know" that the compiler will do something "special" if I do. What is "special" today may be not so special (or even worse) tomorrow. I suppose I would rather do the work of translating my recursive solution into an iterative one rather than trying to code my recursive function just right so that the compiler will do TCO for me, if the stack is a concern. With this approach, I can wind up with faster execution and lower amortized memory costs as well. On Sun, May 3, 2009 at 8:30 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
-- Gerald Britton
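The "hoops" Gerald mentions are typically accumulator parameters: a function has to thread its pending work through an extra argument before its recursive call becomes a genuine tail call. A sketch of that transformation:

```python
# Natural recursion: the multiply happens *after* the recursive call
# returns, so this is not a tail call and no TCO could apply to it.
def factorial(n):
    return 1 if n == 0 else n * factorial(n - 1)

# Accumulator version: pending work is threaded through `acc`, making
# the recursive call a true tail call -- at a readability cost.
def factorial_acc(n, acc=1):
    return acc if n == 0 else factorial_acc(n - 1, acc * n)

print(factorial(10) == factorial_acc(10))  # True
```

The second form is exactly the "code it just right so the compiler will do TCO for me" style being objected to: correct, but the state-passing argument leaks an implementation concern into the signature.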

Gerald Britton wrote:
Stack space is only an issue if you're using recursion to traverse a linear data structure. If the structure is tree-shaped, you'll run out of memory for the tree long before it's deep enough to cause stack problems. I've yet to see a Python function operating on a linear structure that was easier to follow using recursion instead of iteration. -- Greg

On Sun, 3 May 2009 10:30:55 pm Antoine Pitrou wrote:
People spend time trying to speed up general purpose code in the standard library, to save 1% or 2% on benchmarks. I'm sure I don't need to google for examples -- you've been around the Python-Dev list long enough to see plenty of examples, and I for one haven't seen anyone objecting to improving Python's general execution speed. In this case, the speed-up could (in principle) be by up to a factor of five or so, not a mere couple of percent. Recursion in Python is quite slow compared to iteration. Here's a real (albeit trivial) example: the Towers of Hanoi.

def move(n, a, b, c):  # Version using tail recursion.
    "Move n disks from needle a to b using c as temporary storage."
    if n > 0:
        for t in move(n-1, a, c, b):
            yield t
        yield (a, b)
        for t in move(n-1, c, b, a):
            yield t

def move2(n, a, b, c):  # Non-tail recursive version.
    while n > 0:
        for t in move2(n-1, a, c, b):
            yield t
        yield (a, b)
        n -= 1
        a, c = c, a

And in use:
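To see that the two generators agree, they can be exercised like this (definitions repeated so the snippet is self-contained; any tower of n disks takes 2**n - 1 moves):

```python
def move(n, a, b, c):  # Version using tail recursion.
    "Move n disks from needle a to b using c as temporary storage."
    if n > 0:
        for t in move(n-1, a, c, b):
            yield t
        yield (a, b)
        for t in move(n-1, c, b, a):
            yield t

def move2(n, a, b, c):  # Non-tail recursive version.
    while n > 0:
        for t in move2(n-1, a, c, b):
            yield t
        yield (a, b)
        n -= 1
        a, c = c, a

moves = list(move(3, 'a', 'b', 'c'))
print(len(moves))                              # 7 == 2**3 - 1
print(moves == list(move2(3, 'a', 'b', 'c')))  # True: same move sequence
```

Both versions produce the identical move sequence; the only difference is that move2 has had one of its two recursive calls converted into the while loop by hand, which is exactly the transformation under discussion.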
Okay, this toy program isn't exactly a mission-critical application, but it does demonstrate a genuine performance penalty when using recursion. In this case, two recursive calls (one of which is a tail-call) takes nearly 60% more time than a single recursive call in a while loop. But more importantly, execution time is not the only resource that needs to be optimized. Programmer effort is an important resource that many people wish to optimize. Removing tail-call recursion is a simple algorithm quite well suited to be done by the machine, but tedious and sometimes tricky for the average programmer to do correctly. Another is stack space -- hence the relatively low default recursion limit. Anyone who has ever seen "maximum recursion depth exceeded" in real-world code has a real problem which (if the function is tail-recursive) could be fixed by a hypothetical optimizer. -- Steven D'Aprano

Steven D'Aprano <steve@...> writes:
People spend time trying to speed up general purpose code in the standard library, to save 1% or 2% on benchmarks.
Actually, it's quite rare that these patches (those which only yield a 1 or 2% improvement) are accepted, except when they don't make the implementation more complex. Lots of other patches with more significant speedups on micro-benchmarks are refused, too. You can find some of them in the bug tracker.
and I for one haven't seen anyone objecting to improving Python's general execution speed.
Not in the absolute, sure. If someone produces a patch speeding up recursive calls it will be considered with the same criteria as any patch claiming to improve performance (see above). This doesn't mean it will be accepted for sure.
In this case, two recursive calls (one of which is a tail-call) takes nearly 60% more time than a single recursive call in a while loop.
Why do you think recursion has anything to do with it, rather than simply the fact that there are twice as many function calls? Besides, I object to the claim that solving the Towers of Hanoï problem is a real-world example ;) That said, it's true that recursive calls are probably costlier than non-recursive ones, due to the fact that only one frame object is cached for each code object. But removing this limitation shouldn't make the common (non-recursive) case slower, and it shouldn't increase memory consumption too much, so it's not as easy as it seems. Regards Antoine.

On Mon, 4 May 2009 01:16:10 am Antoine Pitrou wrote:
But surely that's the point of removing tail-recursion? Each function call has overhead, and by reducing the number of function calls, you reduce the amount of overhead. Perhaps we're not discussing the same thing? After reading your reply, there does seem to be some confusion (at least in my head) as to what precisely we're talking about. I've seen tail-call optimization described as programmatically converting a tail-call recursive function into an equivalent iterative function. I've also seen it referring to a process of minimizing the depth of the function call stack while still making the same number of function calls.
Besides, I object to the claim that solving the Towers of Hanoï problem is a real-world example ;)
Well, I did admit it was a toy :)
I don't think anyone really believes it is easy. Even the people on various blog sites saying it's "easy" are actually saying that a solution which is either fragile, or slow, or both, is easy, and that shouldn't surprise anyone. -- Steven D'Aprano

Hello again, Steven D'Aprano <steve@...> writes:
Well, it's not that obvious. Tail call optimization does not totally remove function calls, it replaces them with a slightly lighter kind of control transfer. It would not save the cost of setting up a new frame object (the function you transfer control to probably has differing constants and local variables), of passing parameters around (Python's rich function call conventions cost some CPU time) and various other bookkeeping. So, TCO would make the cost of the tail call slightly lighter (how much exactly is unknown), but it won't *suppress* it like e.g. inlining would do.
The former sentence is the formulation in theoretical (algorithmic) terms. The latter sentence is how I believe it gets implemented in practice. Basically, at the assembler level, you replace a "CALL + RETURN" sequence with a single "JUMP" instruction. But that JUMP instruction still must do all the work of setting up parameters, initializing a new frame etc. It's not anything like a lightweight intra-function JUMP.
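The distinction can be illustrated by hand in Python. Below is a sketch (with made-up function names) of the same algorithm written once with a proper tail call and once with that tail call rewritten as the intra-function jump a loop provides; TCO sits between these two in cost, since the "JUMP" it performs still sets up a fresh frame and parameters:

```python
def gcd_recursive(a, b):
    # Tail-recursive form: the recursive call is the very last action,
    # and its result is returned unchanged (a proper tail call).
    if b == 0:
        return a
    return gcd_recursive(b, a % b)

def gcd_loop(a, b):
    # The same algorithm with the tail call turned into a "jump":
    # rebind the parameters and loop back to the top, no CALL/RETURN.
    while b != 0:
        a, b = b, a % b
    return a
```

The loop version pays neither the frame setup nor the argument-passing cost on each step, which is the saving TCO alone would not deliver.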
Well, in the sentence you were responding to I was talking about optimizing recursive calls in Python *without* introducing TCO. ;) Regards Antoine.

On Mon, 2009-05-04 at 00:34 +1000, Steven D'Aprano wrote:
This doesn't demonstrate where the issue is. Is it function calls? The tail recursion call makes (eyeball count) twice as many Python function calls (not calls into generators) per call to move(2)? Or perhaps something else? Sure, the TCO one is slower, but is the handing of results from generator to generator really the issue? If it is rather something else, it may be that the nested yielding, while costing, is not costing disproportionately, and is still a fraction of the cost.

python -m timeit 'from foo import do_move, do_move2' 'do_move()'
65535
10 loops, best of 3: 164 msec per loop
python -m timeit 'from foo import do_move, do_move2' 'do_move2()'
32768
10 loops, best of 3: 108 msec per loop

Half as many function calls, 2/3rds the time. *Much* better measurement than these sketches is needed to say where the issue is. -Rob

On Mon, 4 May 2009 12:34:53 am Steven D'Aprano wrote about removing tail-recursion:
In this case, the speed-up could (in principle) be by up to a factor of five or so, not a mere couple of percent.
Sorry, that "factor of five" is probably bogus. I got that for some comparisons between recursion and iteration, but the speed difference is probably not relevant to the sort of tail call optimizations we're discussing. I will *not* defend my suggestion of 5x faster; however, on the basis of my Towers of Hanoi test, I think 2x faster is conceivable. I found Guido's recent blog post on tail call optimization: http://neopythonic.blogspot.com/2009/04/tail-recursion-elimination.html Worthwhile reading. Also read the commenters: they make some interesting points, such as that tail call optimization is a general technique applicable to more than just recursion. Guido's largest objection to TCO is that it ruins nice stack traces when you get an exception. I must admit I've never understood this argument. Perhaps I'm missing something, but I've never considered the stack trace you get in recursive functions useful. Here's an example:
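A minimal reconstruction of the kind of example being described (the function name and depth here are illustrative):

```python
def spam(n):
    # Each level of recursion adds one more identical entry
    # to the traceback.
    if n == 0:
        raise ValueError("Nobody expects the Spanish Inquisition!")
    return spam(n - 1)
```

Calling spam(10) at the interactive prompt produces a traceback whose middle is a long run of identical `File "<stdin>", line N, in spam` frames, one per level of recursion, ending in the ValueError.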
To me, all those identical "line 3, in spam" lines are just noise. They get in the way of a nice stack trace! What is Guido seeing that I'm not? Hopefully he isn't counting them by hand to see how deep he got into the recursion! I wish there was a way to tell Python to just throw that white noise away and give me the sort of stack trace I get from a loop function. (It's not so bad when there are only ten lines, but when there are 1000, you might very well fill your xterm's buffer and lose valuable history.)
Which of course illustrates the point that Guido's recommendation to re-write the recursion as an iterative loop by hand will have the same effect on the stack trace as iteration already does. I haven't heard anyone argue that stack traces in a while or for loop should show you the entire history of the loop, so I wonder why recursive calls should be treated as sacrosanct? (I have nothing to say about Guido's other arguments against TCO at this point, but my silence should not be interpreted as agreement.) -- Steven D'Aprano

Your comment on the iterative solution also erasing the stack trace is a good one. As for the specific problem, I've always thought maybe ... notation could capture recursive stack traces more easily:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in spam
  ...
  File "<stdin>", line 2, in spam
ValueError: Nobody expects the Spanish Inquisition!

although I believe that particular syntax is used in doctest (although would it be incorrect in this case?). The same could be used in a 'continue' statement, indicating the first one or two functions and the last one or two functions, similar to sequences described in math (1, 2, ..., n-1, n). Anywho, I'm no expert on stack traces, so if someone else has a better idea I'm all ears.

I'd also like to agree with your statement that TCO is not just about execution speed. In fact, to clear up this whole argument, let's go ahead and claim that TCO code will never be faster than the iterative version of things. TCO will prevent things from blowing the stack. And despite many of the cases presented, there are some things that CAN'T be done iteratively, simply, as the commenters to the post you linked to pointed out. Mutual recursion is one, as the iterative solution to that begins to grow in size and use some more complex looping logic. Actual continuation style, where you're making tail calls to OTHER functions, can only be represented by a trampoline, which, as I've stated before, is hardly a simpler solution. The complexity of the iterative solution only goes up from there.

I want to clarify that the original suggestion was never to implement TCO 'implicitly'. I don't want to have to 'write my function just right' to get TCO to work. This is why the keyword was suggested, as it'd be an explicit way to tell the interpreter 'this is a tail call'. If it's not, then throw an exception.
Otherwise, there are no optimizations going on behind the scenes that I'm not aware of, which is the case in languages that just optimize tail calls behind the scenes. I agree completely with Guido that this can confuse newcomers. So to clarify (and don't get me wrong, this is a very interesting conversation :), the proposal is to add explicit and unambiguous support for the elimination of tail calls (via the use of a keyword), rather than an implicit, confusing optimization behind the scenes, and to do so not to increase runtime performance (per se) but instead to allow techniques which currently blow the stack to work. On Sun, May 3, 2009 at 10:29 AM, Steven D'Aprano <steve@pearwood.info> wrote:

Back to the bike shed argument, the use of 'continue' is incidental. I personally like it, but I totally can see the arguments brought against it. My main thrust is for explicit TCO in Python, to allow for different models of computation in an explicit, Pythonic way. I believe others have suggested making 'yield from' require tail calls, or introducing 'yield as'. Both have their ups and downs. I'd suggest we debate the original idea, a TCO keyword in Python, separately from what that keyword ought to be. It's no use talking about keywords if the original idea is bunk, and I believe the second argument can be won far more pragmatically than the first, which requires some careful thought on all of our parts. On Sun, May 3, 2009 at 11:06 AM, Steven D'Aprano <steve@pearwood.info> wrote:

2009/5/3 Steven D'Aprano <steve@pearwood.info>:
Guido's largest objection to TCO is that it ruins nice stack traces when you get an exception. I must admit I've never understood this argument.
Tail call optimization is more general than tail recursion optimization, and it can indeed eliminate useful context from the call stack if the caller and callee are different functions. -- Marcin Kowalczyk qrczak@knm.org.pl http://qrnik.knm.org.pl/~qrczak/

Arnaud Delobelle wrote:
Right, but if you have more than one tail call, you lose the history of _which_ recursive call was taken. If you are saying, "Only a single tail-recursive call is allowed per function," then you are eliminating solution by cases. Think:

def recurse(source):
    if branching(source):
        if goleft(source):
            return recurse(source[0])
        else:
            return recurse(source[1:])
    return source

--Scott David Daniels Scott.Daniels@Acm.Org

On 5/3/09, Antoine Pitrou <solipsis@pitrou.net> wrote:
Steven D'Aprano <steve@...> writes:
Is there some evidence that this "known issue" has been popping up in real-world production Python code
(1) The overhead of function calls (which are more common in a recursive style) is a real problem. That said, it isn't clear that Tail Call Optimization would make enough of a dent on the cost of calling a function. (2) The direct efficiency losses (particularly in space) show up every time an exception is thrown for excessive recursion depth. You could argue that this isn't common in production code, but that is because people learn to distort their code to avoid that limit. The counter-argument is that sometimes excessive depth is caused by a bug, and finding out sooner is a good thing. Guido (and Greg, earlier in this thread) have said that hitting the recursion limit is *usually* a bug. If that were true (and it might be), then tail call optimization *in python* would usually just mean "When I write an infinite loop, take out the breakpoints and debugging information." -jJ

On Sun, May 3, 2009 at 3:02 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Just tossing a wild idea out there... is it conceivable to build a decorator that optimizes a tail-call recursive function?
Yes, check http://code.activestate.com/recipes/496691/. The caveat is that for "small" values it actually pessimizes rather than optimizes the function ("Note also that these decorators are not optimizing and for small argument values they are actually far slower.") George
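The linked recipe works roughly along these lines. This is a simplified sketch of the exception-based approach, not the recipe verbatim (the class and decorator names here are made up): the wrapper inspects the frame stack, and when it finds itself re-entered from inside the decorated function, it aborts the nested call and replays it in a loop.

```python
import sys

class _TailCall(Exception):
    """Control-flow signal carrying the arguments of a tail call."""
    def __init__(self, args, kwargs):
        self.args, self.kwargs = args, kwargs

def tail_recursive(func):
    def wrapper(*args, **kwargs):
        frame = sys._getframe()
        # Two frames up is the previous wrapper invocation when we were
        # called recursively from inside func: unwind instead of nesting.
        if (frame.f_back is not None and frame.f_back.f_back is not None
                and frame.f_back.f_back.f_code is frame.f_code):
            raise _TailCall(args, kwargs)
        while True:
            try:
                return func(*args, **kwargs)
            except _TailCall as call:
                args, kwargs = call.args, call.kwargs
    return wrapper

@tail_recursive
def countdown(n):
    # A depth far beyond the default recursion limit now runs in
    # constant stack space.
    return n if n == 0 else countdown(n - 1)
```

As the recipe itself warns, the frame inspection and exception machinery make small calls slower, and this naive frame check only handles a function tail-calling itself; mutual recursion between two decorated functions would be mis-dispatched. It also only works when the recursive call really is in tail position.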

On Sun, May 3, 2009 at 1:02 AM, Steven D'Aprano <steve@pearwood.info> wrote:
If you need a tail recursive algorithm, you need it to have O(1) space consumption; the constant overhead of each call is minor in comparison. We wouldn't even be having this discussion if constant overhead were all that was on offer, but we would still be having it if O(1) space consumption were all that was on offer.
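The space point is easy to demonstrate with a minimal sketch (hypothetical functions): the loop form handles any depth in O(1) space, while the tail-recursive form, absent TCO, hits the interpreter's recursion limit.

```python
import sys

def count_down_rec(n):
    # A proper tail call, but CPython still grows the stack per call.
    return 0 if n == 0 else count_down_rec(n - 1)

def count_down_loop(n):
    # O(1) space regardless of n.
    while n:
        n -= 1
    return 0
```

count_down_loop(10**6) returns normally, while count_down_rec(sys.getrecursionlimit() + 100) raises a RuntimeError (a RecursionError in later Pythons) for exceeding the maximum recursion depth.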
* Used in some LISP implementations, a trampoline is a loop that iteratively invokes thunk-returning functions. A single trampoline is sufficient to express all control transfers of a program; a program so expressed is trampolined, or in "trampolined style"; converting a program to trampolined style is trampolining. Trampolined functions can be used to implement tail recursive function calls in stack-oriented languages.[1] For example, something like this (but obviously with arguments and the ability to finish looping):

def a(): return b
def b(): return a

def trampoline(func):
    while True:
        func = func()
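A filled-out version of that sketch, with arguments and a termination condition (function names made up for illustration), shows how a trampoline runs mutually tail-recursive functions in constant stack space:

```python
def trampoline(thunk):
    # Keep bouncing while the functions hand back another thunk;
    # the first non-callable result is the final answer.
    while callable(thunk):
        thunk = thunk()
    return thunk

def is_even(n):
    # Return the answer directly, or a zero-argument thunk for the
    # next "tail call" instead of making the call ourselves.
    return True if n == 0 else (lambda: is_odd(n - 1))

def is_odd(n):
    return False if n == 0 else (lambda: is_even(n - 1))
```

trampoline(lambda: is_even(100001)) returns False without ever nesting more than a couple of frames, a depth that would blow the stack if is_even and is_odd called each other directly. (Note the convention assumes no meaningful result is itself callable.)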
Yes, there's a lot of weird bytecode hacks out there.
One unfortunate point of confusion is that giving up debugging information is only an issue if you want to do it *by default*, applying it to the entire language. In contrast, any infinite loop should use O(1) space, and therefore cannot retain any debugging information. -- Adam Olsen, aka Rhamphoryncus

Adam Olsen wrote:
On Sun, May 3, 2009 at 1:02 AM, Steven D'Aprano <steve@pearwood.info> wrote:
I was pretty much in the same position also.
Aha. This is a restatement of the decades-old fact that a while loop and goto are sufficient to implement any flow chart.
For example, something like this (but obviously with arguments
Or a global state structure.
and the ability to finish looping):
Not sure what you mean, unless you mean a way to exit the main loop.
So the modern version is functions returning the next function to call instead of code-chunks setting the next label to jump to. Of course, that means that one can match the worst spaghetti-code mess with this mechanism also ;-). Thanks for a clear explanation. tjr

Adam Olsen wrote:
Every once in a while (usually with actor and message passing algorithms) I get the urge to have something like this; I didn't realize it had a formal name:

def a(n):
    trampoline b(n-1)
    print "never gets printed"

def b(n):
    if cond(n):
        return n
    else:
        trampoline a(n * 2)
    print "never gets printed"

So a trampolined call bails out of the current call frame, and whatever the trampolined call returns, it returns:

>>> a(5) == b(4)
True
>>>

The same thing would happen with generators and try/except handlers:

def a(n):
    try:
        yield n
        trampoline b(n-1)
    except Exception, err:
        print "a() error:", err

def b(n):
    yield n
    raise RuntimeError, "snort"

>>> for i in a(2):
...     print i
...
2
1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in b
RuntimeError: snort
>>>

Joel
participants (18)
- Aahz
- Adam Olsen
- Antoine Pitrou
- Arnaud Delobelle
- Carl Johnson
- George Sakkis
- Gerald Britton
- Greg Ewing
- Jim Jewett
- Joel Bender
- John Graham
- Marcin 'Qrczak' Kowalczyk
- Robert Collins
- Scott David Daniels
- spir
- Stephen J. Turnbull
- Steven D'Aprano
- Terry Reedy