Idea: Compressing the stack on the fly

Hi everybody,

Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to how programming languages are implemented, so this idea will most likely be either (a) completely impossible or (b) trivial knowledge.

I was thinking about the implementation of the factorial in Python. I was comparing in my mind two different solutions: the recursive one, and the one that uses a loop. Here are example implementations for them:

    def factorial_recursive(n):
        if n == 1:
            return 1
        return n * factorial_recursive(n - 1)

    def factorial_loop(n):
        result = 1
        for i in range(1, n + 1):
            result *= i
        return result

I know that the recursive one is problematic, because it's putting a lot of items on the stack. In fact it's using the stack as if it were a loop variable. The stack wasn't meant to be used like that.

Then the question came to me: why? Maybe the stack could be built to handle this kind of (ab)use?

I read about tail-call optimization on Wikipedia. If I understand correctly, the gist of it is that the interpreter tries to recognize, on a frame-by-frame basis, which frames could be completely eliminated, and then it eliminates those. Then I read Guido's blog post explaining why he doesn't want it in Python. In that post he outlined four different reasons why TCO shouldn't be implemented in Python.

But then I thought, maybe you could do something smarter than eliminating individual stack frames. Maybe we could create something that is to the current implementation of the stack what `xrange` is to the old-style `range`: a smart object that allows access to any of a long list of items in it, without actually having to store those items. This would solve the first argument that Guido raises in his post, which I found to be the most substantial one.

What I'm saying is: imagine the stack of the interpreter when it runs the factorial example above for n=1000. It has around 1000 items in it and it's just about to explode. But if you looked at the contents of that stack, you'd see it's embarrassingly regular, a compression algorithm's wet dream. It's just the same code location over and over again, with a different value for `n`.

So what I'm suggesting is an algorithm to compress that stack on the fly. An algorithm that would detect regularities in the stack and, instead of saving each individual frame, save just the pattern. Then there wouldn't be any problem with showing an informative stack trace: despite not storing every individual frame, each individual frame could still be *accessed*, similarly to how `xrange` allows access to each individual member without having to store all of them.

Then the stack could hold a lot more items, and tasks that currently rely on recursion (like pickling using the standard library) would be able to handle much deeper recursions.

What do you think?

Ram.
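To make the idea above a little more concrete, here is a minimal pure-Python sketch of the kind of structure being described. It is only an illustration: CompressedStack, FrameRun and Frame are made-up names, and CPython's real frame stack is of course nothing like a list of Python objects.

    # A toy sketch of the "compressed stack" idea. Nothing here is CPython's
    # real frame machinery; the class names are hypothetical.

    class Frame:
        """A fake stack frame: a code location plus the locals that differ."""
        def __init__(self, location, locals_):
            self.location = location
            self.locals = locals_

    class FrameRun:
        """A run of frames sharing one code location, stored as a pattern.

        Instead of keeping one frame object per call, it keeps the location
        once plus the values that vary between frames (here, just `n`).
        """
        def __init__(self, location):
            self.location = location
            self.varying = []              # one entry per "virtual" frame

        def push(self, value):
            self.varying.append(value)

        def __len__(self):
            return len(self.varying)

        def materialize(self, index):
            # Lazily reconstruct frame `index`, xrange-style: computed on
            # demand rather than stored.
            return Frame(self.location, {'n': self.varying[index]})

    class CompressedStack:
        """Stores runs of identical code locations instead of individual frames."""
        def __init__(self):
            self.runs = []

        def push(self, location, value):
            if not self.runs or self.runs[-1].location != location:
                self.runs.append(FrameRun(location))
            self.runs[-1].push(value)

        def __len__(self):
            return sum(len(run) for run in self.runs)

        def frame(self, index):
            # Random access to any individual frame without having stored it.
            for run in self.runs:
                if index < len(run):
                    return run.materialize(index)
                index -= len(run)
            raise IndexError(index)

    # Simulate factorial_recursive(1000): the interpreter would keep ~1000
    # frames; this structure keeps one run holding 1000 small values.
    stack = CompressedStack()
    for n in range(1000, 0, -1):
        stack.push('factorial_recursive:line 3', n)

    print(len(stack.runs), len(stack), stack.frame(500).locals)   # 1 1000 {'n': 500}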

On 5/26/2013 8:00 AM, Ram Rachum wrote:
This fails for 0 (0! == 1 also), but that is easily fixed. It also fails for negative and non-integral values, but ignore that for the moment.

    def factorial_tail(n, result=1):
        if n > 1:
            return factorial_tail(n - 1, result * n)
        else:
            return result

    def factorial_while(n, result=1):
        # written to maximize sameness with tail version
        while n > 1:
            result = result * n
            n = n - 1
        else:
            return result

If one can rewrite the 'body' recursion (my term) as tail recursion, the translation to a while loop is trivial, and for linear recursion on counts, the for-loop version is simple and often even clearer, as it explicitly identifies all the values used in the computation.
Now consider a production-ready function that will always terminate:

    def factorial(n):
        if n < 0 or n != int(n):
            raise ValueError('factorial input must be a count')
        <body as above>

To do the same with recursion, you have to write the main function as a nested function to avoid uselessly doing the check more than once. Conclusion: for linear processing of collections, use a while or for loop.
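For illustration, a sketch of the nested-function pattern mentioned above (hypothetical code, not from the message): the argument is validated once in the outer function, and only the inner helper recurses, so the check is not repeated on every call.

    # Validate once, then recurse only on the inner helper.
    def factorial(n):
        if n < 0 or n != int(n):
            raise ValueError('factorial input must be a count')

        def helper(n, result=1):
            if n > 1:
                return helper(n - 1, result * n)
            return result

        return helper(n)

    print(factorial(5))   # 120
    print(factorial(0))   # 1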
<compress stack>
I do not see much in the idea:

* Converting recursion to iteration compresses the stack to 1 frame.
* It slows down all computation for a rare benefit.
* It does not really scale very well. In most languages, factorial overflows long before there is a stack problem. To be realistic, consider linearly processing a billion-character string with recursion versus iteration.
* Recursion does not work with iterators as well as iteration does. Iterators are one of Python's crown jewels.

      with open('terabyte.txt') as f:
          for c in chars(f):
              process(c)

  is the right idiom for independently processing the characters of a terabyte file.

-- 
Terry Jan Reedy
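The `chars()` and `process()` names in the message above are left undefined there; the following is just one assumed reading, in which `chars()` is a generator that yields one character at a time so the file is never read into memory all at once.

    # Assumed definitions for the chars()/process() idiom quoted above.
    def chars(f, blocksize=1 << 16):
        """Yield the characters of an open text file one at a time."""
        while True:
            block = f.read(blocksize)
            if not block:
                return
            for c in block:
                yield c

    def process(c):
        pass  # stand-in for whatever per-character work the caller wants

    with open('terabyte.txt') as f:   # assumes such a file exists
        for c in chars(f):
            process(c)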

And, besides everybody else's comments, I'd like to point out that tail-recursion elimination _is_ feasible in Python as it is today. One can create a decorator, meant only for suitable tail-recursive functions, that does some bookkeeping about when the function is called - and, as part of that bookkeeping, makes the call to the decorated function inside a try-except block. If the function is called recursively, it then raises an appropriate exception, which makes execution continue at the frame-stack level of the first call. I have a blog post with such a toy - nevertheless, it is just a toy. (If there was some small problem that could be elegantly approached in this fashion but not iteratively, it could be used in production, though.)

http://metapython.blogspot.com.br/2010/11/tail-recursion-elimination-in-pyth...

js
-><-

On 26 May 2013 09:00, Ram Rachum <ram.rachum@gmail.com> wrote:
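The linked post is truncated above, so here is only a sketch of the general trick being described (not the author's code): a decorator that notices a recursive call by inspecting the frame stack, raises an exception to unwind back to the outermost invocation, and then re-calls the function with the new arguments.

    # A toy exception-based tail-call trick: recursive calls are intercepted
    # and unwound back to the outermost wrapper, so the stack stays flat.
    import sys

    class _TailCall(Exception):
        def __init__(self, call_args, call_kwargs):
            self.call_args, self.call_kwargs = call_args, call_kwargs

    def tail_recursive(func):
        def wrapper(*args, **kwargs):
            frame = sys._getframe()
            # Two frames up is the previous wrapper call if we are recursing;
            # in that case, signal the outer wrapper instead of going deeper.
            if (frame.f_back is not None and frame.f_back.f_back is not None
                    and frame.f_back.f_back.f_code is frame.f_code):
                raise _TailCall(args, kwargs)
            while True:
                try:
                    return func(*args, **kwargs)
                except _TailCall as call:
                    args, kwargs = call.call_args, call.call_kwargs
        return wrapper

    @tail_recursive
    def factorial(n, result=1):
        if n > 1:
            return factorial(n - 1, result * n)
        return result

    print(factorial(10000) > 0)   # deeper than the default recursion limit: True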

On Mon, May 27, 2013 at 9:01 PM, Joao S. O. Bueno <jsbueno@python.org.br> wrote:
What can tail recursion do that can't be done by reassigning to the function parameters and 'goto'-ing back to the top? Or, in the absence of an actual goto, a construct like this:

    def tail_recursive_function(some_arg):
        while True:
            # ... code
            if call_self:
                # return tail_recursive_function(some_other_arg)
                some_arg = some_other_arg
                continue
            # ... more code
            # falling off the end:
            break

which basically amounts to a goto, but using extra keywords to avoid the one that people hate. Is there any fundamental difference? I've never understood there to be any, but I'm only one, and possibly I'm wrong.

ChrisA
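As a concrete (hypothetical) instance of the rewrite Chris describes, here is the same function written once with a tail call and once with the reassign-and-continue loop standing in for the missing goto.

    def gcd_tail(a, b):
        if b == 0:
            return a
        return gcd_tail(b, a % b)        # tail call

    def gcd_loop(a, b):
        while True:
            if b == 0:
                return a
            a, b = b, a % b              # reassign the parameters...
            continue                     # ...and "goto" the top of the loop

    assert gcd_tail(1071, 462) == gcd_loop(1071, 462) == 21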

On 5/27/2013 9:30 AM, Chris Angelico wrote:
That style can't handle mutually recursive procedures, or the extreme case: a state machine implemented with N functions, each of which calls the next state function at the end. Tail-call elimination isn't simply about noticing recursive calls. It's about noticing that a function ends with a function call, and not burning another stack frame in the process. --Ned.
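A small (hypothetical) illustration of the point: two functions whose tail calls go to each other, so the single-function loop rewrite shown earlier cannot be applied directly. In CPython today this simply hits the recursion limit, whereas with general tail-call elimination it would run in constant stack space.

    def is_even(n):
        if n == 0:
            return True
        return is_odd(n - 1)     # tail call into the *other* function

    def is_odd(n):
        if n == 0:
            return False
        return is_even(n - 1)    # tail call back again

    print(is_even(500))              # fine, well within the default limit
    try:
        print(is_even(10 ** 6))      # depth ~10**6: blows the stack today
    except RuntimeError as exc:      # RecursionError subclasses RuntimeError
        print('hit the recursion limit:', exc)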

On May 27, 2013, at 7:05, Chris Angelico <rosuav@gmail.com> wrote:
That's exactly why I suggested that it would be more interesting in a PyPy-style tracing JIT than in a static compiler. At runtime, you can detect that your traced fast path has a FUNC opcode that calls a function whose body is already part of the trace, and turn that into code that stashes just enough information to generate stack frames if needed, then do a goto. In simple cases, that stashing could amount to no more than incrementing a counter; in complex cases, I think it might be closely related to stashing enough info to create generator states if you fall off the fast path.

That should be much simpler than trying to detect patterns that will lead to tail calls. And it means you only make the effort when it makes a difference. But, most importantly, it means you can benefit even in non-tail-call cases, like the naive recursion in the original post. Even when you can't eliminate recursion you will still "compress" the state. And it might also let you optimize chains of generator yields or coroutine sends with not much extra effort (which might even be a basis for optimizing out explicit coroutine trampolines, which would be very cool).

This is pretty different from traditional TCO, but given that TCO isn't relevant to the OP's proposal or to his example, I don't think that's a problem.

You can just use a memoize decorator to automatically convert your recursive solution to a fast linear-time one. Search for "python memoization decorator". This would make a much broader range of recursive solutions run in linear time.

Best,
Neil

On Sunday, May 26, 2013 8:00:13 AM UTC-4, Ram Rachum wrote:
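A sketch of the memoization suggestion above, using the standard library's functools.lru_cache as the "memoize decorator"; the classic payoff is naive Fibonacci, which goes from exponential to linear time. (Caching saves repeated work, not stack depth, so a very large argument would still hit the recursion limit.)

    from functools import lru_cache

    @lru_cache(maxsize=None)
    def fib(n):
        if n < 2:
            return n
        return fib(n - 1) + fib(n - 2)

    print(fib(300))   # answers instantly; uncached, this would never finish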

-----Original Message----- From: Westley Martínez [mailto:anikom15@gmail.com] Sent: Wednesday, September 11, 2013 9:03 PM To: 'Ram Rachum'; 'python-ideas@googlegroups.com' Cc: 'Ram Rachum' Subject: RE: [Python-ideas] Idea: Compressing the stack on the fly
I think this is an interesting idea. It sounds possible, but the question is whether or not it can be done efficiently with Python. I'd heed Guido's advice in first implementing this. It could probably be done effectively in a compiled language like C, but I'd imagine it'd be too difficult for Python. The other question is usability: what would this actually be used for? I'm not a fan of recursion. I think anything that uses recursion could be restructured into something simpler. A lot of people find recursion to be elegant; for me it just hurts my brain.

Does anyone actually write recursive Python code where the recursion is a significant bottleneck? The only such code I can think of is either for a tree, in which case stack depth is irrelevant, or bad code. Why would anyone care, basically?

This sounds like something that the PyPy team might be interested in.

On 12.09.2013 06:29, Joshua Landau wrote:
Any kind of backtracking algorithm will need recursion or a separate stack data structure to keep track of the various decisions made up to a certain point on the path. The C stack is rather limited in size, so a recursive parser can easily blow up if it uses the C stack alone for managing backtracking.

-- 
Marc-Andre Lemburg
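As a hypothetical illustration of the two approaches contrasted above, here is the same backtracking search - subsets of a list summing to a target - written once recursively and once with an explicit stack of pending decisions; only the latter is bounded by memory rather than by the recursion limit.

    # Backtracking over take/skip decisions, recursively and with an explicit
    # stack. The explicit stack lives on the heap as an ordinary Python list.

    def subset_sums_recursive(nums, target, i=0, chosen=()):
        if target == 0:
            yield chosen
            return
        if i == len(nums) or target < 0:
            return
        # decision 1: take nums[i]; decision 2: skip it
        yield from subset_sums_recursive(nums, target - nums[i], i + 1, chosen + (nums[i],))
        yield from subset_sums_recursive(nums, target, i + 1, chosen)

    def subset_sums_iterative(nums, target):
        stack = [(target, 0, ())]          # explicit stack of pending states
        while stack:
            remaining, i, chosen = stack.pop()
            if remaining == 0:
                yield chosen
                continue
            if i == len(nums) or remaining < 0:
                continue
            stack.append((remaining, i + 1, chosen))                          # skip nums[i]
            stack.append((remaining - nums[i], i + 1, chosen + (nums[i],)))   # take nums[i]

    nums = [3, 34, 4, 12, 5, 2]
    print(sorted(subset_sums_recursive(nums, 9)) == sorted(subset_sums_iterative(nums, 9)))  # True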

On 12.09.2013 09:03, Joshua Landau wrote:
Oh, that's easy. It just depends on the given data set that you're working on and how often you have to branch when working on it. http://en.wikipedia.org/wiki/Backtracking lists a few problems.

Here's a regular expression example that would blow the stack, if the re module were still using it (it was fixed in 2003 to no longer do so):

    re.match('(.*a|.*b|x)+', 'x' * 100000)

The expression still uses exponential time, though. With Python 2.3, you see the stack limit error:

    Python 2.3.5 (#1, Aug 24 2011, 15:52:42)
    [GCC 4.5.0 20100604 [gcc-4_5-branch revision 160292]] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
-- 
Marc-Andre Lemburg

On 12 September 2013 05:29, Joshua Landau <joshua@landau.ws> wrote:
I think you're asking this question the wrong way. Recursion isn't a bottleneck that slows down your program. When you hit the recursion limit your program just blows up. Since Python doesn't have the optimisations that make any particular kind of recursion scale well, people do not generally use it unless they know that the depth is small enough. Currently code that is susceptible to hitting the recursion limit is "bad code" because it depends on optimisations that don't exist. However, if the optimisations did exist then people could choose to take advantage of them.

As an example, I once implemented Tarjan's algorithm in Python using the recursive form shown here:
http://en.wikipedia.org/wiki/Tarjan's_strongly_connected_components_algorithm#The_algorithm_in_pseudocode

After implementing it and confirming that it worked I immediately found that it hit the recursion limit in my real problem. So I reimplemented it without the recursion. Had there been optimisations that would have made the reimplementation unnecessary, I would have happily stuck with the first form, since it was easier to understand than the explicit stack of iterators version that I ended up with.

For the same reasons you won't see much code out there where recursion is a bottleneck unless, as you say, it is "bad code".

Oscar
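Purely as an illustration of the "explicit stack of iterators" shape Oscar mentions (not his Tarjan code), here is a depth-first traversal written both ways; the second version replaces recursion with a list of iterators and so survives arbitrarily deep graphs.

    def dfs_recursive(graph, node, seen=None):
        seen = set() if seen is None else seen
        seen.add(node)
        yield node
        for child in graph[node]:
            if child not in seen:
                yield from dfs_recursive(graph, child, seen)

    def dfs_iterative(graph, start):
        seen = {start}
        yield start
        stack = [iter(graph[start])]          # one iterator per "virtual" frame
        while stack:
            for child in stack[-1]:
                if child not in seen:
                    seen.add(child)
                    yield child
                    stack.append(iter(graph[child]))   # "recurse"
                    break
            else:
                stack.pop()                            # "return"

    # A path graph deep enough to break the recursive version, not the iterative one.
    deep = {i: [i + 1] for i in range(10000)}
    deep[10000] = []
    print(len(list(dfs_iterative(deep, 0))))           # 10001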

participants (13)
- Andrew Barnert
- Chris Angelico
- Clay Sweetser
- Joao S. O. Bueno
- Joshua Landau
- M.-A. Lemburg
- Ned Batchelder
- Neil Girdhar
- Oscar Benjamin
- Ram Rachum
- Serhiy Storchaka
- Terry Jan Reedy
- Westley Martínez