Cross-post from python-ideas: Compressing the stack on the fly

Hi Ram, That is a daring idea, really. But since you asked it, I will tell you what I think. I think it is a bad idea. First of all, the complexity of implementation. I see two 'obvious' implementations of your idea, which is a): an 'on-line' stack compressor, which will slow down functions calls farther than they are already (in CPython, anyway), or b): a 'just-in-time' stack compressor that is initiated when the 1000th stack frame is reached. I can imagine this happening in-place, but it won't be efficient. Now consider what happens when an exception is raised from the bottom to the top.Or worse, from the bottom to somewhere-in-the-middle. The second point concerns the possible gains. Suppose your recursive factorial stack is compressed. At the very least, any compression algorithm must store the decreasing integer that is multiplied. (You might get away with detecting the pattern, but don't count on it). At best, you might run your recursive algorithm a bit longer, but it will overflow eventually. In other words, on a machine with a finite stack size, the algorithm is *wrong*. The correct recursive implementation looks like this: def factorial(x): def recursive(x, c): if x <= 1: return c return recursive(x-1, x * c) return recursive(x, 1) Which doesn't need to store the decreasing x on any stack, and is thus a prime candidate for TCO. The third point concerns the reason why python does not have TCO in the first place. I've read Guido's blog as well, and in my opinion, he's wrong to make such a distinction between what are essentially nearly identical processes: jumping to new code locations. As it is, however, he's dictator. In python, you're simply not /supposed/ to use deep recursive algorithms. It is considered un-pythonic. Nevertheless, I like the style of your idea :-). Kind regards, Bart Wiegmans 2013/5/28 <pypy-dev-request@python.org>:
Send pypy-dev mailing list submissions to pypy-dev@python.org
To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/pypy-dev or, via email, send a message with subject or body 'help' to pypy-dev-request@python.org
You can reach the person managing the list at pypy-dev-owner@python.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of pypy-dev digest..."
Today's Topics:
1. Cross-post from python-ideas: Compressing the stack on the fly (Ram Rachum) 2. pypy (Samir Tigrine)
----------------------------------------------------------------------
Message: 1 Date: Mon, 27 May 2013 13:39:01 +0300 From: Ram Rachum <ram@rachum.com> To: pypy-dev@python.org Subject: [pypy-dev] Cross-post from python-ideas: Compressing the stack on the fly Message-ID: <CANXboVaE_HsQt0CQhJwBwNOL8F+qPzCLbZZsdoLLQZvY4nSx=A@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1"
Hi guys,
I made a post on the Python-ideas mailing list that I was told might be relevant to Pypy. I've reproduced the original email below. Here is the thread on Python-ideas with all the discussion.<https://groups.google.com/forum/?fromgroups#!topic/python-ideas/hteGSNTyC_4>
--------------
Hi everybody,
Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to how programming languages are implemented, so this idea will most likely be either (a) completely impossible or (b) trivial knowledge.
I was thinking about the implementation of the factorial in Python. I was comparing in my mind 2 different solutions: The recursive one, and the one that uses a loop. Here are example implementations for them:
def factorial_recursive(n): if n == 1: return 1 return n * factorial_recursive(n - 1)
def factorial_loop(n): result = 1 for i in range(1, n + 1): result *= i return result
I know that the recursive one is problematic, because it's putting a lot of items on the stack. In fact it's using the stack as if it was a loop variable. The stack wasn't meant to be used like that.
Then the question came to me, why? Maybe the stack could be built to handle this kind of (ab)use?
I read about tail-call optimization on Wikipedia. If I understand correctly, the gist of it is that the interpreter tries to recognize, on a frame-by-frame basis, which frames could be completely eliminated, and then it eliminates those. Then I read Guido's blog post explaining why he doesn't want it in Python. In that post he outlined 4 different reasons why TCO shouldn't be implemented in Python.
But then I thought, maybe you could do something smarter than eliminating individual stack frames. Maybe we could create something that is to the current implementation of the stack what `xrange` is to the old-style `range`. A smart object that allows access to any of a long list of items in it, without actually having to store those items. This would solve the first argument that Guido raises in his post, which I found to be the most substantial one.
What I'm saying is: Imagine the stack of the interpreter when it runs the factorial example above for n=1000. It has around 1000 items in it and it's just about to explode. But then, if you'd look at the contents of that stack, you'd see it's embarrassingly regular, a compression algorithm's wet dream. It's just the same code location over and over again, with a different value for `n`.
So what I'm suggesting is an algorithm to compress that stack on the fly. An algorithm that would detect regularities in the stack and instead of saving each individual frame, save just the pattern. Then, there wouldn't be any problem with showing informative stack trace: Despite not storing every individual frame, each individual frame could still be accessed, similarly to how `xrange` allow access to each individual member without having to store each of them.
Then, the stack could store a lot more items, and tasks that currently require recursion (like pickling using the standard library) will be able to handle much deeper recursions.
What do you think?
Ram.

Thanks for your critique, Bart. One counter-point I'd like to make to your third argument, which I had to make over at python-ideas as well: I am *not* advocating recursive programming. There is no need to convince me that the loop version of the algorithm is superior to the recursive version. The reason I care about optimizing recursive algorithms is because sometimes these algorithms are * forced* on you, and when they are, you want them to be as efficient as possible. There's the example of the `pickle` module. In Python, pickling is recursive. I had a case where a program of mine failed to run because it involved pickling an object that referenced a large number of small objects that referenced each other. *I had no choice except using recursion, because Python's `pickle` uses recursion.* (Except of course writing my own pickle module...) So I want recursive algorithm to be faster not because I'd like to use them, but because I want them to be faster when I'm *forced *to use them. On Tue, May 28, 2013 at 3:12 PM, Bart Wiegmans <bartwiegmans@gmail.com>wrote:
Hi Ram,
That is a daring idea, really. But since you asked it, I will tell you what I think.
I think it is a bad idea.
First of all, the complexity of implementation. I see two 'obvious' implementations of your idea, which is a): an 'on-line' stack compressor, which will slow down functions calls farther than they are already (in CPython, anyway), or b): a 'just-in-time' stack compressor that is initiated when the 1000th stack frame is reached. I can imagine this happening in-place, but it won't be efficient. Now consider what happens when an exception is raised from the bottom to the top.Or worse, from the bottom to somewhere-in-the-middle.
The second point concerns the possible gains. Suppose your recursive factorial stack is compressed. At the very least, any compression algorithm must store the decreasing integer that is multiplied. (You might get away with detecting the pattern, but don't count on it). At best, you might run your recursive algorithm a bit longer, but it will overflow eventually. In other words, on a machine with a finite stack size, the algorithm is *wrong*. The correct recursive implementation looks like this:
def factorial(x): def recursive(x, c): if x <= 1: return c return recursive(x-1, x * c) return recursive(x, 1)
Which doesn't need to store the decreasing x on any stack, and is thus a prime candidate for TCO.
The third point concerns the reason why python does not have TCO in the first place. I've read Guido's blog as well, and in my opinion, he's wrong to make such a distinction between what are essentially nearly identical processes: jumping to new code locations. As it is, however, he's dictator. In python, you're simply not /supposed/ to use deep recursive algorithms. It is considered un-pythonic.
Nevertheless, I like the style of your idea :-).
Kind regards, Bart Wiegmans
Send pypy-dev mailing list submissions to pypy-dev@python.org
To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/pypy-dev or, via email, send a message with subject or body 'help' to pypy-dev-request@python.org
You can reach the person managing the list at pypy-dev-owner@python.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of pypy-dev digest..."
Today's Topics:
1. Cross-post from python-ideas: Compressing the stack on the fly (Ram Rachum) 2. pypy (Samir Tigrine)
----------------------------------------------------------------------
Message: 1 Date: Mon, 27 May 2013 13:39:01 +0300 From: Ram Rachum <ram@rachum.com> To: pypy-dev@python.org Subject: [pypy-dev] Cross-post from python-ideas: Compressing the stack on the fly Message-ID: <CANXboVaE_HsQt0CQhJwBwNOL8F+qPzCLbZZsdoLLQZvY4nSx= A@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1"
Hi guys,
I made a post on the Python-ideas mailing list that I was told might be relevant to Pypy. I've reproduced the original email below. Here is the thread on Python-ideas with all the discussion.< https://groups.google.com/forum/?fromgroups#!topic/python-ideas/hteGSNTyC_4
--------------
Hi everybody,
Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to how programming languages are implemented, so this idea will most likely be either (a) completely impossible or (b) trivial knowledge.
I was thinking about the implementation of the factorial in Python. I was comparing in my mind 2 different solutions: The recursive one, and the one that uses a loop. Here are example implementations for them:
def factorial_recursive(n): if n == 1: return 1 return n * factorial_recursive(n - 1)
def factorial_loop(n): result = 1 for i in range(1, n + 1): result *= i return result
I know that the recursive one is problematic, because it's putting a lot of items on the stack. In fact it's using the stack as if it was a loop variable. The stack wasn't meant to be used like that.
Then the question came to me, why? Maybe the stack could be built to handle this kind of (ab)use?
I read about tail-call optimization on Wikipedia. If I understand correctly, the gist of it is that the interpreter tries to recognize, on a frame-by-frame basis, which frames could be completely eliminated, and
2013/5/28 <pypy-dev-request@python.org>: then
it eliminates those. Then I read Guido's blog post explaining why he doesn't want it in Python. In that post he outlined 4 different reasons why TCO shouldn't be implemented in Python.
But then I thought, maybe you could do something smarter than eliminating individual stack frames. Maybe we could create something that is to the current implementation of the stack what `xrange` is to the old-style `range`. A smart object that allows access to any of a long list of items in it, without actually having to store those items. This would solve the first argument that Guido raises in his post, which I found to be the most substantial one.
What I'm saying is: Imagine the stack of the interpreter when it runs the factorial example above for n=1000. It has around 1000 items in it and it's just about to explode. But then, if you'd look at the contents of that stack, you'd see it's embarrassingly regular, a compression algorithm's wet dream. It's just the same code location over and over again, with a different value for `n`.
So what I'm suggesting is an algorithm to compress that stack on the fly. An algorithm that would detect regularities in the stack and instead of saving each individual frame, save just the pattern. Then, there wouldn't be any problem with showing informative stack trace: Despite not storing every individual frame, each individual frame could still be accessed, similarly to how `xrange` allow access to each individual member without having to store each of them.
Then, the stack could store a lot more items, and tasks that currently require recursion (like pickling using the standard library) will be able to handle much deeper recursions.
What do you think?
Ram.

On Tue, May 28, 2013 at 2:24 PM, Ram Rachum <ram@rachum.com> wrote:
Thanks for your critique, Bart.
One counter-point I'd like to make to your third argument, which I had to make over at python-ideas as well: I am not advocating recursive programming. There is no need to convince me that the loop version of the algorithm is superior to the recursive version. The reason I care about optimizing recursive algorithms is because sometimes these algorithms are forced on you, and when they are, you want them to be as efficient as possible.
There's the example of the `pickle` module. In Python, pickling is recursive. I had a case where a program of mine failed to run because it involved pickling an object that referenced a large number of small objects that referenced each other. I had no choice except using recursion, because Python's `pickle` uses recursion. (Except of course writing my own pickle module...)
So I want recursive algorithm to be faster not because I'd like to use them, but because I want them to be faster when I'm forced to use them.
stackless does allow you to have deep recursion by copying the stack away FYI.
On Tue, May 28, 2013 at 3:12 PM, Bart Wiegmans <bartwiegmans@gmail.com> wrote:
Hi Ram,
That is a daring idea, really. But since you asked it, I will tell you what I think.
I think it is a bad idea.
First of all, the complexity of implementation. I see two 'obvious' implementations of your idea, which is a): an 'on-line' stack compressor, which will slow down functions calls farther than they are already (in CPython, anyway), or b): a 'just-in-time' stack compressor that is initiated when the 1000th stack frame is reached. I can imagine this happening in-place, but it won't be efficient. Now consider what happens when an exception is raised from the bottom to the top.Or worse, from the bottom to somewhere-in-the-middle.
The second point concerns the possible gains. Suppose your recursive factorial stack is compressed. At the very least, any compression algorithm must store the decreasing integer that is multiplied. (You might get away with detecting the pattern, but don't count on it). At best, you might run your recursive algorithm a bit longer, but it will overflow eventually. In other words, on a machine with a finite stack size, the algorithm is *wrong*. The correct recursive implementation looks like this:
def factorial(x): def recursive(x, c): if x <= 1: return c return recursive(x-1, x * c) return recursive(x, 1)
Which doesn't need to store the decreasing x on any stack, and is thus a prime candidate for TCO.
The third point concerns the reason why python does not have TCO in the first place. I've read Guido's blog as well, and in my opinion, he's wrong to make such a distinction between what are essentially nearly identical processes: jumping to new code locations. As it is, however, he's dictator. In python, you're simply not /supposed/ to use deep recursive algorithms. It is considered un-pythonic.
Nevertheless, I like the style of your idea :-).
Kind regards, Bart Wiegmans
2013/5/28 <pypy-dev-request@python.org>:
Send pypy-dev mailing list submissions to pypy-dev@python.org
To subscribe or unsubscribe via the World Wide Web, visit http://mail.python.org/mailman/listinfo/pypy-dev or, via email, send a message with subject or body 'help' to pypy-dev-request@python.org
You can reach the person managing the list at pypy-dev-owner@python.org
When replying, please edit your Subject line so it is more specific than "Re: Contents of pypy-dev digest..."
Today's Topics:
1. Cross-post from python-ideas: Compressing the stack on the fly (Ram Rachum) 2. pypy (Samir Tigrine)
----------------------------------------------------------------------
Message: 1 Date: Mon, 27 May 2013 13:39:01 +0300 From: Ram Rachum <ram@rachum.com> To: pypy-dev@python.org Subject: [pypy-dev] Cross-post from python-ideas: Compressing the stack on the fly Message-ID:
<CANXboVaE_HsQt0CQhJwBwNOL8F+qPzCLbZZsdoLLQZvY4nSx=A@mail.gmail.com> Content-Type: text/plain; charset="iso-8859-1"
Hi guys,
I made a post on the Python-ideas mailing list that I was told might be relevant to Pypy. I've reproduced the original email below. Here is the thread on Python-ideas with all the
discussion.<https://groups.google.com/forum/?fromgroups#!topic/python-ideas/hteGSNTyC_4>
--------------
Hi everybody,
Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to how programming languages are implemented, so this idea will most likely be either (a) completely impossible or (b) trivial knowledge.
I was thinking about the implementation of the factorial in Python. I was comparing in my mind 2 different solutions: The recursive one, and the one that uses a loop. Here are example implementations for them:
def factorial_recursive(n): if n == 1: return 1 return n * factorial_recursive(n - 1)
def factorial_loop(n): result = 1 for i in range(1, n + 1): result *= i return result
I know that the recursive one is problematic, because it's putting a lot of items on the stack. In fact it's using the stack as if it was a loop variable. The stack wasn't meant to be used like that.
Then the question came to me, why? Maybe the stack could be built to handle this kind of (ab)use?
I read about tail-call optimization on Wikipedia. If I understand correctly, the gist of it is that the interpreter tries to recognize, on a frame-by-frame basis, which frames could be completely eliminated, and then it eliminates those. Then I read Guido's blog post explaining why he doesn't want it in Python. In that post he outlined 4 different reasons why TCO shouldn't be implemented in Python.
But then I thought, maybe you could do something smarter than eliminating individual stack frames. Maybe we could create something that is to the current implementation of the stack what `xrange` is to the old-style `range`. A smart object that allows access to any of a long list of items in it, without actually having to store those items. This would solve the first argument that Guido raises in his post, which I found to be the most substantial one.
What I'm saying is: Imagine the stack of the interpreter when it runs the factorial example above for n=1000. It has around 1000 items in it and it's just about to explode. But then, if you'd look at the contents of that stack, you'd see it's embarrassingly regular, a compression algorithm's wet dream. It's just the same code location over and over again, with a different value for `n`.
So what I'm suggesting is an algorithm to compress that stack on the fly. An algorithm that would detect regularities in the stack and instead of saving each individual frame, save just the pattern. Then, there wouldn't be any problem with showing informative stack trace: Despite not storing every individual frame, each individual frame could still be accessed, similarly to how `xrange` allow access to each individual member without having to store each of them.
Then, the stack could store a lot more items, and tasks that currently require recursion (like pickling using the standard library) will be able to handle much deeper recursions.
What do you think?
Ram.
participants (3)
-
Bart Wiegmans
-
Maciej Fijalkowski
-
Ram Rachum