[pypy-dev] Cross-post from python-ideas: Compressing the stack on the fly

Mon May 27 12:39:01 CEST 2013

Hi guys,

I made a post on the Python-ideas mailing list that I was told might be
relevant to Pypy. I've reproduced the original email below. Here is the
thread on Python-ideas with all the
discussion.<https://groups.google.com/forum/?fromgroups#!topic/python-ideas/hteGSNTyC_4>

--------------

Hi everybody,

Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to
how programming languages are implemented, so this idea will most likely be
either (a) completely impossible or (b) trivial knowledge.

I was thinking about the implementation of the factorial in Python. I was
comparing in my mind 2 different solutions: The recursive one, and the one
that uses a loop. Here are example implementations for them:

def factorial_recursive(n):
    if n == 1:
        return 1
    return n * factorial_recursive(n - 1)

def factorial_loop(n):
    result = 1
    for i in range(1, n + 1):
        result *= i
    return result

I know that the recursive one is problematic, because it's putting a lot of
items on the stack. In fact it's using the stack as if it was a loop
variable. The stack wasn't meant to be used like that.

Then the question came to me, why? Maybe the stack could be built to handle
this kind of (ab)use?

I read about tail-call optimization on Wikipedia. If I understand
correctly, the gist of it is that the interpreter tries to recognize, on a
frame-by-frame basis, which frames could be completely eliminated, and then
it eliminates those. Then I read Guido's blog post explaining why he
doesn't want it in Python. In that post he outlined 4 different reasons why
TCO shouldn't be implemented in Python.

But then I thought, maybe you could do something smarter than eliminating
individual stack frames. Maybe we could create something that is to the
current implementation of the stack what `xrange` is to the old-style
`range`. A smart object that allows access to any of a long list of items
in it, without actually having to store those items. This would solve the
first argument that Guido raises in his post, which I found to be the most
substantial one.

What I'm saying is: Imagine the stack of the interpreter when it runs the
factorial example above for n=1000. It has around 1000 items in it and it's
just about to explode. But then, if you'd look at the contents of that
stack, you'd see it's embarrassingly regular, a compression algorithm's wet
dream. It's just the same code location over and over again, with a
different value for `n`.

So what I'm suggesting is an algorithm to compress that stack on the fly.
An algorithm that would detect regularities in the stack and instead of
saving each individual frame, save just the pattern. Then, there wouldn't
be any problem with showing informative stack trace: Despite not storing
every individual frame, each individual frame could still be accessed,
similarly to how `xrange` allow access to each individual member without
having to store each of them.

Then, the stack could store a lot more items, and tasks that currently
require recursion (like pickling using the standard library) will be able
to handle much deeper recursions.

What do you think?

Ram.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/pypy-dev/attachments/20130527/5723bc1f/attachment.html>