[pypy-dev] Cross-post from python-ideas: Compressing the stack on the fly

Dima Tisnek dimaqq at gmail.com
Mon Jul 28 17:57:46 CEST 2014


A slightly detached view (please correct me I'm wrong):

Most stack frames are populated with very few items compared to
allocated block (validation needed)
That's awesome for compression.

However, most items on the stack are references, i.e. pointers to
memory locations. Some of these repeat, e.g. `self`, though I would
suspect for most interesting tasks (e.g. traversing a graph), most
references are disparate, if not unique.
Granted, unique pointers still typically have compressible prefix.

IMO CPython has a potential to be a cache trasher -- smallest frame I
could find is 448 bytes, the LCM to 1K is 7K, in other words 16
recursions and values are aligned in the same slots relative to 1024K,
and your 1-way assoc. cache is done; twice that and 2-way assoc. cache
is done. May be completely masked by other memory accesses these frame
do though, so don't quote me on the performance hit here.

I suggest you capture a few deep stacks in problems from different
domains and present your findings.

(*) I've no idea how pypy implments stack

On 27 May 2013 12:39, Ram Rachum <ram at rachum.com> wrote:
> Hi guys,
>
> I made a post on the Python-ideas mailing list that I was told might be
> relevant to Pypy. I've reproduced the original email below. Here is the
> thread on Python-ideas with all the discussion.
>
> --------------
>
> Hi everybody,
>
> Here's an idea I had a while ago. Now, I'm an ignoramus when it comes to how
> programming languages are implemented, so this idea will most likely be
> either (a) completely impossible or (b) trivial knowledge.
>
> I was thinking about the implementation of the factorial in Python. I was
> comparing in my mind 2 different solutions: The recursive one, and the one
> that uses a loop. Here are example implementations for them:
>
> def factorial_recursive(n):
>     if n == 1:
>         return 1
>     return n * factorial_recursive(n - 1)
>
> def factorial_loop(n):
>     result = 1
>     for i in range(1, n + 1):
>         result *= i
>     return result
>
> I know that the recursive one is problematic, because it's putting a lot of
> items on the stack. In fact it's using the stack as if it was a loop
> variable. The stack wasn't meant to be used like that.
>
> Then the question came to me, why? Maybe the stack could be built to handle
> this kind of (ab)use?
>
> I read about tail-call optimization on Wikipedia. If I understand correctly,
> the gist of it is that the interpreter tries to recognize, on a
> frame-by-frame basis, which frames could be completely eliminated, and then
> it eliminates those. Then I read Guido's blog post explaining why he doesn't
> want it in Python. In that post he outlined 4 different reasons why TCO
> shouldn't be implemented in Python.
>
> But then I thought, maybe you could do something smarter than eliminating
> individual stack frames. Maybe we could create something that is to the
> current implementation of the stack what `xrange` is to the old-style
> `range`. A smart object that allows access to any of a long list of items in
> it, without actually having to store those items. This would solve the first
> argument that Guido raises in his post, which I found to be the most
> substantial one.
>
> What I'm saying is: Imagine the stack of the interpreter when it runs the
> factorial example above for n=1000. It has around 1000 items in it and it's
> just about to explode. But then, if you'd look at the contents of that
> stack, you'd see it's embarrassingly regular, a compression algorithm's wet
> dream. It's just the same code location over and over again, with a
> different value for `n`.
>
> So what I'm suggesting is an algorithm to compress that stack on the fly. An
> algorithm that would detect regularities in the stack and instead of saving
> each individual frame, save just the pattern. Then, there wouldn't be any
> problem with showing informative stack trace: Despite not storing every
> individual frame, each individual frame could still be accessed, similarly
> to how `xrange` allow access to each individual member without having to
> store each of them.
>
> Then, the stack could store a lot more items, and tasks that currently
> require recursion (like pickling using the standard library) will be able to
> handle much deeper recursions.
>
> What do you think?
>
>
> Ram.
>
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev
>


More information about the pypy-dev mailing list