[Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)

Phillip J. Eby pje at telecommunity.com
Thu Oct 6 06:47:40 CEST 2005

At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
>(anyone still thinking about removing the block stack?).

I'm not any more.  My thought was that it would be good for performance, by 
reducing the memory allocation overhead for frames enough to allow pymalloc 
to be used instead of the platform malloc.  After more investigation, 
however, I realized that was a dumb idea, because for a typical application 
the amortized allocation cost of frames approaches zero as the program runs 
and allocates as many frames as it will ever use, as large as it will ever 
use them, and just recycles them on the free list.  And all of the ways I 
came up with for removing the block stack were a lot more complex than 
leaving it as-is.
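To make the amortization argument concrete, here is a toy Python model of a frame free list. It is only a sketch under stated assumptions. `Frame`, `FrameAllocator`, and `simulate` are hypothetical names, and a frame is reduced to a plain list of slots. Recycled frames are grown in place when a call needs more slots, so each one ends up as large as it will ever be used. This is not CPython's actual frame code.

```python
class Frame:
    """Hypothetical stand-in for a frame object: just an array of slots."""

    def __init__(self, size):
        self.slots = [None] * size


class FrameAllocator:
    """Toy free-list allocator (hypothetical): freed frames are kept for
    reuse instead of being handed back to the underlying allocator."""

    def __init__(self):
        self.free_list = []
        self.fresh_allocations = 0  # brand-new Frame objects created
        self.resizes = 0            # recycled frames that had to grow

    def alloc(self, size):
        if self.free_list:
            frame = self.free_list.pop()
            if len(frame.slots) < size:
                # Grow in place; the frame keeps its larger size afterwards.
                frame.slots.extend([None] * (size - len(frame.slots)))
                self.resizes += 1
        else:
            frame = Frame(size)
            self.fresh_allocations += 1
        return frame

    def free(self, frame):
        # Drop references held by the slots, then keep the frame for reuse.
        frame.slots[:] = [None] * len(frame.slots)
        self.free_list.append(frame)


def simulate(allocator, depth):
    """Nested 'calls' depth levels deep; outer levels need bigger frames."""
    if depth == 0:
        return
    frame = allocator.alloc(4 * depth)
    simulate(allocator, depth - 1)
    allocator.free(frame)


allocator = FrameAllocator()
for i in range(1000):
    simulate(allocator, depth=1 + i % 5)  # call depth cycles through 1..5

print(allocator.fresh_allocations, allocator.resizes)  # 5 10
```

After the deepest call pattern has been seen once, every later call is served from the free list. Fresh allocations and resizes stop growing no matter how many more calls are made, so their per-call cost tends toward zero.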

Clearly, the cost of function calls in Python lies somewhere else, and I'd 
probably look next at parameter tuple allocation, and other frame 
initialization activities.  I seem to recall that Armin Rigo once supplied
a patch that sped up calls at the cost of slowing down recursive or
re-entrant ones, and that it was based on preinitializing frames, not
just preallocating them:


However, the patch was never applied because of its increased memory usage 
as well as the slowdown for recursion.

Every so often, in blue-sky thinking about alternative Python VM designs, I 
think about making frames virtual, in the sense of not even having "real" 
frame objects except for generators, sys._getframe(), and tracebacks.  I 
suspect, however, that doing this in a way that doesn't mess with the 
current C API is non-trivial.  And for many "obvious" ways to simplify the 
various stacks, locals, etc., the downside could be more complexity for 
generators, and probably less speed as well.

For example, we could use a single "stack" arena in the heap for 
parameters, locals, cells, and blocks, rather than doing all the various 
sub-allocations within the frame.  But then creating a frame would involve 
copying data off the top of this pseudo-stack, and doing all the offset 
computations and perhaps some other trickery as well.  And resuming a
generator would have to either copy its data back onto the arena, or have
some sane way to make calls out to a new stack arena when it calls other
functions, which would make those operations slower.
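The single-arena idea, and the generator cost it brings, can be sketched as a toy Python model. This is an illustration under assumptions, not a proposed design. `StackArena` and its methods are hypothetical, a frame is reduced to a (base offset, size) pair into one shared list, and a suspended generator uses the "copy it back" option rather than a separate arena.

```python
class StackArena:
    """Hypothetical single arena holding every active frame's slots
    (parameters, locals, cells, block entries) contiguously."""

    def __init__(self):
        self.cells = []   # the shared pseudo-stack
        self.frames = []  # (base_offset, size) for each active frame

    def push_frame(self, args, nlocals):
        assert nlocals >= len(args)
        base = len(self.cells)
        # Parameters first, then empty locals; a "frame" is just an offset.
        self.cells.extend(args)
        self.cells.extend([None] * (nlocals - len(args)))
        self.frames.append((base, nlocals))
        return base

    def load(self, base, index):
        return self.cells[base + index]  # every access needs offset math

    def store(self, base, index, value):
        self.cells[base + index] = value

    def pop_frame(self):
        base, _size = self.frames.pop()
        del self.cells[base:]

    def suspend_top(self):
        """Generator suspension: the top frame must outlive its stack
        position, so its slots are copied off the arena."""
        base, _size = self.frames.pop()
        snapshot = self.cells[base:]
        del self.cells[base:]
        return snapshot

    def resume(self, snapshot):
        """Generator resumption: copy the saved slots back onto the top."""
        base = len(self.cells)
        self.cells.extend(snapshot)
        self.frames.append((base, len(snapshot)))
        return base


arena = StackArena()
outer = arena.push_frame(args=[1, 2], nlocals=3)
gen = arena.push_frame(args=[10], nlocals=2)
arena.store(gen, 1, "partial state")
saved = arena.suspend_top()                   # generator yields: copy out
inner = arena.push_frame(args=[], nlocals=4)  # other calls reuse the space
arena.pop_frame()
gen = arena.resume(saved)                     # copy back in on resume
print(arena.load(gen, 0), arena.load(gen, 1))  # 10 partial state
```

Every slot access goes through `base + index`, and each suspend/resume pair copies the generator's slots twice, which is the extra work that would fall on generators under this scheme.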

The real problem, of course, with any of these ideas is that we are at best
shaving a few percentage points here, a few points there, so it's,
comparatively speaking, rather expensive to do the experiments to see
whether they help at all.
