[Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)
Phillip J. Eby
pje at telecommunity.com
Thu Oct 6 06:47:40 CEST 2005
At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
>(anyone still thinking about removing the block stack?).
I'm not any more. My thought was that it would be good for performance, by
reducing the memory allocation overhead for frames enough to allow pymalloc
to be used instead of the platform malloc. After more investigation,
however, I realized that was a dumb idea, because for a typical application
the amortized allocation cost of frames approaches zero as the program runs
and allocates as many frames as it will ever use, as large as it will ever
use them, and just recycles them on the free list. And all of the ways I
came up with for removing the block stack were a lot more complex than
leaving it as-is.
Clearly, the cost of function calls in Python lies somewhere else, and I'd
probably look next at parameter tuple allocation, and other frame
initialization activities. I seem to recall that Armin Rigo once supplied
a patch that sped up calls at the cost of slowing down recursive or
re-entrant ones, and I seem to recall that it was based on preinitializing
frames, not just preallocating them:
http://mail.python.org/pipermail/python-dev/2004-March/042871.html
However, the patch was never applied because of its increased memory usage
as well as the slowdown for recursion.
Every so often, in blue-sky thinking about alternative Python VM designs, I
think about making frames virtual, in the sense of not even having "real"
frame objects except for generators, sys._getframe(), and tracebacks. I
suspect, however, that doing this in a way that doesn't mess with the
current C API is non-trivial. And for many "obvious" ways to simplify the
various stacks, locals, etc., the downside could be more complexity for
generators, and probably less speed as well.
For example, we could use a single "stack" arena in the heap for
parameters, locals, cells, and blocks, rather than doing all the various
sub-allocations within the frame. But then creating a frame would involve
copying data off the top of this pseudo-stack, and doing all the offset
computations and perhaps some other trickery as well. And resuming a
generator would have to either copy it back, or have some sane way to make
calls out to a new stack arena when calling other functions - thus making
those operations slower.
The real problem, of course, with any of these ideas is that we are at best
shaving a few percentage points here, a few points there, so it's
comparatively speaking rather expensive to do the experiments to see if
they help anything.
More information about the Python-Dev
mailing list