
At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
> (anyone still thinking about removing the block stack?).
I'm not any more. My thought was that it would be good for performance, by reducing the memory allocation overhead for frames enough to allow pymalloc to be used instead of the platform malloc.

After more investigation, however, I realized that was a dumb idea, because for a typical application the amortized allocation cost of frames approaches zero as the program runs and allocates as many frames as it will ever use, as large as it will ever use them, and just recycles them on the free list. And all of the ways I came up with for removing the block stack were a lot more complex than leaving it as-is.

Clearly, the cost of function calls in Python lies somewhere else, and I'd probably look next at parameter tuple allocation, and other frame initialization activities. I seem to recall that Armin Rigo once supplied a patch that sped up calls at the cost of slowing down recursive or re-entrant ones, and I seem to recall that it was based on preinitializing frames, not just preallocating them:

http://mail.python.org/pipermail/python-dev/2004-March/042871.html

However, the patch was never applied because of its increased memory usage as well as the slowdown for recursion.

Every so often, in blue-sky thinking about alternative Python VM designs, I think about making frames virtual, in the sense of not even having "real" frame objects except for generators, sys._getframe(), and tracebacks. I suspect, however, that doing this in a way that doesn't mess with the current C API is non-trivial. And for many "obvious" ways to simplify the various stacks, locals, etc., the downside could be more complexity for generators, and probably less speed as well.

For example, we could use a single "stack" arena in the heap for parameters, locals, cells, and blocks, rather than doing all the various sub-allocations within the frame. But then creating a frame would involve copying data off the top of this pseudo-stack, and doing all the offset computations and perhaps some other trickery as well. And resuming a generator would have to either copy it back, or have some sane way to make calls out to a new stack arena when calling other functions - thus making those operations slower.

The real problem, of course, with any of these ideas is that we are at best shaving a few percentage points here, a few points there, so it's comparatively speaking rather expensive to do the experiments to see if they help anything.
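
To make the free-list argument above concrete, here is a rough toy model in Python. It is only a sketch and not how CPython actually manages frames: the names FramePool, call, and simulate are made up for illustration, and a "frame" is just a list of slots. It counts how many fresh allocations are needed when the same call pattern repeats.

```python
class FramePool:
    """Toy frame free list (hypothetical; not CPython's real code)."""

    def __init__(self):
        self.free = []               # released frames waiting for reuse
        self.fresh_allocations = 0   # times we had to allocate anew

    def acquire(self, size):
        # Reuse a recycled frame when one is available and big enough.
        if self.free:
            frame = self.free.pop()
            if len(frame) >= size:
                return frame
        # Otherwise pay for a fresh allocation (or a regrow).
        self.fresh_allocations += 1
        return [None] * size

    def release(self, frame):
        self.free.append(frame)


def call(pool, depth, size):
    """Simulate `depth` nested calls, each needing a frame of `size` slots."""
    frame = pool.acquire(size)
    if depth > 1:
        call(pool, depth - 1, size)
    pool.release(frame)


def simulate(rounds, depth, size):
    """Repeat the same call pattern; record cumulative fresh allocations."""
    pool = FramePool()
    history = []
    for _ in range(rounds):
        call(pool, depth, size)
        history.append(pool.fresh_allocations)
    return history


if __name__ == "__main__":
    print(simulate(rounds=5, depth=10, size=8))
```

In this model the first round pays for one allocation per nesting level, and every later round reuses those frames, so the allocation cost per call shrinks toward zero as the number of rounds grows. That is the amortization effect described above, under the stated simplifications.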