[Python-Dev] Eliminating the block stack (was Re: Store x Load
x --> DupStore)
Phillip J. Eby
pje at telecommunity.com
Sun Feb 20 22:34:50 CET 2005
At 03:56 PM 2/20/05 -0500, Phillip J. Eby wrote:
>At 07:00 PM 2/20/05 +0000, Michael Hudson wrote:
>>Eliminating the blockstack would be nice (esp. if it's enough to get
>>frames small enough that they get allocated by PyMalloc) but this
>>seemed to be tricky too (or at least Armin, Samuele and I spent a
>>cuple of hours yakking about it on IRC and didn't come up with a clear
>>approach). Dynamically allocating the blockstack would be simpler,
>>and might acheive a similar win. (This is all from memory, I haven't
>>thought about specifics in a while).
I think I have an idea how to do it in a (relatively) simple fashion; see
if you can find a hole in it:
* Change the PyTryBlock struct to include an additional member, 'int
b_prev', that refers to the previous block in a chain
* Change the compiler's emission of SETUP_* opcodes, so that instead of a
PyTryBlock being added to the blockstack at interpretation time, it's added
to the end of a 'co_blktree' block array at compile time, with its 'b_prev'
pointing to the current "top" of the block stack. Instead of the SETUP_*
argument being the handler offset, have it be the index of the just-added
blocktree entry.
* Replace f_blockstack and f_iblock with 'int f_iblktree', and change
PyFrame_BlockSetup() to set this equal to the SETUP_* argument, and
PyFrame_BlockPop() to use this as an index into the code's co_blktree to
retrieve the needed values. PyFrame_BlockPop() would then set f_iblktree
equal to the "popped" block's 'b_prev' member, thus "popping" the block
from this virtual stack.
(Note, by the way, that the blocktree could actually be created as a
post-processing step of the current compilation process, by a loop that
scans the bytecode and tracks the current stack and blockstack levels, and
then replaces the SETUP_* opcodes' arguments. This might be a simpler
option than trying to change the compiler to do it along the way.)
Can anybody see any flaws in this concept? As far as I can tell it just
generates all possible block stack states at compile time, but doesn't
change block semantics in the least, and it scarcely touches the eval
loop. It seems like it could drop the size of frames enough to let them
use pymalloc instead of the OS malloc, at the cost of a 16 bytes per block
increase in the size of code objects. (And of course the necessary changes
to 'marshal' and 'dis' as well as the compiler and eval loop.)
(More precisely, frames whose f_nlocals + f_stacksize is 40 or less, would
be 256 bytes or less, and therefore pymalloc-able. However, this should
cover all but the most complex functions.)
More information about the Python-Dev
mailing list