[Python-Dev] Store x Load x --> DupStore

Phillip J. Eby pje at telecommunity.com
Sun Feb 20 21:56:26 CET 2005


At 07:00 PM 2/20/05 +0000, Michael Hudson wrote:
>"Phillip J. Eby" <pje at telecommunity.com> writes:
>
> > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote:
> >>Where are the attempts to speed up function/method calls? That's an
> >>area where we could *really* use a breakthrough...
> >
> > Amen!
> >
> > So what happened to Armin's pre-allocated frame patch?  Did that get
> > into 2.4?
>
>No, because it slows down recursive function calls, or functions that
>happen to be called at the same time in different threads.  Fixing
>*that* would require things like code specific frame free-lists and
>that's getting a bit convoluted and might waste quite a lot of memory.

Ah.  I thought it was just going to fall back to the normal case if the 
pre-allocated frame wasn't available (i.e., didn't have a refcount of 1).
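
To make the fallback I had in mind concrete, here is a minimal sketch in
plain Python.  It is a toy model, not the actual patch: the Frame and
CodeWithCachedFrame classes and the in_use flag are hypothetical stand-ins
(the flag plays the role of the "refcount of 1" test).

```python
class Frame:
    """Stand-in for an interpreter frame (hypothetical, not CPython's)."""

    def __init__(self, code_name):
        self.code_name = code_name
        self.in_use = False


class CodeWithCachedFrame:
    """Hypothetical code object that keeps one pre-allocated frame."""

    def __init__(self, name):
        self.name = name
        self.cached = Frame(name)
        self.fresh_allocations = 0

    def acquire_frame(self):
        # Fast path: the cached frame is free, so reuse it.
        if not self.cached.in_use:
            self.cached.in_use = True
            return self.cached
        # Slow path: recursion or another thread already holds the
        # cached frame, so fall back to an ordinary allocation.
        self.fresh_allocations += 1
        frame = Frame(self.name)
        frame.in_use = True
        return frame

    def release_frame(self, frame):
        frame.in_use = False


code = CodeWithCachedFrame("f")
outer = code.acquire_frame()   # gets the cached frame
inner = code.acquire_frame()   # recursive call: falls back to a new frame
```

In this model the recursive call pays only for the extra check before
taking the normal allocation path.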


>Eliminating the blockstack would be nice (esp. if it's enough to get
>frames small enough that they get allocated by PyMalloc) but this
>seemed to be tricky too (or at least Armin, Samuele and I spent a
>couple of hours yakking about it on IRC and didn't come up with a clear
>approach).  Dynamically allocating the blockstack would be simpler,
>and might achieve a similar win.  (This is all from memory, I haven't
>thought about specifics in a while).

I'm not very familiar with the operation of the block stack, but why does 
it need to be a stack?  For exception handling purposes, wouldn't it 
suffice to know the offset of the current handler, and have an opcode to 
set the current handler location?  And for "for" loops, couldn't an 
anonymous local be used to hold the loop iterator instead of using a stack 
variable?

Hm, actually I think I see the answer; in the case of module-level code 
there can be no "anonymous local variables" the way there can in 
functions.  Hmm.  I guess you'd need to also have a "reset stack to level 
X" opcode, then, and both it and the set-handler opcode would have to be 
placed at every destination of a jump that crosses block boundaries.  It's 
not clear how big a win that is, due to the added opcodes even on non-error 
paths.

Hey, wait a minute...  all the block stack data is static, isn't it?  I 
mean, the contents of the block stack at any point in a code string could 
be determined statically, by examination of the bytecode, couldn't it?  If 
that's the case, then perhaps we could design a pre-computed data structure 
similar to co_lnotab that would be used by the evaluator in place of the 
blockstack.
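
A rough sketch of what such a pre-computed table could look like, using a
toy instruction list rather than real bytecode.  The tuple encoding, the
block_table and current_handler helpers, and the assumption that blocks
nest lexically (so one forward pass is enough) are all mine, for
illustration only.

```python
SETUP_OPS = ("SETUP_LOOP", "SETUP_EXCEPT", "SETUP_FINALLY")


def block_table(instructions):
    """Return, for each instruction index, the tuple of enclosing blocks.

    Each block is recorded as (kind, target_index).
    """
    stack = []
    table = []
    for op, arg in instructions:
        # Record the blocks that are active when this instruction starts.
        table.append(tuple(stack))
        if op in SETUP_OPS:
            stack.append((op, arg))
        elif op == "POP_BLOCK":
            stack.pop()
    return table


def current_handler(table, index):
    """Look up the innermost enclosing block for an instruction, if any."""
    blocks = table[index]
    return blocks[-1] if blocks else None


# Toy program: a try/except nested inside a loop.
program = [
    ("SETUP_LOOP", 8),       # 0
    ("SETUP_EXCEPT", 5),     # 1
    ("LOAD_NAME", "x"),      # 2
    ("POP_BLOCK", None),     # 3  ends the try body
    ("JUMP_FORWARD", 6),     # 4
    ("POP_TOP", None),       # 5  except handler
    ("JUMP_ABSOLUTE", 1),    # 6
    ("POP_BLOCK", None),     # 7  ends the loop
    ("RETURN_VALUE", None),  # 8
]

table = block_table(program)
print(current_handler(table, 2))
```

With a table like this, the evaluator would consult it only when an
exception, break, or return actually needs to unwind, instead of pushing
and popping blocks on every pass through the code.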

Of course, I may be talking through my hat here, as I have very little 
experience with how the blockstack works.  However, if this idea makes 
sense, then perhaps it could actually speed up non-error paths as well 
(except perhaps for the 'return' statement), at the cost of a larger code 
structure and compiler complexity.  But, if it also means that frames can 
be allocated faster (e.g. via pymalloc), it might be worth it, just like 
getting rid of SET_LINENO turned out to be a net win.


>All of it, in easy cases.  ISTR that the fast path could be a little
>wider -- it bails when the called function has default arguments, but
>I think this case could be handled easily enough.

When it has *any* default arguments, or only when it doesn't have values to 
supply for them?


>Why are frames so big?

Because there are CO_MAXBLOCKS * 12 bytes in there for the block stack.  If 
there was no need for that, frames could perhaps be allocated via 
pymalloc.  They only have around 100 bytes or so in them, apart from the 
blockstack and locals/value stack.
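
The arithmetic behind that, as a back-of-the-envelope sketch.  The values
CO_MAXBLOCKS = 20 and a 256-byte pymalloc small-object limit are my
assumptions about the current source, and the 100-byte figure is only the
rough estimate given above; locals and the value stack are left out.

```python
CO_MAXBLOCKS = 20         # assumed value of the constant
BYTES_PER_BLOCK = 12      # per-entry size quoted above
FIXED_FIELDS = 100        # "around 100 bytes", a rough estimate
PYMALLOC_LIMIT = 256      # assumed small-object threshold, in bytes

block_stack_bytes = CO_MAXBLOCKS * BYTES_PER_BLOCK
with_block_stack = FIXED_FIELDS + block_stack_bytes

print("block stack:", block_stack_bytes)
print("frame with block stack:", with_block_stack)
print("fits pymalloc with block stack:", with_block_stack <= PYMALLOC_LIMIT)
print("fits pymalloc without it:", FIXED_FIELDS <= PYMALLOC_LIMIT)
```

Under these assumptions the block stack alone nearly fills the pymalloc
limit, while the remaining fixed fields would leave room for a modest
number of locals and stack slots.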


> > Do we need a tp_callmethod that takes an argument array, length, and
> > keywords, so that we can skip instancemethod allocation in the
> > common case of calling a method directly?
>
>Hmm, didn't think of that, and I don't think it's how the CALL_ATTR
>attempt worked.  I presume it would need to take a method name too :)

Er, yeah, I thought that was obvious.  :)
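
For illustration, a Python-level sketch of the idea.  The call_method
helper is hypothetical (no such slot or function exists); it only shows
that a plain function found on the type can be called with the instance
passed directly, without first creating a bound method object.

```python
import types


class Greeter:
    def greet(self, name):
        return "hello " + name


def call_method(obj, name, args=(), kwargs=None):
    """Hypothetical Python-level analogue of a tp_callmethod slot."""
    kwargs = kwargs or {}
    shadowed = name in getattr(obj, "__dict__", {})
    if not shadowed:
        for klass in type(obj).__mro__:
            if name in klass.__dict__:
                func = klass.__dict__[name]
                if isinstance(func, types.FunctionType):
                    # Common case: pass obj as the first argument directly.
                    return func(obj, *args, **kwargs)
                break
    # Anything else (instance attributes, other descriptors): normal path.
    return getattr(obj, name)(*args, **kwargs)


g = Greeter()
first = g.greet    # each attribute access builds a new bound method
second = g.greet
print(first is second)
print(call_method(g, "greet", ("world",)))
```

The two attribute accesses yield distinct (though equal) bound method
objects, which is the allocation a direct call-method path would avoid.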


