[Python-Dev] Store x Load x --> DupStore
Phillip J. Eby
pje at telecommunity.com
Sun Feb 20 21:56:26 CET 2005
At 07:00 PM 2/20/05 +0000, Michael Hudson wrote:
>"Phillip J. Eby" <pje at telecommunity.com> writes:
>
> > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote:
> >>Where are the attempts to speed up function/method calls? That's an
> >>area where we could *really* use a breakthrough...
> >
> > Amen!
> >
> > So what happened to Armin's pre-allocated frame patch? Did that get
> > into 2.4?
>
>No, because it slows down recursive function calls, or functions that
>happen to be called at the same time in different threads. Fixing
>*that* would require things like code-specific frame free-lists and
>that's getting a bit convoluted and might waste quite a lot of memory.
Ah. I thought it was just going to fall back to the normal case if the
pre-allocated frame wasn't available (i.e., didn't have a refcount of 1).
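A minimal sketch of the fallback I had in mind, in Python rather than C. The
names (Code, Frame, acquire_frame, release_frame) are hypothetical, and an
in_use flag stands in for the "refcount of 1" check; this is not Armin's
patch, and it ignores thread safety entirely.

```python
class Frame:
    """Stand-in for a frame object (hypothetical, not the C struct)."""
    def __init__(self, code):
        self.code = code
        self.in_use = False   # stands in for "refcount is greater than 1"


class Code:
    """Stand-in for a code object that keeps one pre-allocated frame."""
    def __init__(self, name):
        self.name = name
        self.spare_frame = Frame(self)


def acquire_frame(code):
    """Reuse the pre-allocated frame if it is free, else allocate normally."""
    spare = code.spare_frame
    if not spare.in_use:
        spare.in_use = True
        return spare
    # Recursive call (or a call from another thread): take the normal path.
    frame = Frame(code)
    frame.in_use = True
    return frame


def release_frame(frame):
    """Mark the frame as free again when the call returns."""
    frame.in_use = False
```

The first call gets the spare frame, a recursive call gets a fresh one, and
the spare becomes reusable once the outer call releases it. Whether the extra
check costs more than it saves is exactly the open question.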
>Eliminating the blockstack would be nice (esp. if it's enough to get
>frames small enough that they get allocated by PyMalloc) but this
>seemed to be tricky too (or at least Armin, Samuele and I spent a
>couple of hours yakking about it on IRC and didn't come up with a clear
>approach). Dynamically allocating the blockstack would be simpler,
>and might achieve a similar win. (This is all from memory, I haven't
>thought about specifics in a while).
I'm not very familiar with the operation of the block stack, but why does
it need to be a stack? For exception handling purposes, wouldn't it
suffice to know the offset of the current handler, and have an opcode to
set the current handler location? And for "for" loops, couldn't an
anonymous local be used to hold the loop iterator instead of using a stack
variable?
Hm, actually I think I see the answer; in the case of module-level code
there can be no "anonymous local variables" the way there can in
functions. Hmm. I guess you'd need to also have a "reset stack to level
X" opcode, then, and both it and the set-handler opcode would have to be
placed at every destination of a jump that crosses block boundaries. It's
not clear how big a win that is, due to the added opcodes even on non-error
paths.
Hey, wait a minute... all the block stack data is static, isn't it? I
mean, the contents of the block stack at any point in a code string could
be determined statically, by examination of the bytecode, couldn't it? If
that's the case, then perhaps we could design a pre-computed data structure
similar to co_lnotab that would be used by the evaluator in place of the
blockstack.
Of course, I may be talking through my hat here, as I have very little
experience with how the blockstack works. However, if this idea makes
sense, then perhaps it could actually speed up non-error paths as well
(except perhaps for the 'return' statement), at the cost of a larger code
structure and compiler complexity. But, if it also means that frames can
be allocated faster (e.g. via pymalloc), it might be worth it, just like
getting rid of SET_LINENO turned out to be a net win.
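To make the pre-computed table idea concrete, here is a rough sketch in
Python over a toy instruction list. The opcode names are borrowed from the
real ones, but the instruction format, build_block_table and blocks_at are
all hypothetical, and it assumes blocks nest lexically so that one linear
pass over the code sees every push and pop in order.

```python
from bisect import bisect_right


def build_block_table(code):
    """Return runs of (start_index, active_blocks), like co_lnotab ranges."""
    table = []
    stack = []   # compile-time simulation of the block stack
    for index, (op, arg) in enumerate(code):
        snapshot = tuple(stack)
        # Start a new run only when the set of active blocks changes.
        if not table or table[-1][1] != snapshot:
            table.append((index, snapshot))
        if op in ("SETUP_LOOP", "SETUP_EXCEPT"):
            stack.append((op, arg))
        elif op == "POP_BLOCK":
            stack.pop()
    return table


def blocks_at(table, index):
    """Look up the blocks active just before the instruction at index."""
    starts = [start for start, _ in table]
    return table[bisect_right(starts, index) - 1][1]


# Toy program: a loop whose body contains a try/except.
code = [
    ("SETUP_LOOP", 9),     # 0
    ("LOAD", "it"),        # 1
    ("SETUP_EXCEPT", 6),   # 2
    ("CALL", "f"),         # 3
    ("POP_BLOCK", None),   # 4
    ("JUMP", 7),           # 5
    ("POP_TOP", None),     # 6  exception handler
    ("JUMP", 1),           # 7
    ("POP_BLOCK", None),   # 8
    ("RETURN", None),      # 9
]
table = build_block_table(code)
```

Here blocks_at(table, 3) gives both the loop and the except block, while
blocks_at(table, 6) gives only the loop, so on an exception the evaluator
could find the handler by lookup instead of maintaining a per-frame stack.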
>All of it, in easy cases. ISTR that the fast path could be a little
>wider -- it bails when the called function has default arguments, but
>I think this case could be handled easily enough.
When it has *any* default arguments, or only when the call doesn't supply
values for them?
>Why are frames so big?
Because there are CO_MAXBLOCKS * 12 bytes in there for the block stack. If
there was no need for that, frames could perhaps be allocated via
pymalloc. They only have around 100 bytes or so in them, apart from the
blockstack and locals/value stack.
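Rough arithmetic for the above, as a Python sketch. The 12 bytes per entry
and the "around 100 bytes" figure are from this message; CO_MAXBLOCKS = 20
and a 256-byte pymalloc small-object limit are my assumptions about the
2.4-era build, and 4-byte slots assume a 32-bit platform.

```python
CO_MAXBLOCKS = 20          # assumed value of the CPython constant
BLOCK_ENTRY_BYTES = 12     # from the message: 12 bytes per block stack entry
OTHER_FIELDS_BYTES = 100   # "around 100 bytes or so" for the rest of the frame
PYMALLOC_LIMIT = 256       # assumed small-object threshold for pymalloc
SLOT_BYTES = 4             # assumed pointer size on a 32-bit build

blockstack_bytes = CO_MAXBLOCKS * BLOCK_ENTRY_BYTES
fixed_with_blockstack = OTHER_FIELDS_BYTES + blockstack_bytes
fixed_without_blockstack = OTHER_FIELDS_BYTES

# Slots left for locals and the value stack before exceeding the limit.
slots_with = max(0, (PYMALLOC_LIMIT - fixed_with_blockstack) // SLOT_BYTES)
slots_without = max(0, (PYMALLOC_LIMIT - fixed_without_blockstack) // SLOT_BYTES)

print(blockstack_bytes, fixed_with_blockstack, slots_with, slots_without)
```

Under those assumptions the block stack alone is 240 bytes, so the fixed part
of a frame is already past the limit before any locals are counted; without
it there would be room for a few dozen locals and stack slots.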
> > Do we need a tp_callmethod that takes an argument array, length, and
> > keywords, so that we can skip instancemethod allocation in the
> > common case of calling a method directly?
>
>Hmm, didn't think of that, and I don't think it's how the CALL_ATTR
>attempt worked. I presume it would need to take a method name too :)
Er, yeah, I thought that was obvious. :)
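For what it's worth, the behaviour I have in mind looks roughly like this in
Python. The call_method helper is hypothetical; a real tp_callmethod would be
a C-level type slot, and this only illustrates skipping the bound-method
object when the attribute is a plain function found on the type.

```python
import types


def call_method(obj, name, args=(), kwargs=None):
    """Call obj.name(*args, **kwargs), avoiding a bound method if possible."""
    kwargs = kwargs or {}
    # Look the name up on the type, walking the MRO, without binding it.
    attr = None
    for klass in type(obj).__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            break
    # An instance attribute shadows a plain function defined on the class.
    shadowed = name in getattr(obj, "__dict__", {})
    if isinstance(attr, types.FunctionType) and not shadowed:
        # Fast path: pass obj as the first argument directly.
        return attr(obj, *args, **kwargs)
    # Anything else (descriptors, builtins, instance attributes): normal path.
    return getattr(obj, name)(*args, **kwargs)


class Greeter:
    def greet(self, who, punct="!"):
        return "hi " + who + punct
```

So call_method(Greeter(), "greet", ("bob",), {"punct": "?"}) takes the fast
path, while something like call_method([3, 1, 2], "index", (1,)) falls back
to ordinary attribute lookup.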