[Python-Dev] Store x Load x --> DupStore

Michael Hudson mwh at python.net
Sun Feb 20 22:54:43 CET 2005


"Phillip J. Eby" <pje at telecommunity.com> writes:

> At 07:00 PM 2/20/05 +0000, Michael Hudson wrote:
>>"Phillip J. Eby" <pje at telecommunity.com> writes:
>>
>> > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote:
>> >>Where are the attempts to speed up function/method calls? That's an
>> >>area where we could *really* use a breakthrough...
>> >
>> > Amen!
>> >
>> > So what happened to Armin's pre-allocated frame patch?  Did that
>> > get into 2.4?
>>
>>No, because it slows down recursive function calls, or functions that
>>happen to be called at the same time in different threads.  Fixing
>>*that* would require things like code specific frame free-lists and
>>that's getting a bit convoluted and might waste quite a lot of memory.
>
> Ah.  I thought it was just going to fall back to the normal case if
> the pre-allocated frame wasn't available (i.e., didn't have a refcount
> of 1).

Well, I don't think that's the test, but that might work.  Someone
should try it :) (I'm trying something else currently).
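As a rough illustration of Phillip's fallback idea, here is a minimal Python model, not CPython code. It assumes a hypothetical per-code-object cache holding one frame, and it uses an `in_use` flag as a stand-in for whatever "is this frame free?" test the C version would need. As Michael notes above, that may not be a refcount check. The names `Code`, `Frame`, `FrameAllocator` and `call` are all hypothetical.

```python
class Code:
    """Hypothetical stand-in for a code object owning one cached frame."""
    def __init__(self, name, nlocals):
        self.name = name
        self.nlocals = nlocals
        self.cached_frame = None


class Frame:
    def __init__(self, code):
        self.code = code
        self.locals = [None] * code.nlocals
        self.in_use = False  # assumption: stands in for the "frame is free" test

    def reset(self):
        # Clear locals in place so reusing the frame allocates nothing new.
        for i in range(len(self.locals)):
            self.locals[i] = None


class FrameAllocator:
    def __init__(self):
        self.fresh_allocations = 0

    def acquire(self, code):
        cached = code.cached_frame
        if cached is not None and not cached.in_use:
            # Fast path: the pre-allocated frame is free, so reuse it.
            cached.reset()
            cached.in_use = True
            return cached
        # Fallback: recursion (or another thread) holds the cached frame,
        # so allocate an ordinary frame as before.
        frame = Frame(code)
        frame.in_use = True
        self.fresh_allocations += 1
        if cached is None:
            code.cached_frame = frame  # the first call populates the cache
        return frame

    def release(self, frame):
        frame.in_use = False


def call(alloc, code, depth):
    """Simulate a (possibly recursive) call of the function owning `code`."""
    frame = alloc.acquire(code)
    try:
        if depth > 0:
            call(alloc, code, depth - 1)
    finally:
        alloc.release(frame)
```

A recursive call to depth 3 needs four live frames, so it allocates four; later non-recursive calls all reuse the cached frame. Recursion therefore costs no more than it does today, which is the point of falling back rather than sharing one frame. Threads are not modelled here.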

>>Eliminating the blockstack would be nice (esp. if it's enough to get
>>frames small enough that they get allocated by PyMalloc) but this
>>seemed to be tricky too (or at least Armin, Samuele and I spent a
>>couple of hours yakking about it on IRC and didn't come up with a clear
>>approach).  Dynamically allocating the blockstack would be simpler,
>>and might achieve a similar win.  (This is all from memory, I haven't
>>thought about specifics in a while).
>
> I'm not very familiar with the operation of the block stack, but why
> does it need to be a stack?  

Finally blocks are the problem, I think.
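A small sketch of why nested `finally` clauses push toward a stack. The function below shows the semantics any replacement has to preserve: a single `return` unwinds two handlers, innermost first. The two toy classes (hypothetical, not CPython's implementation) contrast a stack of handlers with the single "current handler" slot suggested below. The slot loses the outer handler unless the code saves and restores it, and that saving amounts to a stack anyway.

```python
def nested_finally(log):
    # Both finally clauses must run, innermost first, on a single return.
    try:
        try:
            return "result"
        finally:
            log.append("inner finally")
    finally:
        log.append("outer finally")


class HandlerStack:
    """Toy model: each setup pushes a handler; unwinding pops innermost-first."""
    def __init__(self):
        self.blocks = []

    def setup(self, handler):
        self.blocks.append(handler)

    def unwind_all(self):
        ran = []
        while self.blocks:
            ran.append(self.blocks.pop())
        return ran


class SingleHandlerSlot:
    """Toy model: one 'current handler' register with no saved outer handler."""
    def __init__(self):
        self.current = None

    def setup(self, handler):
        self.current = handler  # silently overwrites the outer handler

    def unwind_all(self):
        ran = [] if self.current is None else [self.current]
        self.current = None
        return ran
```

Setting up "outer" then "inner" and unwinding gives `["inner", "outer"]` with the stack but only `["inner"]` with the single slot.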

> For exception handling purposes, wouldn't it suffice to know the
> offset of the current handler, and have an opcode to set the current
> handler location?  And for "for" loops, couldn't an anonymous local
> be used to hold the loop iterator instead of using a stack variable?
> Hm, actually I think I see the answer; in the case of module-level
> code there can be no "anonymous local variables" the way there can in
> functions.  Hmm.

I don't think this is the killer blow.  I can't remember the details
and it's too late to think about them, so I'm going to wait and see if
Samuele replies :)
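To make the "anonymous local" idea concrete, here is a source-level sketch. The second function keeps the loop iterator in an ordinary local, here the hypothetical name `_it`, instead of on the interpreter's stack. This only shows the shape of the transformation; a compiler would use a name user code cannot see. At module level `_it` would become an entry in the module's globals dict rather than a fast local, which is the wrinkle Phillip raises.

```python
def total_for(seq):
    total = 0
    for x in seq:  # the thread notes the iterator is held on the stack here
        total += x
    return total


def total_hidden_local(seq):
    total = 0
    _it = iter(seq)  # hypothetical "anonymous local" holding the iterator
    while True:
        try:
            x = next(_it)
        except StopIteration:
            break
        total += x
    return total
```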

>>All of it, in easy cases.  ISTR that the fast path could be a little
>>wider -- it bails when the called function has default arguments, but
>>I think this case could be handled easily enough.
>
> When it has *any* default arguments, or only when it doesn't have
> values to supply for them?

When it has *any*, I think.  I also think this is easy to change.
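A Python model of the widening discussed above. The C fast path checks more than this (code flags, for instance); the helpers below are hypothetical and look only at defaults, `*args`/`**kwargs`, keyword-only parameters and keyword arguments. `takes_fast_path_today` bails if the function has *any* defaults. `takes_fast_path_widened` also accepts calls where the missing trailing positionals are covered by defaults, and `fill_positional` copies those defaults in.

```python
import inspect


def _simple_signature(code):
    # No *args, no **kwargs, no keyword-only parameters.
    no_star = not (code.co_flags & (inspect.CO_VARARGS | inspect.CO_VARKEYWORDS))
    return no_star and getattr(code, "co_kwonlyargcount", 0) == 0


def takes_fast_path_today(func, nargs, nkwargs):
    """Model of the current rule: bail if the function has any defaults."""
    code = func.__code__
    return (_simple_signature(code) and func.__defaults__ is None
            and nkwargs == 0 and nargs == code.co_argcount)


def takes_fast_path_widened(func, nargs, nkwargs):
    """Also allow calls whose missing trailing arguments all have defaults."""
    code = func.__code__
    ndefaults = len(func.__defaults__ or ())
    return (_simple_signature(code) and nkwargs == 0
            and code.co_argcount - ndefaults <= nargs <= code.co_argcount)


def fill_positional(func, args):
    """Append the defaults for trailing parameters the caller omitted."""
    code = func.__code__
    defaults = func.__defaults__ or ()
    missing = code.co_argcount - len(args)
    if missing == 0:
        return list(args)
    # Defaults line up with the last len(defaults) parameters.
    return list(args) + list(defaults[len(defaults) - missing:])
```

For `def f(a, b=2)`, the current rule rejects every call, even `f(1, 2)`. The widened rule accepts both `f(1, 2)` and `f(1)`, with `fill_positional(f, (1,))` producing `[1, 2]`.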

>>Why are frames so big?
>
> Because there are CO_MAXBLOCKS * 12 bytes in there for the block
> stack.  If there was no need for that, frames could perhaps be
> allocated via pymalloc.  They only have around 100 bytes or so in
> them, apart from the blockstack and locals/value stack.

What I'm trying is allocating the blockstack separately, to see whether
two pymallocs are cheaper than one malloc.
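A back-of-the-envelope check of the sizes above, done with `ctypes`. Three constants are assumptions: `CO_MAXBLOCKS = 20` as in CPython 2.x's `Include/code.h`, a pymalloc small-request cutoff of 256 bytes as in 2.4-era `obmalloc.c`, and the "around 100 bytes" frame header quoted in the thread. `PyTryBlock` mirrors the three `int` fields of a block-stack entry, which gives the 12 bytes Phillip mentions on platforms with a 4-byte `int`. The helper names are hypothetical.

```python
import ctypes

CO_MAXBLOCKS = 20              # assumed: CPython 2.x Include/code.h
SMALL_REQUEST_THRESHOLD = 256  # assumed: pymalloc cutoff in 2.4-era obmalloc.c
FRAME_HEADER_BYTES = 100       # rough "around 100 bytes" figure from the thread


class PyTryBlock(ctypes.Structure):
    # Mirrors the three int fields of one block-stack entry.
    _fields_ = [("b_type", ctypes.c_int),
                ("b_handler", ctypes.c_int),
                ("b_level", ctypes.c_int)]


BLOCKSTACK_BYTES = CO_MAXBLOCKS * ctypes.sizeof(PyTryBlock)
SLOT_BYTES = ctypes.sizeof(ctypes.c_void_p)  # one object pointer per local/stack slot


def allocation_requests(nslots, separate_blockstack):
    """Request sizes for one frame, with the blockstack inline or split off."""
    frame = FRAME_HEADER_BYTES + nslots * SLOT_BYTES
    if separate_blockstack:
        return [frame, BLOCKSTACK_BYTES]
    return [frame + BLOCKSTACK_BYTES]


def served_by_pymalloc(requests):
    return all(size <= SMALL_REQUEST_THRESHOLD for size in requests)
```

With a 4-byte `int` the blockstack is 20 × 12 = 240 bytes. Inline, even a frame with no slots needs 100 + 240 = 340 bytes, which goes past the cutoff to the system malloc. Split off, both requests can fit under it. Whether two small allocations actually beat one large one is the question being measured; the arithmetic only shows that the split makes pymalloc eligible.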

>> > Do we need a tp_callmethod that takes an argument array, length, and
>> > keywords, so that we can skip instancemethod allocation in the
>> > common case of calling a method directly?
>>
>>Hmm, didn't think of that, and I don't think it's how the CALL_ATTR
>>attempt worked.  I presume it would need to take a method name too :)
>
> Er, yeah, I thought that was obvious.  :)

Someone should try this too :)
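The shape of a `tp_callmethod`-style call, modelled in Python. `call_method` is a hypothetical helper, not an existing API. It takes the method name plus arguments and, when the class attribute is a plain function that is neither shadowed by the instance nor hidden behind a custom `__getattribute__`, calls it directly with the object prepended, so no bound method is built. Everything else falls back to ordinary lookup. In pure Python the `*args` call still builds a tuple, so this sketch models the control flow, not the saving.

```python
import types


def call_method(obj, name, args=(), kwargs=None):
    """Hypothetical model of a tp_callmethod-style slot."""
    if kwargs is None:
        kwargs = {}
    attr = None
    for klass in type(obj).__mro__:
        if name in vars(klass):
            attr = vars(klass)[name]
            break
    plain_lookup = type(obj).__getattribute__ is object.__getattribute__
    instance_dict = getattr(obj, "__dict__", {})
    if (plain_lookup and isinstance(attr, types.FunctionType)
            and name not in instance_dict):
        # Fast path: call the function with obj as its first argument;
        # no bound-method object is created.
        return attr(obj, *args, **kwargs)
    # Slow path: normal attribute lookup, which may create a bound method.
    return getattr(obj, name)(*args, **kwargs)
```

Built-in methods such as `list.count` are not Python functions, so they take the slow path here. A C version could special-case them as well.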

Cheers,
mwh

-- 
  It is never worth a first class man's time to express a majority
  opinion.  By definition, there are plenty of others to do that.
                                                        -- G. H. Hardy

