[Python-Dev] Store x Load x --> DupStore
Michael Hudson
mwh at python.net
Sun Feb 20 22:54:43 CET 2005
"Phillip J. Eby" <pje at telecommunity.com> writes:
> At 07:00 PM 2/20/05 +0000, Michael Hudson wrote:
>>"Phillip J. Eby" <pje at telecommunity.com> writes:
>>
>> > At 08:17 AM 2/20/05 -0800, Guido van Rossum wrote:
>> >>Where are the attempts to speed up function/method calls? That's an
>> >>area where we could *really* use a breakthrough...
>> >
>> > Amen!
>> >
>> > So what happened to Armin's pre-allocated frame patch? Did that
>> > get into 2.4?
>>
>>No, because it slows down recursive function calls, or functions that
>>happen to be called at the same time in different threads. Fixing
>>*that* would require things like code specific frame free-lists and
>>that's getting a bit convoluted and might waste quite a lot of memory.
>
> Ah. I thought it was just going to fall back to the normal case if
> the pre-allocated frame wasn't available (i.e., didn't have a refcount
> of 1).
Well, I don't think that's the test, but that might work. Someone
should try it :) (I'm trying something else currently).
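A minimal sketch of the fallback idea being discussed, in plain Python rather than C. Every name here (Frame, CodeObject, acquire_frame, the in_use flag) is hypothetical, and the "is the frame free?" test is an assumption: the thread itself leaves open what the right test is. It only shows that a single pre-allocated frame per code object covers the flat call, while recursion falls back to fresh allocation.

```python
class Frame:
    """Stand-in for a frame object; it only tracks whether it is executing."""

    def __init__(self, code_name):
        self.code_name = code_name
        self.in_use = False


class CodeObject:
    """Hypothetical code object that keeps one pre-allocated frame."""

    def __init__(self, name):
        self.name = name
        self.spare = Frame(name)      # the pre-allocated frame
        self.fresh_allocations = 0    # how often the fallback path ran

    def acquire_frame(self):
        # Fast path: the spare frame is idle, so reuse it.
        if not self.spare.in_use:
            self.spare.in_use = True
            return self.spare
        # Fallback: the spare is busy (recursion, another thread),
        # so allocate a new frame the normal way.
        self.fresh_allocations += 1
        frame = Frame(self.name)
        frame.in_use = True
        return frame

    def release_frame(self, frame):
        frame.in_use = False


def call(code, depth):
    """Simulate calling `code`, recursing `depth` more times."""
    frame = code.acquire_frame()
    try:
        if depth > 0:
            call(code, depth - 1)
    finally:
        code.release_frame(frame)


code = CodeObject("f")
call(code, 0)
flat_allocations = code.fresh_allocations        # non-recursive call
call(code, 3)
recursive_allocations = code.fresh_allocations   # three nested re-entries
```

In this sketch the flat call never takes the fallback path, while each level of recursion beyond the first does.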
>>Eliminating the blockstack would be nice (esp. if it's enough to get
>>frames small enough that they get allocated by PyMalloc) but this
>>seemed to be tricky too (or at least Armin, Samuele and I spent a
>>couple of hours yakking about it on IRC and didn't come up with a clear
>>approach). Dynamically allocating the blockstack would be simpler,
>>and might achieve a similar win. (This is all from memory, I haven't
>>thought about specifics in a while).
>
> I'm not very familiar with the operation of the block stack, but why
> does it need to be a stack?
Finally blocks are the problem, I think.
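A small illustration of why finally blocks seem to want a stack, assuming nothing beyond ordinary Python semantics (the function name is made up for the example). A `break` inside nested try/finally has to run every enclosing finally clause, innermost first, before leaving the loop, so knowing only one "current handler" offset would not be enough.

```python
def nested_finally_order():
    """Record the order in which blocks run when `break` unwinds two finallys."""
    events = []
    for i in range(3):
        try:
            try:
                if i == 1:
                    break  # must unwind both finally clauses, then the loop
                events.append(("body", i))
            finally:
                events.append(("inner finally", i))
        finally:
            events.append(("outer finally", i))
    return events


events = nested_finally_order()
```

The recorded order shows the inner clause running before the outer one on both the normal pass and the `break` pass, which is last-in, first-out behaviour.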
> For exception handling purposes, wouldn't it suffice to know the
> offset of the current handler, and have an opcode to set the current
> handler location? And for "for" loops, couldn't an anonymous local
> be used to hold the loop iterator instead of using a stack variable?
> Hm, actually I think I see the answer; in the case of module-level
> code there can be no "anonymous local variables" the way there can in
> functions. Hmm.
I don't think this is the killer blow. I can't remember the details
and it's too late to think about them, so I'm going to wait and see if
Samuele replies :)
>>All of it, in easy cases. ISTR that the fast path could be a little
>>wider -- it bails when the called function has default arguments, but
>>I think this case could be handled easily enough.
>
> When it has *any* default arguments, or only when it doesn't have
> values to supply for them?
When it has *any*, I think. I also think this is easy to change.
>>Why are frames so big?
>
> Because there are CO_MAXBLOCKS * 12 bytes in there for the block
> stack. If there was no need for that, frames could perhaps be
> allocated via pymalloc. They only have around 100 bytes or so in
> them, apart from the blockstack and locals/value stack.
What I'm trying is allocating the blockstack separately, to see whether
two pymallocs are cheaper than one malloc.
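A back-of-the-envelope sketch of the sizes involved. The 12 bytes per block and the "around 100 bytes" of fixed fields come from the quoted text; CO_MAXBLOCKS = 20, a 256-byte pymalloc small-request limit, and 4-byte pointers are assumptions about a 32-bit CPython build of this era, not something stated in the thread.

```python
CO_MAXBLOCKS = 20        # assumption: value in the CPython headers of the time
BLOCK_BYTES = 12         # from the message: 12 bytes per block stack entry
FIXED_FRAME_BYTES = 100  # from the message: "around 100 bytes or so"
POINTER_BYTES = 4        # assumption: 32-bit build
PYMALLOC_LIMIT = 256     # assumption: largest request pymalloc serves itself

BLOCKSTACK_BYTES = CO_MAXBLOCKS * BLOCK_BYTES


def frame_request_bytes(n_slots, inline_blockstack):
    """Estimate the allocation size of a frame with n_slots locals/stack slots."""
    size = FIXED_FRAME_BYTES + n_slots * POINTER_BYTES
    if inline_blockstack:
        size += BLOCKSTACK_BYTES
    return size


def fits_pymalloc(size):
    """True if a request of this size would stay within the assumed limit."""
    return size <= PYMALLOC_LIMIT


inline_size = frame_request_bytes(10, inline_blockstack=True)
split_size = frame_request_bytes(10, inline_blockstack=False)
```

Under these assumptions a frame with ten slots is too big for pymalloc with the block stack inline, but both the frame and the separately allocated block stack would fit once they are split.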
>> > Do we need a tp_callmethod that takes an argument array, length, and
>> > keywords, so that we can skip instancemethod allocation in the
>> > common case of calling a method directly?
>>
>>Hmm, didn't think of that, and I don't think it's how the CALL_ATTR
>>attempt worked. I presume it would need to take a method name too :)
>
> Er, yeah, I thought that was obvious. :)
Someone should try this too :)
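A rough Python-level sketch of what such a slot would buy. tp_callmethod is only the hypothetical name proposed above, not an existing slot, and call_method and Greeter are made up for the example. The helper finds a plain function on the type and calls it with the instance as first argument, so no bound method object is built for the call; anything else falls back to the ordinary lookup.

```python
import types


def call_method(obj, name, args=(), kwargs=None):
    """Call obj.name(*args, **kwargs) without creating a bound method when possible."""
    kwargs = kwargs or {}
    # An instance attribute shadows a plain function on the class,
    # so only take the fast path when the instance dict has no such name.
    if name not in getattr(obj, "__dict__", {}):
        for klass in type(obj).__mro__:
            if name in vars(klass):
                attr = vars(klass)[name]
                if isinstance(attr, types.FunctionType):
                    # Fast path: pass the instance explicitly.
                    return attr(obj, *args, **kwargs)
                break  # some other descriptor: use the normal path
    # Slow path: ordinary attribute lookup, which may build a bound method.
    return getattr(obj, name)(*args, **kwargs)


class Greeter:
    def __init__(self, greeting):
        self.greeting = greeting

    def greet(self, who, punctuation="!"):
        return f"{self.greeting}, {who}{punctuation}"


g = Greeter("Hello")
result = call_method(g, "greet", ("world",), {"punctuation": "?"})
# Plain attribute access builds a new bound method object each time.
fresh_each_time = g.greet is not g.greet
```

As noted in the reply above, the method name has to be part of the interface, which is why the helper takes it alongside the argument tuple and keywords.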
Cheers,
mwh
--
It is never worth a first class man's time to express a majority
opinion. By definition, there are plenty of others to do that.
-- G. H. Hardy