[Python-Dev] Removing the block stack (was Re: PEP 343 and __with__)
Neal Norwitz
nnorwitz at gmail.com
Thu Oct 6 07:09:21 CEST 2005
On 10/5/05, Phillip J. Eby <pje at telecommunity.com> wrote:
> At 09:50 AM 10/4/2005 +0100, Michael Hudson wrote:
> >(anyone still thinking about removing the block stack?).
>
> I'm not any more. My thought was that it would be good for performance, by
> reducing the memory allocation overhead for frames enough to allow pymalloc
> to be used instead of the platform malloc.
I did something similar to reduce the frame size to under 256 bytes
(don't recall if I made a patch or not) and it had no overall effect
on perf.
> Clearly, the cost of function calls in Python lies somewhere else, and I'd
> probably look next at parameter tuple allocation, and other frame
> initialization activities.
I think that's a big part of it. This patch shows C calls getting
sped up primarily by avoiding tuple creation:
http://python.org/sf/1107887
I hope to work on that and get it into 2.5.
I've also been thinking about avoiding tuple creation when calling
python functions. The change I have in mind would probably have to
wait until p3k, but could yield some speed ups.
Warning: half baked idea follows.
My thoughts are to dynamically allocate the Python stack memory (e.g.,
void *stack = malloc(128MB)). Each thread would then use its own stack
for all of its calls. Things would be pushed onto the stack as they are
currently, but we wouldn't need to create a tuple to pass arguments to a
method; they could just be used directly from the stack. Basically, this
would more closely simulate the way calls work in hardware.
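To make the idea of using arguments directly from the stack concrete, here is a minimal sketch, not CPython code. It assumes `long` as a stand-in for `PyObject *`, and every name in it (`ValueStack`, `vs_push`, `vs_call`, `vs_pop`, `sum_args`) is hypothetical. The caller pushes arguments, and the callee receives a pointer into the caller's stack plus a count instead of a freshly built tuple.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical illustration only: `long` stands in for PyObject *,
   and none of these names exist in CPython. */
#define VS_CAPACITY 1024

typedef struct {
    long items[VS_CAPACITY];
    size_t top;               /* index of the next free slot */
} ValueStack;

/* A callee gets a pointer into the caller's stack plus a count,
   instead of an argument tuple allocated for this one call. */
typedef long (*StackFunc)(const long *args, size_t nargs);

static void vs_push(ValueStack *vs, long v)
{
    assert(vs->top < VS_CAPACITY);
    vs->items[vs->top++] = v;
}

static long vs_pop(ValueStack *vs)
{
    assert(vs->top > 0);
    return vs->items[--vs->top];
}

/* Call `fn` with the top `nargs` values as its arguments, then pop
   them and push the result, as an interpreter's call opcode might. */
static void vs_call(ValueStack *vs, StackFunc fn, size_t nargs)
{
    assert(nargs <= vs->top);
    const long *args = &vs->items[vs->top - nargs]; /* no copy, no tuple */
    long result = fn(args, nargs);
    vs->top -= nargs;
    vs_push(vs, result);
}

/* Example callee: sums its arguments straight out of the stack. */
static long sum_args(const long *args, size_t nargs)
{
    long total = 0;
    for (size_t i = 0; i < nargs; i++)
        total += args[i];
    return total;
}
```

Pushing 1, 2 and 3 and then calling `vs_call(&vs, sum_args, 3)` leaves a single value, 6, on the stack. The trade-off is the one noted below: every callee must accept `(args, nargs)` rather than a tuple, which is why the `PyArg_ParseTuple()` callers would have to change.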
This would mean all the PyArg_ParseTuple()s would have to change. It
may be possible to fake it out, but I'm not sure it's worth it which
is why it would be easier to do this for p3k.
The general idea is to allocate the stack in one big hunk and just
walk up/down it as functions are called/returned. This only means
incrementing or decrementing pointers. This should allow us to avoid
a bunch of copying and tuple creation/destruction. Frames would
hopefully all be the same size, which would help. Note that even though
there is a free list for frames, there could still be frequent
PyObject_GC_Resize() calls (or unused memory). With my idea,
there would hopefully be better memory locality, which could speed
things up.
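The "one big hunk" allocation described above can be sketched as a bump-pointer stack: a call carves a frame off the top, and a return moves the top back. This is a hypothetical illustration under stated assumptions, not a CPython design. `ThreadStack`, `Frame`, `tstack_init`, `frame_push` and `frame_pop` are invented names, `long` stands in for object slots, and the sketch uses a small caller-chosen size rather than the 128MB mentioned in the post.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch of one contiguous per-thread stack. Frames are
   carved off by bumping `top` and released by moving it back, so a
   call/return costs a pointer adjustment instead of malloc/free. */
#define FRAME_ALIGN _Alignof(max_align_t)

typedef struct Frame {
    struct Frame *back;   /* caller's frame, or NULL */
    size_t nlocals;       /* number of slots in locals[] */
    long locals[];        /* locals + evaluation stack (stand-in type) */
} Frame;

typedef struct {
    char *base;           /* start of the single big allocation */
    char *top;            /* first free byte */
    char *limit;          /* one past the end */
    Frame *current;       /* innermost active frame */
} ThreadStack;

static int tstack_init(ThreadStack *ts, size_t nbytes)
{
    ts->base = malloc(nbytes);
    if (ts->base == NULL)
        return -1;
    ts->top = ts->base;
    ts->limit = ts->base + nbytes;
    ts->current = NULL;
    return 0;
}

static void tstack_free(ThreadStack *ts)
{
    free(ts->base);
    ts->base = ts->top = ts->limit = NULL;
    ts->current = NULL;
}

/* Call: bump the pointer. Returns NULL if the stack is exhausted; a
   real interpreter would raise an error at that point. */
static Frame *frame_push(ThreadStack *ts, size_t nlocals)
{
    size_t size = sizeof(Frame) + nlocals * sizeof(long);
    /* round up so the next frame also starts suitably aligned */
    size = (size + FRAME_ALIGN - 1) & ~(size_t)(FRAME_ALIGN - 1);
    if ((size_t)(ts->limit - ts->top) < size)
        return NULL;
    Frame *f = (Frame *)ts->top;
    ts->top += size;
    f->back = ts->current;
    f->nlocals = nlocals;
    ts->current = f;
    return f;
}

/* Return: frames are released in LIFO order, so popping one just
   moves `top` back to where that frame started. */
static void frame_pop(ThreadStack *ts, Frame *f)
{
    assert(f == ts->current);
    ts->current = f->back;
    ts->top = (char *)f;
}
```

Because frames are always released in last-in, first-out order, consecutive frames sit next to each other in memory, which is where the hoped-for locality would come from. Overflow is detected by a single bounds check against `limit`.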
n