[Python-Dev] Speed up function calls

Neal Norwitz nnorwitz at gmail.com
Wed Jan 26 04:35:42 CET 2005


On Tue, 25 Jan 2005 06:42:57 -0500, Raymond Hettinger
<raymond.hettinger at verizon.net> wrote:
> >
> > I think I tested a method I changed from METH_O to METH_ARGS and could
> > not measure a difference.
> 
> Something is probably wrong with the measurements.  The new call does much more work than METH_O or METH_NOARGS.  Those two common and essential cases cannot be faster and are likely slower on at least some compilers and some machines.  If some timing shows differently, then it is likely a mirage (falling into an unsustainable local minimum).

I had tested with chr(), which Martin pointed out is broken in my patch.
I just retested with len('') and got these results (again on an Opteron):

# without patch
neal at janus clean $ ./python ./Lib/timeit.py -v "len('')"
10 loops -> 8.11e-06 secs
100 loops -> 6.7e-05 secs
1000 loops -> 0.000635 secs
10000 loops -> 0.00733 secs
100000 loops -> 0.0634 secs
1000000 loops -> 0.652 secs
raw times: 0.654 0.652 0.654
1000000 loops, best of 3: 0.652 usec per loop
# with patch
neal at janus src $ ./python ./Lib/timeit.py -v "len('')"
10 loops -> 9.06e-06 secs
100 loops -> 7.01e-05 secs
1000 loops -> 0.000692 secs
10000 loops -> 0.00693 secs
100000 loops -> 0.0708 secs
1000000 loops -> 0.703 secs
raw times: 0.712 0.714 0.713
1000000 loops, best of 3: 0.712 usec per loop

So with the patch, the METH_O call is 0.06 usec slower.

I'd like to discuss this later, after I explain a bit more about the
direction I'm headed in.  I agree that METH_O and METH_NOARGS are near
optimal with respect to performance.  But if we could have one
METH_UNPACKED instead of 3 METH_* variants, I think that would be a win.
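
To make that concrete, a METH_UNPACKED function would receive its
arguments already unpacked, with NULL filled in for any optional
argument the caller omitted, and the method table would carry the
min/max argument counts.  Roughly like this (the names and extra
fields are only for illustration, not necessarily what the patch
ends up with):

    /* Sketch only: a builtin taking one required and one optional
       argument under the proposed METH_UNPACKED convention.  The
       dispatcher would pass NULL for optional arguments that were
       not supplied. */
    #include <Python.h>

    static PyObject *
    builtin_example(PyObject *self, PyObject *obj, PyObject *opt)
    {
        if (opt == NULL) {
            /* optional second argument omitted; fall back to default */
        }
        /* ... do the real work on obj ... */
        Py_RETURN_NONE;
    }

    /* The method table entry would grow min/max argument counts so
       the interpreter can range-check and unpack before the call;
       the layout shown here is hypothetical:

       {"example", (PyCFunction)builtin_example, METH_UNPACKED,
        example_doc, 1, 2}        (min args = 1, max args = 2)
    */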

> > A benefit would be to consolidate METH_O,
> > METH_NOARGS, and METH_VARARGS into a single case.  This should
> > make code simpler all around (IMO).
> 
> Will backwards compatibility allow those cases to be eliminated?  It would be a bummer if most existing extensions could not compile with Py2.5.  Also, METH_VARARGS will likely have to hang around unless a way can be found to handle more than nine arguments.

Sorry, I meant eliminated in 3.0.  METH_O couldn't be eliminated, but
METH_NOARGS actually could, since the min/max arg counts would default
to 0, so #define METH_NOARGS METH_UNPACKED would work.
But I'm not proposing that unless there is consensus that it's OK.

> This patch appears to be taking on a life of its own and is being applied more broadly than is necessary or wise.  The patch is extensive and introduces a new C API that cannot be taken back later, so we ought to be careful with it.

I agree we should be careful, but it's all experimentation right now.
The reason for modifying METH_O and METH_NOARGS is to verify the
direction and measure its effects.  It's not necessarily meant to be
integrated.

> That being said, I really like the concept.  I just worry that many of the stated benefits won't materialize:
> * having to keep the old versions for backwards compatibility,
> * being slower than METH_O and METH_NOARGS,
> * not handling more than nine arguments,

I've found very few functions that take more than 2 arguments.
Should the limit of 9 be lower or higher?  I don't have a good feel
for it.  From what I've seen, 5 may be more reasonable as far as
catching 90% of the cases.

> * separating function signature info from the function itself,

I haven't really seen any discussion on this point.  I think Raymond
pointed out that this isn't much different from today's METH_NOARGS
and METH_KEYWORDS.  The same goes for METH_O, if you consider how the
argument is used even though the signature stays the same.

> * the time to initialize all the argument variables to NULL,

See below for how this could be fixed.

> * somewhat unattractive case stmt code for building the c function call.

This is the Python test coverage:
    http://coverage.livinglogic.de/coverage/web/selectEntry.do?template=2850&entryToSelect=182530

Note that METH_VARARGS is over 3 times as common as METH_O or
METH_NOARGS.  Plus we could get rid of a couple of if statements.

So far it seems there aren't any specific problems with the approach,
only concerns.  I'm not sure it would be best to modify this patch
over many iterations and then make one huge checkin.  I also don't
want to lose the changes or the results.  Perhaps I should make a
branch for this work?  It would be easy to abandon it, or to take only
the pieces we want, if it should ever see the light of day.

----

Here's some thinking out loud.  Raymond mentioned some of the warts of
the current patch: in particular, all nine argument variables are
initialized each time, and there's a switch on the number of
arguments.

Ultimately, I think we can speed things up more by having 9 different
opcodes, i.e., one for each number of arguments: CALL_FUNCTION_0,
CALL_FUNCTION_1, ...
(9 is still arbitrary and subject to change.)

Then we would have N little functions, each taking the exact number of
parameters.  Each would still need a switch to call the C function,
because there may be optional parameters.  Ultimately, it's possible
the code would be small enough to stick directly into the eval_frame
loop.  Each of these steps would need to be tested, but that's a
possible longer-term direction.

There would only be an if to check whether it was a C function or not.
Maybe we could even get rid of that with more fixup at import time.

Neal

