[Python-Dev] Speed up function calls
Neal Norwitz
nnorwitz at gmail.com
Wed Jan 26 04:35:42 CET 2005
On Tue, 25 Jan 2005 06:42:57 -0500, Raymond Hettinger
<raymond.hettinger at verizon.net> wrote:
> >
> > I think I tested a method I changed from METH_O to METH_ARGS and could
> > not measure a difference.
>
> Something is probably wrong with the measurements. The new call does much more work than METH_O or METH_NOARGS. Those two common and essential cases cannot be faster and are likely slower on at least some compilers and some machines. If some timing shows differently, then it is likely a mirage (falling into an unsustainable local minimum).
I tested with chr(), which Martin pointed out is broken in my patch. I
just tested with len('') and got these results (again on an Opteron):
# without patch
neal@janus clean $ ./python ./Lib/timeit.py -v "len('')"
10 loops -> 8.11e-06 secs
100 loops -> 6.7e-05 secs
1000 loops -> 0.000635 secs
10000 loops -> 0.00733 secs
100000 loops -> 0.0634 secs
1000000 loops -> 0.652 secs
raw times: 0.654 0.652 0.654
1000000 loops, best of 3: 0.652 usec per loop
# with patch
neal@janus src $ ./python ./Lib/timeit.py -v "len('')"
10 loops -> 9.06e-06 secs
100 loops -> 7.01e-05 secs
1000 loops -> 0.000692 secs
10000 loops -> 0.00693 secs
100000 loops -> 0.0708 secs
1000000 loops -> 0.703 secs
raw times: 0.712 0.714 0.713
1000000 loops, best of 3: 0.712 usec per loop
So with the patch, METH_O is 0.06 usec per loop slower (0.712 vs. 0.652).
I'd like to discuss this later, after I explain a bit more about the
direction I'm headed. I agree that METH_O and METH_NOARGS are near
optimal with respect to performance. But if we could have one
METH_UNPACKED instead of three METH_* variants, I think that would be
a win.
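For concreteness, here are the three conventions that would collapse
into one. The first three declarations are just today's real C API;
the METH_UNPACKED one is my sketch of where I'm headed, not code from
the patch:

    #include <Python.h>

    /* Today's three calling conventions; the flag in the method table
     * fixes the C signature and the dispatch path: */
    static PyObject *f_noargs(PyObject *self, PyObject *ignored); /* METH_NOARGS */
    static PyObject *f_o(PyObject *self, PyObject *arg);          /* METH_O */
    static PyObject *f_varargs(PyObject *self, PyObject *args);   /* METH_VARARGS */

    /* Sketch: with METH_UNPACKED, a function would declare its actual
     * C parameters and the interpreter would pass the arguments
     * directly -- no argument tuple, no PyArg_ParseTuple: */
    static PyObject *f_unpacked(PyObject *self, PyObject *a, PyObject *b);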
> > A benefit would be to consolidate METH_O,
> > METH_NOARGS, and METH_VARARGS into a single case. This should
> > make code simpler all around (IMO).
>
> Will backwards compatibility allow those cases to be eliminated? It would be a bummer if most existing extensions could not compile with Py2.5. Also, METH_VARARGS will likely have to hang around unless a way can be found to handle more than nine arguments.
Sorry, I meant eliminated in 3.0. METH_O couldn't be eliminated, but
METH_NOARGS actually could, since the min/max argument counts would be
initialized to 0, so #define METH_NOARGS METH_UNPACKED would work.
But I'm not proposing that unless there is consensus that it's ok.
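Just to spell out that equivalence (the arity fields and the flag
value below are hypothetical -- they're not part of today's
PyMethodDef):

    #include <Python.h>

    /* Sketch only: #undef the real flag purely to show the aliasing
     * idea.  With explicit arity bounds that default to 0, a
     * METH_UNPACKED entry declaring no arguments behaves exactly like
     * METH_NOARGS does today. */
    #undef  METH_NOARGS
    #define METH_UNPACKED 0x0080          /* hypothetical flag value */
    #define METH_NOARGS   METH_UNPACKED   /* min/max args default to 0 */

    typedef struct {
        const char  *ml_name;
        PyCFunction  ml_meth;
        int          ml_flags;
        short        ml_min_args;  /* hypothetical: fewest args accepted */
        short        ml_max_args;  /* hypothetical: most args accepted */
        const char  *ml_doc;
    } PyMethodDefUnpacked;         /* hypothetical variant of PyMethodDef */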
> This patch appears to be taking on a life of its own and is being applied more broadly than is necessary or wise. The patch is extensive and introduces a new C API that cannot be taken back later, so we ought to be careful with it.
I agree we should be careful. But it's all experimentation right now.
The reason to modify METH_O and METH_NOARGS is to verify the direction
and measure the various effects. It's not necessarily meant to be
integrated.
> That being said, I really like the concept. I just worry that many of the stated benefits won't materialize:
> * having to keep the old versions for backwards compatibility,
> * being slower than METH_O and METH_NOARGS,
> * not handling more than nine arguments,
There are very few functions I've found that take more than 2
arguments. Should 9 be lower or higher? I don't have a good feel. From
what I've seen, 5 may be more reasonable as far as catching 90% of the
cases goes.
> * separating function signature info from the function itself,
I haven't really seen any discussion on this point. I think Raymond
pointed out that this isn't really much different today with
METH_NOARGS and METH_KEYWORDS. The same goes for METH_O, if you
consider how the argument is used even though the signature stays the
same.
> * the time to initialize all the argument variables to NULL,
See below for how this could be fixed.
> * somewhat unattractive case stmt code for building the c function call.
This is the Python test coverage:
http://coverage.livinglogic.de/coverage/web/selectEntry.do?template=2850&entryToSelect=182530
Note that METH_VARARGS is over 3 times as likely as METH_O or
METH_NOARGS. Plus, we could get rid of a couple of if statements.
So far it seems there aren't any specific problems with the approach,
only concerns. I'm not sure it would be best to modify this patch over
many iterations and then make one huge checkin. I also don't want to
lose the changes or the results. Perhaps I should make a branch for
this work? It would be easy to abandon it, or take only the pieces we
want, if it should ever see the light of day.
----
Here's some thinking out loud. Raymond mentioned some of the warts of
the current patch. In particular, all nine argument variables are
initialized on each call, and there's a switch on the number of
arguments.
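To make those warts concrete, here is roughly the shape of what I'm
describing -- my sketch with invented names, not the patch's actual
code:

    #include <Python.h>

    /* Invented signature for a function taking up to two unpacked
     * arguments (the real patch goes up to nine). */
    typedef PyObject *(*unpacked2)(PyObject *self, PyObject *a0,
                                   PyObject *a1);

    static PyObject *
    call_unpacked(unpacked2 meth, PyObject *self, PyObject **args,
                  int nargs)
    {
        /* Wart 1: every slot is NULL-initialized on every call. */
        PyObject *a0 = NULL, *a1 = NULL;

        /* Wart 2: a switch on the number of arguments passed. */
        switch (nargs) {
        case 2: a1 = args[1]; /* fall through */
        case 1: a0 = args[0]; /* fall through */
        case 0: break;
        default:
            PyErr_SetString(PyExc_TypeError, "too many arguments");
            return NULL;
        }
        /* Omitted optional arguments arrive as NULL. */
        return meth(self, a0, a1);
    }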
Ultimately, I think we can speed things up more by having 9 different
opcodes, i.e., one for each number of arguments: CALL_FUNCTION_0,
CALL_FUNCTION_1, and so on. (9 is still arbitrary and subject to
change.)
Then we would have N little functions, each with the exact number of
parameters; there's a sketch just below. Each would still need a
switch to call the C function, because there may be optional
parameters. Ultimately, it's possible the code would be small enough
to stick directly into the eval_frame loop.
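Here's a sketch of one of those little functions (invented names
again; ml_max_args and PyMethodDefUnpacked are the hypothetical bits
from the earlier sketch, and the real version would live in or near
eval_frame in ceval.c):

    #include <Python.h>

    typedef PyObject *(*unpacked1)(PyObject *self, PyObject *a0);
    typedef PyObject *(*unpacked2)(PyObject *self, PyObject *a0,
                                   PyObject *a1);

    typedef struct {               /* hypothetical, from the sketch above */
        const char  *ml_name;
        PyCFunction  ml_meth;
        int          ml_flags;
        short        ml_min_args, ml_max_args;
        const char  *ml_doc;
    } PyMethodDefUnpacked;

    /* Hypothetical helper behind a CALL_FUNCTION_1 opcode: the arity
     * is known from the opcode, so no NULL-initialization of unused
     * slots is needed.  The switch on the declared maximum remains,
     * because a function with one optional parameter still needs NULL
     * in the missing slot. */
    static PyObject *
    call_c_function_1(PyMethodDefUnpacked *def, PyObject *self,
                      PyObject *arg)
    {
        switch (def->ml_max_args) {   /* hypothetical field */
        case 1:
            return ((unpacked1)def->ml_meth)(self, arg);
        case 2:
            return ((unpacked2)def->ml_meth)(self, arg, NULL);
        default:
            PyErr_Format(PyExc_TypeError,
                         "%s() takes the wrong number of arguments",
                         def->ml_name);
            return NULL;
        }
    }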
Each of these steps would need to be tested, but that's a possible
longer term direction.
There would only be an if to check whether it was a C function or not.
Maybe we could even get rid of that with more fixup at import time.
Neal