[Python-3000] Function call speed (Was: Cleaning up argument listparsing)

Talin talin at acm.org
Wed Apr 19 03:09:58 CEST 2006


I wanted to follow up a bit on the issue of speeding up function calls with 
some more specifics.

Currently, the function call mechanism has to do a lot of conversion 
operations on the function arguments. Specifically, it takes the various 
positional arguments, keyword arguments, varargs argument, and kwargs, and 
combines them into a single tuple and a single dict. It then calls the target 
function via PyObject_Call.

On the recieving side, a different conversion takes place, depending on the 
type of the callable. For user-defined functions, it maps the various 
arguments in the tuple and dict into the function's parameter slots,
along with considerations of default values and such.

One way to speed this up would be to determine cases where the conversion 
steps can be skipped. There is already a method for this in the case where 
there are only positional arguments, no varargs or keywords.

Ideally, we would like to be able to pass the caller's argument stack pretty 
much as-is to the reciever. At the moment, that consists of an array of 
positional arguments, followed by an array of key/value pairs for the keyword 
arguments, followed by an optional vararg or kwarg.

Assume for a moment, then, that we had a variation of PyObject_Call which had 
a function signature that was exactly that:

    PyObject_CallFast(
        func,
        PyObject **args, int numArgs,
        PyObject **kargs, int numKwArgs,
        PyObject *starargs,
        PyObject *kwarg );

The reciever would have to do extra work, because now it has to check two 
places for each argument. For a keyword argument, for example, it has to check 
both the key/value array and the kwargs dict argument.

However, this is far less work than building a new dictionary. In addition, we 
know that the keywords are interned strings, not arbitrary objects, so we can 
speed up the comparison operation by avoiding type tests.

Also, because the keyword args are in a linear list, they will exhibit better 
cache behavior, and the lookup will be very fast - a simple loop and test. (If 
you want to write optimal code for today's processors, you pretty much have to 
throw out everything you learned about unrolling loops and such, and learn to 
think in terms of cache lines, not bytes and words.)

Now, it may be that in some cases, the reciever really does require a tuple 
and a dict. However, in such a case, the onus should be on the reciever to 
convert the argument list into the preferred form.

So in other words, it would be the reciever's job to convert the arguments 
into whatever data structure the reciever required. For recievers with 
relatively simple calling signatures, it may mean no conversion is done at 
all, and the caller's argument list can be used directly. The most complex 
part would be validating the arguments - making sure that a keyword argument 
didn't get assigned more than once, and so on.

What about older code? Well, I suppose you could add an additional field to 
the recieving object, indicating whether it accepted the traditional 
PyObject_Call-style args, or the newer format. Essentially you would have two 
function pointers in the object, one for newstyle calling, and one for 
oldstyle. If the newstyle was present, then the interpreter would use it, 
otherwise it would fall back to the older means of transforming the arguments 
into a tuple and a dict.

In any case, the reason why this is "Py3K" material is that it involves 
busting the existing C API to some extent.

-- Talin




More information about the Python-3000 mailing list