[Python-3000] Function call speed (Was: Cleaning up argument listparsing)
Talin
talin at acm.org
Wed Apr 19 03:09:58 CEST 2006
I wanted to follow up a bit on the issue of speeding up function calls with
some more specifics.
Currently, the function call mechanism has to do a lot of conversion
operations on the function arguments. Specifically, it takes the various
positional arguments, keyword arguments, varargs argument, and kwargs, and
combines them into a single tuple and a single dict. It then calls the target
function via PyObject_Call.
On the recieving side, a different conversion takes place, depending on the
type of the callable. For user-defined functions, it maps the various
arguments in the tuple and dict into the function's parameter slots,
along with considerations of default values and such.
One way to speed this up would be to determine cases where the conversion
steps can be skipped. There is already a method for this in the case where
there are only positional arguments, no varargs or keywords.
Ideally, we would like to be able to pass the caller's argument stack pretty
much as-is to the reciever. At the moment, that consists of an array of
positional arguments, followed by an array of key/value pairs for the keyword
arguments, followed by an optional vararg or kwarg.
Assume for a moment, then, that we had a variation of PyObject_Call which had
a function signature that was exactly that:
PyObject_CallFast(
func,
PyObject **args, int numArgs,
PyObject **kargs, int numKwArgs,
PyObject *starargs,
PyObject *kwarg );
The reciever would have to do extra work, because now it has to check two
places for each argument. For a keyword argument, for example, it has to check
both the key/value array and the kwargs dict argument.
However, this is far less work than building a new dictionary. In addition, we
know that the keywords are interned strings, not arbitrary objects, so we can
speed up the comparison operation by avoiding type tests.
Also, because the keyword args are in a linear list, they will exhibit better
cache behavior, and the lookup will be very fast - a simple loop and test. (If
you want to write optimal code for today's processors, you pretty much have to
throw out everything you learned about unrolling loops and such, and learn to
think in terms of cache lines, not bytes and words.)
Now, it may be that in some cases, the reciever really does require a tuple
and a dict. However, in such a case, the onus should be on the reciever to
convert the argument list into the preferred form.
So in other words, it would be the reciever's job to convert the arguments
into whatever data structure the reciever required. For recievers with
relatively simple calling signatures, it may mean no conversion is done at
all, and the caller's argument list can be used directly. The most complex
part would be validating the arguments - making sure that a keyword argument
didn't get assigned more than once, and so on.
What about older code? Well, I suppose you could add an additional field to
the recieving object, indicating whether it accepted the traditional
PyObject_Call-style args, or the newer format. Essentially you would have two
function pointers in the object, one for newstyle calling, and one for
oldstyle. If the newstyle was present, then the interpreter would use it,
otherwise it would fall back to the older means of transforming the arguments
into a tuple and a dict.
In any case, the reason why this is "Py3K" material is that it involves
busting the existing C API to some extent.
-- Talin
More information about the Python-3000
mailing list