[Python-Dev] New calling convention to avoid temporary tuples when calling functions

Victor Stinner victor.stinner at gmail.com
Mon Aug 8 20:01:34 EDT 2016


2016-08-09 1:36 GMT+02:00 Brett Cannon <brett at python.org>:
> I just wanted to say I'm excited about this and I'm glad someone is taking
> advantage of what Argument Clinic allows for and what I know Larry had
> initially hoped AC would make happen!

To make Python as a whole faster, not just a few specific functions, *all* C
code should be updated to use the new "FASTCALL" calling convention.
But it's a pain to rewrite argument-parsing code by hand, and we all
hate having to put #ifdef in the code for backward compatibility...

This is where the magic happens: if your code is written using
Argument Clinic, you get the FASTCALL optimization for free: just run
Argument Clinic again to regenerate the code with the updated calling
convention.
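To sketch the idea at the Python level (the real mechanism is C code generated by Argument Clinic; the function name and the array-plus-count shape below are illustrative only, mimicking the C-level `PyObject *const *args, Py_ssize_t nargs` signature):

```python
# Illustrative sketch only: FASTCALL is a C-level calling convention.
# A FASTCALL-style entry point receives positional arguments as an
# existing array plus a count, so the caller never has to allocate a
# temporary tuple just to make the call.
def pow_fastcall(args, nargs):
    if nargs != 2:
        raise TypeError("pow_fastcall expected 2 arguments, got %d" % nargs)
    base, exp = args[0], args[1]
    return base ** exp

# The caller keeps its values in a reusable buffer (in CPython, the
# values already sitting on the interpreter's stack).
buf = [3, 4]
result = pow_fastcall(buf, 2)  # no (3, 4) tuple is ever built
```

The point is purely about allocation: with the old convention every call packs its positional arguments into a fresh tuple; with FASTCALL the callee reads them where they already are.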

That can be a very good motivation to rewrite your code using Argument
Clinic: you get better inline documentation (docstrings, help(func) in
the REPL) *and* better performance ;-)


> I should also point out that Serhiy has a patch for faster keyword argument
> parsing thanks to AC: http://bugs.python.org/issue27574 . Not sure how your
> two patches would intertwine (if at all).

In my first implementation, I packed *all* arguments into the same C
array: positional and keyword arguments alike. The problem is that
functions expect a dict when parsing keyword arguments. A dict has an
important property: O(1) lookup. Lookup degrades to O(n) if keyword
arguments are passed as a flat list of (key, value) pairs in a C array.

So I chose not to touch keyword arguments at all: they continue to be
passed as a dict.

By the way, it's very rare to call a function using keyword arguments from C.

--

About http://bugs.python.org/issue27574 : it's really nice to see work
done on this part!

I recall a discussion of the performance of an operator versus a
function call. In some cases, the overhead of parsing arguments is
higher than the cost of the same feature implemented as an operator!
Hum, it was probably this issue:
https://bugs.python.org/issue17170

Extract of the issue:
"""
Some crude C benchmarking on this computer:
- calling PyUnicode_Replace is 35 ns (per call)
- calling "hundred".replace is 125 ns
- calling PyArg_ParseTuple with the same signature as "hundred".replace is 80 ns
"""
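The quoted numbers come from C-level benchmarks in that issue; from pure Python you can only measure the combined per-call cost, but a rough timeit sketch (absolute timings will differ by machine and by measurement level) still shows where the time goes:

```python
import timeit

# Python-level sketch: measures the full cost of one str.replace call,
# which includes both the replace work itself and the argument-parsing
# overhead that the quoted C benchmark isolates at ~80 ns.
n = 100_000
per_call = timeit.timeit('"hundred".replace("dred", "drum")',
                         number=n) / n
print(f"str.replace: {per_call * 1e9:.0f} ns per call")
```

The gap between `PyUnicode_Replace` (35 ns) and `"hundred".replace` (125 ns) in the extract is almost entirely call machinery, which is exactly what FASTCALL and faster keyword parsing attack.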

Victor

