[Python-Dev] New calling convention to avoid temporary tuples when calling functions
Victor Stinner
victor.stinner at gmail.com
Wed Aug 24 18:56:38 EDT 2016
Oh, I found a nice piece of CPython history in Modules/_pickle.c.
Extract from Python 3.3:
-----------------
/* A temporary cleaner API for fast single argument function call.
XXX: Does caching the argument tuple provides any real performance benefits?
A quick benchmark, on a 2.0GHz Athlon64 3200+ running Linux 2.6.24 with
glibc 2.7, tells me that it takes roughly 20,000,000 PyTuple_New(1) calls
when the tuple is retrieved from the freelist (i.e, call PyTuple_New() then
immediately DECREF it) and 1,200,000 calls when allocating brand new tuples
(i.e, call PyTuple_New() and store the returned value in an array), to save
one second (wall clock time). Either ways, the loading time a pickle stream
large enough to generate this number of calls would be massively
overwhelmed by other factors, like I/O throughput, the GC traversal and
object allocation overhead. So, I really doubt these functions provide any
real benefits.
On the other hand, oprofile reports that pickle spends a lot of time in
these functions. But, that is probably more related to the function call
overhead, than the argument tuple allocation.
XXX: And, what is the reference behavior of these? Steal, borrow? At first
glance, it seems to steal the reference of 'arg' and borrow the reference
of 'func'. */
static PyObject *
_Pickler_FastCall(PicklerObject *self, PyObject *func, PyObject *arg)
-----------------
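For scale, the two figures in that comment can be turned into a per-call saving. This is only a small sketch that restates the comment's own numbers; the helper name ns_saved_per_call is made up for illustration and is not part of CPython.

```c
#include <assert.h>

/* Hypothetical helper: nanoseconds saved per call, given how many calls
   it takes to save one second of wall clock time in total. */
double ns_saved_per_call(double calls_to_save_one_second)
{
    assert(calls_to_save_one_second > 0.0);
    return 1e9 / calls_to_save_one_second;
}
```

With the numbers quoted above, ns_saved_per_call(20000000.0) gives 50 ns per call for the freelist case, and ns_saved_per_call(1200000.0) gives roughly 833 ns per call for freshly allocated tuples.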
Extract from Python 3.4 (same function):
-----------------
/* Note: this function used to reuse the argument tuple. This used to give
a slight performance boost with older pickle implementations where many
unbuffered reads occurred (thus needing many function calls).
However, this optimization was removed because it was too complicated
to get right. It abused the C API for tuples to mutate them which led
to subtle reference counting and concurrency bugs. Furthermore, the
introduction of protocol 4 and the prefetching optimization via peek()
significantly reduced the number of function calls we do. Thus, the
benefits became marginal at best. */
-----------------
It reminds me of the story of the property_descr_get() optimizations :-)
I hope that the new generic "fastcall" functions will provide a safe
and reliable optimization for the pickle module, property_descr_get()
and other optimized functions.
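
To make the difference between the two calling styles concrete, here is a toy model in plain C. It is not CPython's API: every name in it (toy_tuple, toy_call_with_tuple, toy_fastcall_one, and so on) is hypothetical, and a long stands in for a Python object. It only illustrates the idea of passing arguments as a C array instead of packing them into a temporary heap-allocated tuple.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy callable: receives its arguments as a C array plus a count. */
typedef long (*toy_func)(const long *args, size_t nargs);

/* Toy "tuple": a heap-allocated array with a length prefix. */
typedef struct {
    size_t size;
    long items[];
} toy_tuple;

/* Counts how many temporary tuples were allocated. */
size_t toy_tuple_allocations = 0;

toy_tuple *toy_tuple_new(size_t n)
{
    toy_tuple *t = malloc(sizeof(*t) + n * sizeof(long));
    if (t != NULL) {
        t->size = n;
        toy_tuple_allocations++;
    }
    return t;
}

/* Tuple style: pack the argument into a temporary tuple, call, free it.
   Returns 0 on success, -1 if the allocation failed. */
int toy_call_with_tuple(toy_func f, long arg, long *result)
{
    toy_tuple *t = toy_tuple_new(1);
    if (t == NULL) {
        return -1;
    }
    t->items[0] = arg;
    *result = f(t->items, t->size);
    free(t);
    return 0;
}

/* Fastcall style: the argument lives in a small array on the C stack,
   so no allocation happens and nothing has to be cached or mutated. */
long toy_fastcall_one(toy_func f, long arg)
{
    long stack_args[1] = { arg };
    return f(stack_args, 1);
}

/* Example callable: doubles its single argument. */
long toy_double(const long *args, size_t nargs)
{
    assert(nargs == 1);
    return 2 * args[0];
}
```

Calling toy_double through toy_call_with_tuple bumps toy_tuple_allocations by one, while going through toy_fastcall_one leaves it unchanged. The second style has no shared tuple to reuse, which is the kind of trick the 3.4 comment says was too complicated to get right.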
Victor