On Mon, 19 Nov 2018 11:28:46 +0100
Victor Stinner wrote:
> I would expect that the most common source of speed up of a C extension is the removal of the cost of bytecode evaluation (ceval.c loop).
Well, I don't. All previous experiments showed that simply compiling Python code to C code using the "generic" C API yielded a 30% improvement.

Conversely, the C _pickle module can be 100x faster than the pure Python pickle module. It's doing it *not* by using the generic C API, but by special-casing access to concrete types. You don't get that level of performance simply by removing the cost of bytecode evaluation:

# C version
$ python3 -m timeit -s "import pickle; x = list(range(1000))" "pickle.dumps(x)"
100000 loops, best of 3: 19 usec per loop

# Python version
$ python3 -m timeit -s "import pickle; x = list(range(1000))" "pickle._dumps(x)"
100 loops, best of 3: 2.25 msec per loop

So, the numbers are on my side. So is the abundant experience of experts such as the Cython developers.
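
For anyone who wants to reproduce the comparison from a script rather than the command line, here is a minimal sketch. It assumes a CPython build where the C accelerator (_pickle) is available, so that pickle.dumps is the C version and pickle._dumps is the pure Python one; the iteration count is an arbitrary choice, and the absolute timings will differ from the figures quoted above depending on machine and Python version.

```python
import pickle
import timeit

# Same payload as in the timeit commands: a list of 1000 small ints.
data = list(range(1000))

# pickle.dumps is the C implementation when _pickle is available;
# pickle._dumps is the pure Python implementation from pickle.py.
c_payload = pickle.dumps(data)
py_payload = pickle._dumps(data)

# Both implementations should round-trip to the same object.
assert pickle.loads(c_payload) == data
assert pickle.loads(py_payload) == data

# Arbitrary iteration count, kept small because the Python version is slow.
number = 200
c_time = timeit.timeit(lambda: pickle.dumps(data), number=number)
py_time = timeit.timeit(lambda: pickle._dumps(data), number=number)

print(f"C version:      {c_time / number * 1e6:8.1f} usec per call")
print(f"Python version: {py_time / number * 1e6:8.1f} usec per call")
print(f"ratio:          {py_time / c_time:8.1f}x")
```

The ratio is the interesting figure here, not the absolute times.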
> Python internals rely on internals to implement further optimizations, such as modifying an "immutable" tuple, bytes or str object, because you can do that at the C level. But I'm not sure that I would like 3rd party extensions to rely on such things.
I'm not even talking about *modifying* tuples or str objects, I'm talking about *accessing* their value without going through an abstract API that does slot lookups, indirect function calls and object unboxing.

For example, people may need a fast way to access the UTF-8 representation of a unicode object. Without making indirect function calls, and ideally without making a copy of the data either. How do you do that using the generic C API?

Regards

Antoine.