On Jun 22, 2020, at 5:10 AM, Victor Stinner email@example.com wrote:
Introduce C API incompatible changes to hide implementation details.
How much of the existing C extension ecosystem do you expect to break as a result of these incompatible changes?
It will be way easier to add new features.
This isn't self-evident. What is currently difficult that would be easier?
It becomes possible to experiment with more advanced optimizations in CPython than just micro-optimizations, like tagged pointers.
Is there any proof-of-concept to suggest that it is in the realm of possibility that such an experiment would produce a favorable outcome? Otherwise, it isn't a reasonable justification for an extensive and irrevocable series of sweeping changes that affect the entire ecosystem of existing extensions.
**STATUS**: Completed (in Python 3.9)
I'm not sure that many people are monitoring the huge number of changes that have gone in mostly unreviewed. Mark Shannon and Stefan Krah have both raised concerns. It seems like one person has been given blanket authorization to revise nearly every aspect of the internals and to undo the design choices made by all the developers who've previously worked on the project.
Converting macros to static inline functions should only impact very few C extensions which use macros in unusual ways.
These should be individually verified to make sure they actually get inlined by the compiler. In https://bugs.python.org/issue39542 about nine PRs were applied without review or discussion. One of those, https://github.com/python/cpython/pull/18364 , converted PyType_Check() to a static inline function, but I'm not sure that it actually does get inlined. That may be the reason named tuple attribute access slowed by about 25% between Python 3.8 and Python 3.9.¹ Presumably, that PR also affected every single type check in the entire C codebase and will affect third-party extensions as well.
FWIW, I do appreciate the devotion and amount of effort in this undertaking; that isn't in question. However, as a community this needs to be a conscious decision. I'm unclear about whether any benefits will ever materialize. I am clear that packages will be broken, that performance will be impacted, and that this is a one-way trip that can never be undone. Most of the work is being done by one person. Many of the PRs aren't reviewed. The rate and volume of PRs are so high that almost no one can keep track of what is happening. Mark and Stefan have pushed back but with no effect.
¹ Timings for attribute access
$ python3.8 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
2000000 loops, best of 5: 119 nsec per loop

$ python3.9 -m timeit -s 'from collections import namedtuple' -s 'Point=namedtuple("Point", "x y")' -s 'p=Point(10,20)' 'p.x; p.y; p.x; p.y; p.x; p.y'
2000000 loops, best of 5: 152 nsec per loop
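
The command-line timings above can also be reproduced from a script, which makes it easier to run the same measurement under several interpreters. The following is only a minimal sketch: the helper name best_nsec_per_loop and the loop count are assumptions chosen for illustration, not part of the original measurement, and absolute numbers will vary by machine and build.

```python
import sys
import timeit

# Setup mirrors the -s options of the command-line runs above.
SETUP = (
    "from collections import namedtuple\n"
    "Point = namedtuple('Point', 'x y')\n"
    "p = Point(10, 20)"
)

# Six attribute reads per loop, the same statement as the original timing.
STMT = "p.x; p.y; p.x; p.y; p.x; p.y"


def best_nsec_per_loop(number=200_000, repeat=5):
    """Hypothetical helper: best-of-`repeat` time for one loop, in nanoseconds."""
    timings = timeit.repeat(STMT, setup=SETUP, number=number, repeat=repeat)
    # Each timing covers `number` loops, so divide to get the per-loop cost.
    return min(timings) / number * 1e9


if __name__ == "__main__":
    version = f"{sys.version_info.major}.{sys.version_info.minor}"
    print(f"Python {version}: {best_nsec_per_loop():.0f} nsec per loop")
```

Running the same file with python3.8 and python3.9 gives the two figures to compare. Taking the minimum of the repeats follows what the timeit command line reports as "best of 5".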
Python 3.8 disassembly (clean and fast)
-----------------------

_tuplegetter_descr_get:
    testq   %rsi, %rsi
    je      L299
    subq    $8, %rsp
    movq    8(%rsi), %rax
    movq    16(%rdi), %rdx
    testb   $4, 171(%rax)
    je      L300
    cmpq    16(%rsi), %rdx
    jnb     L301
    movq    24(%rsi,%rdx,8), %rax
    addq    $1, (%rax)
L290:
    addq    $8, %rsp
    ret
Python 3.9 disassembly (doesn't look inlined)
-----------------------

_tuplegetter_descr_get:
    testq   %rsi, %rsi
    pushq   %r12                     <-- new cost
    pushq   %rbp                     <-- new cost
    pushq   %rbx                     <-- new cost
    movq    %rdi, %rbx
    je      L382
    movq    16(%rdi), %r12
    movq    %rsi, %rbp
    movq    8(%rsi), %rdi
    call    _PyType_GetFlags         <-- new non-inlined function call
    testl   $67108864, %eax
    je      L383
    cmpq    16(%rbp), %r12
    jnb     L384
    movq    24(%rbp,%r12,8), %rax
    addq    $1, (%rax)
    popq    %rbx                     <-- new cost
    popq    %rbp                     <-- new cost
    popq    %r12                     <-- new cost
    ret