New GitHub issue #123500 from neonene:<br>
<hr>
<pre>
# Bug report
### Bug description:
Some callables are implemented in C with the `METH_METHOD|METH_FASTCALL` signature. They can be 5%-15% slower than equivalents that use only `METH_FASTCALL` (or `METH_O`) and find the module with a `PyType_GetModuleByDef` call.
For example, I measured the difference on Windows PGO builds by duplicating functions:
* `CDataType_from_buffer_copy()` in `_ctypes.c`, which is not called during PGO profiling:
```py
from timeit import timeit
setup = """if 1:
    import ctypes
    buf = bytearray(16)
    cls = ctypes.c_char * len(buf)
"""
# with a warmup
for _ in range(2):
    # METH_METHOD|METH_FASTCALL (as-is)
    r0 = timeit(s0 := f'cls.from_buffer_copy (buf)', setup)
    # METH_FASTCALL (no `defining_class`) + PyType_GetModuleByDef
    r1 = timeit(s1 := f'cls.from_buffer_copy1(buf)', setup)
print(s0, r0, 1 + (1 - r0 / r0))
print(s1, r1, 1 + (1 - r1 / r0))
```
```py
cls.from_buffer_copy (buf) 0.15552800190635024 1.0
cls.from_buffer_copy1(buf) 0.13187471489945893 1.1520837837364741
```
* `dec_mpd_qquantize()` in `_decimal.c`, which is exercised by 6800 calls during PGO profiling (unfair?):
```py
# legacy (as-is)
d1.quantize (d2) 0.1694609627971658 1.0
# METH_METHOD|METH_FASTCALL (`defining_class`) + _PyType_GetModuleState
d1.quantize1(d2) 0.1408861404022900 1.168621857938327
# METH_FASTCALL (no `defining_class`) + PyType_GetModuleByDef
d1.quantize2(d2) 0.1258157708973158 1.257553074049807
```
<details><summary>Script (expand)</summary>
```py
from timeit import timeit
setup = """if 1:
    from _decimal import Decimal
    d1,d2 = Decimal(1.414), Decimal('0.01')
"""
for _ in range(2):
    r0 = timeit(s0 := f'd1.quantize (d2)', setup)
    r1 = timeit(s1 := f'd1.quantize1(d2)', setup)
    r2 = timeit(s2 := f'd1.quantize2(d2)', setup)
print(s0, r0, 1 + (1 - r0 / r0))
print(s1, r1, 1 + (1 - r1 / r0))
print(s2, r2, 1 + (1 - r2 / r0))
```
</details>
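The last column in the results above is `1 + (1 - r / r0)`, where `r0` is the baseline time. A value of 1.0 means parity, and 1.15 means the variant took about 15% less time than the baseline. A minimal sketch of that calculation follows; the helper name `relative_score` is hypothetical and not part of the scripts above.

```python
def relative_score(r, r0):
    """Score used in the result tables: 1.0 is parity with the baseline,
    and values above 1.0 mean the variant took less time."""
    return 1 + (1 - r / r0)


# Timings copied from the from_buffer_copy results above.
baseline = 0.15552800190635024
variant = 0.13187471489945893
print(relative_score(baseline, baseline))
print(round(relative_score(variant, baseline), 3))
```

Note that this score is a reduction in elapsed time relative to the baseline, not a throughput ratio.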
Observations:
* The number of arguments had little to do with this.
* The gaps seem to be consistent as long as the compared functions are equally (un)exercised during profiling.
* The same goes for non-PGO builds and builtin modules (e.g. `_sre`), where the impacts may be less significant.
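One way to check whether a gap is consistent across runs is to repeat each measurement and keep the minimum. The sketch below is an assumption about how such a check could look, not the script used for the numbers above. `best_time` is a hypothetical helper, and because `from_buffer_copy1` exists only in a patched build, the example times the stock `from_buffer_copy` twice, which should score close to 1.0.

```python
from timeit import repeat

setup = """if 1:
    import ctypes
    buf = bytearray(16)
    cls = ctypes.c_char * len(buf)
"""


def best_time(stmt, setup, runs=5, number=100_000):
    """Return the fastest of `runs` timings; the minimum is the one
    least affected by background noise."""
    return min(repeat(stmt, setup, repeat=runs, number=number))


# On a patched build, the second statement would be the duplicated function.
r0 = best_time('cls.from_buffer_copy(buf)', setup)
r1 = best_time('cls.from_buffer_copy(buf)', setup)
print(r0, r1, 1 + (1 - r1 / r0))
```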
### CPython versions tested on:
CPython main branch
### Operating systems tested on:
Windows
</pre>
<hr>
<a href="https://github.com/python/cpython/issues/123500">View on GitHub</a>
<p>Labels: type-bug</p>
<p>Assignee: </p>