[issue38015] inline function generates slightly inefficient machine code

Sergey Fedoseev report at bugs.python.org
Fri Sep 6 10:28:25 EDT 2019


Sergey Fedoseev <fedoseev.sergey at gmail.com> added the comment:

I added a similar patch that replaces get_small_int() with a macro version, since the inline function also induces unnecessary casts and makes the machine code less efficient.

Example assembly can be checked at https://godbolt.org/z/1SjG3E.
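
For readers without the patch at hand, here is a minimal sketch of the kind of difference meant. It is not the CPython source: the table is a stand-in array of ints rather than real integer objects, the names is_small_int_fn, get_small_int_fn, IS_SMALL_INT and GET_SMALL_INT are hypothetical, and the cache bounds (-5 through 256) are only modeled on CPython's small-int cache. The inline functions have fixed parameter types, so each caller's argument is converted before the range check, while the macros do the check in whatever type the caller already has.

```c
#include <assert.h>

/* Illustrative bounds, modeled on CPython's small-int cache (-5 .. 256). */
#define NSMALLNEGINTS 5
#define NSMALLPOSINTS 257

/* Stand-in for the table of cached integer objects (assumption: plain ints). */
static int small_ints[NSMALLNEGINTS + NSMALLPOSINTS];

/* Inline-function style: the parameter type is fixed, so an int or long
 * argument is first widened to long long at every call site. */
static inline int
is_small_int_fn(long long ival)
{
    return -NSMALLNEGINTS <= ival && ival < NSMALLPOSINTS;
}

/* The int parameter is converted again when passed to is_small_int_fn(). */
static inline int *
get_small_int_fn(int ival)
{
    assert(is_small_int_fn(ival));
    return &small_ints[ival + NSMALLNEGINTS];
}

/* Macro style: the comparison and the index are computed in the type of
 * the caller's expression, so no conversion is forced by a parameter. */
#define IS_SMALL_INT(ival) \
    (-NSMALLNEGINTS <= (ival) && (ival) < NSMALLPOSINTS)

#define GET_SMALL_INT(ival) \
    (assert(IS_SMALL_INT(ival)), &small_ints[(ival) + NSMALLNEGINTS])
```

Both forms return the same slot for the same value, e.g. get_small_int_fn(0) and GET_SMALL_INT(0) both point at small_ints[5]. The usual macro caveats apply: the argument is evaluated more than once, so it must be free of side effects, and it should be a signed expression so that the comparison against the negative bound behaves as intended.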

This change produces a tiny but measurable speed-up for handling small ints:

$ python -m pyperf timeit -s "from collections import deque; consume = deque(maxlen=0).extend; r = range(256)" "consume(r)" --compare-to=../cpython-master/venv/bin/python --duplicate=1000
/home/sergey/tmp/cpython-master/venv/bin/python: ..................... 1.03 us +- 0.08 us
/home/sergey/tmp/cpython-dev/venv/bin/python: ..................... 973 ns +- 18 ns

Mean +- std dev: [/home/sergey/tmp/cpython-master/venv/bin/python] 1.03 us +- 0.08 us -> [/home/sergey/tmp/cpython-dev/venv/bin/python] 973 ns +- 18 ns: 1.05x faster (-5%)

----------

_______________________________________
Python tracker <report at bugs.python.org>
<https://bugs.python.org/issue38015>
_______________________________________

