[New-bugs-announce] [issue28004] Optimize bytes.join(sequence)
STINNER Victor
report at bugs.python.org
Wed Sep 7 14:15:06 EDT 2016
New submission from STINNER Victor:
The article https://atleastfornow.net/blog/not-all-bytes/ says that bytes.join(sequence) is slower on Python 3 compared to Python 2.
I compared the Python 2 and Python 3 code: the main difference seems to be that Python 3 uses the Py_buffer API to support more types than only bytes, and that it allocates a buffer (on the C stack or in heap memory) of Py_buffer objects.
The attached patch makes bytes.join(sequence) up to 29% faster. The patch avoids the Py_buffer API and the allocation of the temporary array of Py_buffer if all items are bytes or bytearray.
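
As a rough illustration only, the two code paths can be modelled in pure Python. This is not the patch itself (which is C code): the function name join_model is hypothetical, and the isinstance() check is an assumption about how "all items are bytes or bytearray" is tested.

```python
def join_model(sep, seq):
    """Pure-Python model of bytes.join() with a fast path (illustrative only)."""
    items = list(seq)

    # Assumed fast-path test: every item is bytes or bytearray, so the
    # data can be read directly without going through the buffer protocol.
    fast = all(isinstance(item, (bytes, bytearray)) for item in items)

    if not fast:
        # Generic path: acquire one buffer view per item. This stands in
        # for the temporary array of Py_buffer objects described above.
        items = [memoryview(item).tobytes() for item in items]

    out = bytearray()
    for index, item in enumerate(items):
        if index:
            out += sep  # separator goes between items, not before the first
        out += item
    return bytes(out)
```

Both paths produce the same result as sep.join(seq); only the amount of bookkeeping per item differs.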
I'm not 100% sure that it's worth it to optimize bytes.join().
On Python 2, bytes += bytes uses a hack in Python/ceval.c to optimize this instruction. On Python 3, the optimization is only applied to str += str (unicode), so bytes += bytes is inefficient. To get the best performance on both Python 2 and Python 3, bytes.join(sequence) is better than bytes += bytes.
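
A minimal sketch of the two styles being compared. The function names are hypothetical and only serve the illustration.

```python
def concat_with_iadd(chunks):
    """Build the result with repeated +=, one item at a time."""
    result = b''
    for chunk in chunks:
        # Without the ceval.c optimization, each += may copy
        # everything accumulated so far into a new bytes object.
        result += chunk
    return result


def concat_with_join(chunks):
    """Build the result with a single bytes.join() call."""
    return b''.join(chunks)
```

Both functions return the same bytes object; the join version computes the total size once and copies each item once.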
Microbenchmark commands:
$ ./python -m perf timeit -s "sep=b''; seq=(b'hello', b'world')" 'sep.join(seq)'
$ ./python -m perf timeit -s "sep=b''; seq=(b'hello', b'world', b'. ') * 100" 'sep.join(seq)'
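
If the perf module is not installed, a similar measurement can be sketched with the standard library timeit module. The helper name bench and the repeat/number values are assumptions, and the results will be less stable than those reported by perf.

```python
import timeit


def bench(stmt, setup, repeat=5, number=10000):
    """Return the best observed time per call, in seconds."""
    timings = timeit.repeat(stmt, setup=setup, repeat=repeat, number=number)
    # Take the minimum: it is the run least disturbed by system noise.
    return min(timings) / number


if __name__ == "__main__":
    cases = {
        "2 items": "sep=b''; seq=(b'hello', b'world')",
        "300 items": "sep=b''; seq=(b'hello', b'world', b'. ') * 100",
    }
    for label, setup in cases.items():
        seconds = bench("sep.join(seq)", setup)
        print("%s: %.1f ns" % (label, seconds * 1e9))
```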
Python 3.6 => patched Python 3.6:
* 2 items: 92.1 ns +- 1.8 ns => 90.3 ns +- 3.1 ns (-2%)
* 300 items: 3.11 us +- 0.07 us => 2.22 us +- 0.11 us (-29%)
--
I'm not sure that Python 3 is really slower than Python 2 :-/ Python 3.5 is 10 ns slower than Python 2.7 for a sequence of 2 items, but it's 6% faster for 300 items.
So the question is whether it's worth it to optimize bytes.join().
Python 2:
* 2 items: 87.7 ns +- 3.7 ns
* 300 items: 3.25 us +- 0.11 us
Python 3.5:
* 2 items: 97.4 ns +- 9.0 ns
* 300 items: 3.06 us +- 0.20 us
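
The percentages quoted in this message can be recomputed from the timings listed. The helper name relative_change is hypothetical.

```python
def relative_change(before, after):
    """Relative change from before to after, in percent (negative = faster)."""
    return (after - before) / before * 100.0


# Python 3.6 => patched Python 3.6
patched_2_items = relative_change(92.1, 90.3)    # ns
patched_300_items = relative_change(3.11, 2.22)  # us

# Python 2 => Python 3.5, 300 items
py35_300_items = relative_change(3.25, 3.06)     # us
```

Rounded to whole percent, these give -2%, -29% and -6%, matching the figures above.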
----------
files: bytes_join.patch
keywords: patch
messages: 274855
nosy: haypo, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Optimize bytes.join(sequence)
type: performance
versions: Python 3.6
Added file: http://bugs.python.org/file44444/bytes_join.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue28004>
_______________________________________