[issue14687] Optimize str%tuple for the PEP 393

STINNER Victor report at bugs.python.org
Thu May 3 01:42:40 CEST 2012


STINNER Victor <victor.stinner at gmail.com> added the comment:

pyunicode_format_writer.patch: a new, completely different approach. It's an optimistic patch: start with a short ASCII buffer, grow the buffer slowly, and convert to UCS2 and maybe to UCS4 if needed. The UTF-8 decoder is based on the same idea.

The patch adds a "unicode writer", the optimistic writer. It overallocates the buffer by 50% to limit the number of calls to PyUnicode_Resize(). It may be reused by other functions.
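To make the idea concrete, here is a minimal Python sketch of such a writer. It is only a toy model, not the C code from the patch: the class name OptimisticWriter, its attributes and the initial capacity of 8 are hypothetical, and it follows the ASCII -> UCS2 -> UCS4 description above (ignoring Latin-1). It tracks the storage width, overallocates by 50% when it has to grow, and counts how often it resizes or widens.

```python
class OptimisticWriter:
    """Toy model of an optimistic text buffer (hypothetical, not the patch's C code)."""

    def __init__(self, initial_capacity=8):
        self.kind = 1                      # bytes per character: 1=ASCII, 2=UCS2, 4=UCS4
        self.capacity = initial_capacity   # allocated slots
        self.length = 0                    # slots in use
        self.buf = [0] * initial_capacity  # code points; the width is tracked in self.kind
        self.resizes = 0                   # how often the buffer had to grow
        self.widenings = 0                 # how often the storage width had to increase

    @staticmethod
    def _kind_for(text):
        # Smallest storage width able to hold every character of text.
        top = max(map(ord, text), default=0)
        if top < 0x80:
            return 1
        if top < 0x10000:
            return 2
        return 4

    def write(self, text):
        kind = self._kind_for(text)
        if kind > self.kind:
            # A real implementation would copy the buffer to wider storage here.
            self.kind = kind
            self.widenings += 1
        new_length = self.length + len(text)
        if new_length > self.capacity:
            # Overallocate by 50% so that later writes need fewer resizes.
            self.capacity = new_length + new_length // 2
            self.buf.extend([0] * (self.capacity - len(self.buf)))
            self.resizes += 1
        self.buf[self.length:new_length] = [ord(ch) for ch in text]
        self.length = new_length

    def finish(self):
        # Drop the unused tail and build the final string.
        return "".join(chr(cp) for cp in self.buf[:self.length])


w = OptimisticWriter()
for part in ("x=", "123", ", y=", "\u20ac"):
    w.write(part)
result = w.finish()
print(result, w.kind, w.resizes, w.widenings)
```

In this run the buffer stays ASCII for the first three writes and is widened only once, when the non-ASCII character arrives.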

My dummy benchmark script:
------------
$ cat ~/bench.sh 
./python -m timeit \
    -s 'fmt="%s:"; arg="abc"' \
    'fmt % arg'
./python -m timeit \
    -s 'N=200; L=3; fmt="%s"*N; args=("a"*L,)*N' \
    'fmt % args'
./python -m timeit \
    -s 's="x=%s, y=%u, z=%x"; args=(123, 456, 789)' \
    's%args'
./python -m timeit \
    -s 's="The %(k1)s is %(k2)s the %(k3)s."; args={"k1":"x","k2":"y","k3":"z",}' \
    's%args'
------------

Results.

Python 3.2:

10000000 loops, best of 3: 0.0916 usec per loop
100000 loops, best of 3: 4.04 usec per loop
1000000 loops, best of 3: 0.492 usec per loop
1000000 loops, best of 3: 0.305 usec per loop

Python 3.3:

10000000 loops, best of 3: 0.169 usec per loop
100000 loops, best of 3: 8.02 usec per loop
1000000 loops, best of 3: 0.648 usec per loop
1000000 loops, best of 3: 0.658 usec per loop

Python 3.3 with the optimistic patch (compared to unpatched 3.3):

10000000 loops, best of 3: 0.123 usec per loop (-27%)
100000 loops, best of 3: 5.73 usec per loop (-29%)
1000000 loops, best of 3: 0.466 usec per loop (-28%)
1000000 loops, best of 3: 0.454 usec per loop (-31%)

Overhead of PEP 393 (Python 3.2 => 3.3), without -> with the patch:

 * 85% -> 35%
 * 99% -> 41%
 * 31% -> -5% (Python 3.3 is *faster* in this specific case! Maybe thanks to f4837725c50f)
 * 115% -> 49%
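
For reference, the percentages above can be recomputed from the timings listed in the results. This is a small sketch; the helper name pct_change is made up for illustration, and the recomputed values agree with the quoted figures only to within about one percentage point.

```python
def pct_change(before, after):
    """Relative change from before to after, in percent."""
    return (after - before) / before * 100.0


# Timings in usec per loop, copied from the results above.
py32 = [0.0916, 4.04, 0.492, 0.305]
py33 = [0.169, 8.02, 0.648, 0.658]
patched = [0.123, 5.73, 0.466, 0.454]

for t32, t33, tpatch in zip(py32, py33, patched):
    print("patch vs 3.3: %+.1f%%   overhead vs 3.2: %+.1f%% -> %+.1f%%"
          % (pct_change(t33, tpatch),
             pct_change(t32, t33),
             pct_change(t32, tpatch)))
```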

--

The "%(name)s" syntax is still *much* slower than in Python 3.2; I don't understand why.

The parameters of the Unicode writer (overallocation factor and initial size) may be adjusted (later?) for better performance.

----------
Added file: http://bugs.python.org/file25437/pyunicode_format_writer.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14687>
_______________________________________

