[Python-Dev] PEP 393 review
Victor Stinner
victor.stinner at haypocalc.com
Mon Aug 29 10:52:52 CEST 2011
On 28/08/2011 23:06, "Martin v. Löwis" wrote:
> On 28.08.2011 22:01, Antoine Pitrou wrote:
>>
>>> - the iobench results are between 2% acceleration (seek operations),
>>> 16% slowdown for small-sized reads (4.31MB/s vs. 5.22 MB/s) and
>>> 37% for large sized reads (154 MB/s vs. 235 MB/s). The speed
>>> difference is probably in the UTF-8 decoder; I have already
>>> restored the "runs of ASCII" optimization and am out of ideas for
>>> further speedups. Again, having to scan the UTF-8 string twice
>>> is probably one cause of slowdown.
>>
>> I don't think it's the UTF-8 decoder because I see an even larger
>> slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").
>
> Those haven't been ported to the new API, yet. Consider, for example,
> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
> is a 25% speedup for PEP 393.
If I understand correctly, performance now depends heavily on which
characters are used? Is a pure ASCII string faster than a string with
characters in the ISO-8859-1 charset? And is the same true for BMP
versus non-BMP characters?
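(For context: under PEP 393 the per-code-point storage width is chosen from
the widest character in the string, which can be checked on a CPython 3.3+
build with `sys.getsizeof`. A minimal sketch; the `width_per_char` helper is
illustrative, not part of any API:)

```python
import sys

def width_per_char(ch):
    # Estimate bytes per code point by comparing two string lengths,
    # which cancels out the fixed object header overhead.
    return (sys.getsizeof(ch * 1000) - sys.getsizeof(ch * 10)) // 990

# PEP 393 picks 1, 2, or 4 bytes per code point depending on the
# widest character present in the string.
print(width_per_char("a"))           # ASCII        -> 1 byte
print(width_per_char("\xe9"))        # ISO-8859-1   -> 1 byte
print(width_per_char("\u4e00"))      # BMP          -> 2 bytes
print(width_per_char("\U0001F600"))  # non-BMP      -> 4 bytes
```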
Do these benchmark tools use only ASCII characters, or also some
ISO-8859-1 characters? Or, better, different Unicode ranges in different
tests?
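(A quick way to answer that question ourselves, independent of iobench:
time the UTF-8 decoder on inputs drawn from different Unicode ranges. This
is only a rough sketch, not the iobench methodology; the sample names and
sizes are made up for illustration:)

```python
import timeit

# One 4096-character sample per Unicode range of interest.
samples = {
    "ascii":   "a" * 4096,
    "latin-1": "\xe9" * 4096,
    "bmp":     "\u4e00" * 4096,
    "non-bmp": "\U0001F600" * 4096,
}

for name, text in samples.items():
    data = text.encode("utf-8")
    # Time only the decode step, repeated to get a stable figure.
    t = timeit.timeit(lambda: data.decode("utf-8"), number=1000)
    print(f"{name:8s} {len(data):6d} UTF-8 bytes  {t:.4f}s")
```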
Victor