[Python-Dev] PEP 393 review

Victor Stinner victor.stinner at haypocalc.com
Mon Aug 29 10:52:52 CEST 2011


On 28/08/2011 23:06, "Martin v. Löwis" wrote:
> On 28.08.2011 22:01, Antoine Pitrou wrote:
>>
>>> - the iobench results range from a 2% acceleration (seek operations)
>>>    to a 16% slowdown for small-sized reads (4.31 MB/s vs. 5.22 MB/s)
>>>    and a 37% slowdown for large-sized reads (154 MB/s vs. 235 MB/s).
>>>    The speed difference is probably in the UTF-8 decoder; I have
>>>    already restored the "runs of ASCII" optimization and am out of
>>>    ideas for further speedups. Again, having to scan the UTF-8 string
>>>    twice is probably one cause of the slowdown.
>>
>> I don't think it's the UTF-8 decoder because I see an even larger
>> slowdown with simpler encodings (e.g. "-E latin1" or "-E utf-16le").
>
> Those haven't been ported to the new API, yet. Consider, for example,
> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
> is a 25% speedup for PEP 393.

If I understand correctly, performance now depends heavily on which 
characters are used? A pure ASCII string is faster than a string with 
characters from the ISO-8859-1 charset? Is the same true for BMP 
characters versus non-BMP characters?

Do these benchmark tools use only ASCII characters, or also some 
ISO-8859-1 characters? Or, better, different Unicode ranges in different 
tests?
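For the record, the kind of per-range comparison asked about above could be sketched with a small micro-benchmark like the following. This is a hypothetical sketch, not part of iobench: the sample strings, repetition counts, and output format are all my own choices, picked so that each sample exercises a different PEP 393 representation (ASCII, Latin-1, BMP, non-BMP).

```python
# Hypothetical micro-benchmark: UTF-8 decode speed for strings drawn
# from different Unicode ranges, which PEP 393 stores differently
# (1 byte/char for ASCII and Latin-1, 2 for BMP, 4 for non-BMP).
import timeit

samples = {
    "ascii":   "a" * 4096,            # pure ASCII
    "latin1":  "\xe9" * 4096,         # U+00E9, fits in ISO-8859-1
    "bmp":     "\u20ac" * 4096,       # U+20AC, BMP but not Latin-1
    "non-bmp": "\U0001F600" * 4096,   # U+1F600, outside the BMP
}

for name, text in samples.items():
    data = text.encode("utf-8")
    seconds = timeit.timeit(lambda: data.decode("utf-8"), number=1000)
    print(f"{name:8s} {len(data):6d} bytes  {seconds:.4f} s / 1000 decodes")
```

Comparing the per-byte rates across the four rows would show directly whether the decoder slows down as the character range widens.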

Victor
