[Python-Dev] PEP 393 review
"Martin v. Löwis"
martin at v.loewis.de
Mon Aug 29 21:34:48 CEST 2011
>> Those haven't been ported to the new API, yet. Consider, for example,
>> d9821affc9ee. Before that, I got 253 MB/s on the 4096 units read test;
>> with that change, I get 610 MB/s. The trunk gives me 488 MB/s, so this
>> is a 25% speedup for PEP 393.
>
> If I understand correctly, performance now depends heavily on which
> characters are used? Is a pure ASCII string faster than a string with
> characters in the ISO-8859-1 charset?
How did you infer that from the above paragraph? ASCII and Latin-1 are
mostly identical in terms of performance - if anything, the ASCII decoder
should be slightly slower than the Latin-1 decoder, since the ASCII
decoder needs to check for errors, whereas the Latin-1 decoder can never
encounter an error.
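The asymmetry can be illustrated in pure Python (a sketch only - the C
codecs are far faster than this): Latin-1 maps all 256 byte values to
code points of the same value and can never fail, while the ASCII decoder
must detect any byte >= 0x80.

```python
data = bytes(range(256))

# Latin-1: every byte value 0-255 decodes to the code point of the
# same value, so decoding cannot fail.
assert data.decode("latin-1") == "".join(chr(b) for b in data)

# ASCII: any byte >= 0x80 is an error and must be detected.
try:
    data.decode("ascii")
except UnicodeDecodeError as e:
    print("first invalid byte at index", e.start)  # -> index 128
```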
What matters is
a) has the codec already been rewritten to use the new representation,
   or must it go through Py_UNICODE[] first, requiring a second copy
   into the canonical form?
b) what is the cost of finding out the highest character? - regardless
   of what the highest character turns out to be
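Point (b) can be sketched as follows: a PEP-393-style string must be
scanned once for its highest code point so that the per-character width
(1, 2, or 4 bytes) can be chosen. The helper name `char_width` below is
illustrative, not CPython API.

```python
def char_width(s: str) -> int:
    """Bytes per character a PEP-393-style representation would use."""
    # The scan itself is the cost in question: one pass over the string,
    # regardless of what the maximum turns out to be.
    maxchar = max(map(ord, s), default=0)
    if maxchar < 256:        # Latin-1 range -> 1 byte per character
        return 1
    if maxchar < 0x10000:    # BMP -> 2 bytes per character
        return 2
    return 4                 # non-BMP -> 4 bytes per character

print(char_width("abc"))            # 1
print(char_width("h\xe9llo"))       # still 1: é is in Latin-1
print(char_width("\u2603"))         # 2: SNOWMAN is in the BMP
print(char_width("\U0001F40D"))     # 4: non-BMP
```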
> Is it also true for BMP characters vs non-BMP
> characters?
Well... If you are talking about the ASCII and Latin-1 codecs - neither
of these supports most BMP characters, let alone non-BMP characters.
In general, non-BMP characters are more expensive to process since they
take more space.
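The space cost is directly observable once PEP 393 is in place (CPython
3.3 and later): a string's per-character storage is sized by its highest
code point, so a non-BMP string uses four bytes per character.

```python
import sys

ascii_s  = "a" * 100             # 1 byte per character
bmp_s    = "\u2603" * 100        # SNOWMAN: BMP, 2 bytes per character
astral_s = "\U0001F40D" * 100    # non-BMP, 4 bytes per character

# Each step up in the highest code point costs more memory per character.
print(sys.getsizeof(ascii_s))
print(sys.getsizeof(bmp_s))
print(sys.getsizeof(astral_s))
assert sys.getsizeof(ascii_s) < sys.getsizeof(bmp_s) < sys.getsizeof(astral_s)
```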
> Do these benchmark tools use only ASCII characters, or also some
> ISO-8859-1 characters?
See for yourself. iobench uses Latin-1, including non-ASCII, but not
non-Latin-1.
> Or, better, different Unicode ranges in different tests?
That's why I asked for a list of benchmarks to perform. I cannot
run an infinite number of benchmarks prior to adoption of the PEP.
Regards,
Martin