[New-bugs-announce] [issue14923] Even faster UTF-8 decoding
Serhiy Storchaka
report at bugs.python.org
Sat May 26 11:11:07 CEST 2012
New submission from Serhiy Storchaka <storchaka at gmail.com>:
Strange as it may seem, a simple trick makes UTF-8 decoding even faster.
Here are the benchmark results (the percentage is the change of patched relative to vanilla).
On 32-bit Linux, AMD Athlon 64 X2:
test case                         vanilla  change  patched
utf-8 'A'*10000                      2061   (+3%)     2115
utf-8 '\x80'*10000                    383   (-7%)      355
utf-8 '\x80'+'A'*9999                1273   (+1%)     1290
utf-8 '\u0100'*10000                  382  (+47%)      562
utf-8 '\u0100'+'A'*9999              1239   (+1%)     1253
utf-8 '\u0100'+'\x80'*9999            383  (+47%)      562
utf-8 '\u8000'*10000                  434   (-6%)      409
utf-8 '\u8000'+'A'*9999              1245   (+1%)     1256
utf-8 '\u8000'+'\x80'*9999            382  (+47%)      560
utf-8 '\u8000'+'\u0100'*9999          383  (+44%)      553
utf-8 '\U00010000'*10000              358   (+4%)      373
utf-8 '\U00010000'+'A'*9999          1171   (+0%)     1176
utf-8 '\U00010000'+'\x80'*9999        381  (+44%)      548
utf-8 '\U00010000'+'\u0100'*9999      381  (+44%)      548
utf-8 '\U00010000'+'\u8000'*9999      404   (+0%)      406
On 32-bit Linux, Intel Atom N570:
test case                         vanilla  change  patched
utf-8 'A'*10000                       623   (+0%)      626
utf-8 '\x80'*10000                    145  (+15%)      167
utf-8 '\x80'+'A'*9999                 354   (+2%)      362
utf-8 '\u0100'*10000                  164  (+10%)      181
utf-8 '\u0100'+'A'*9999               343   (-0%)      342
utf-8 '\u0100'+'\x80'*9999            164  (+11%)      182
utf-8 '\u8000'*10000                  175   (+5%)      183
utf-8 '\u8000'+'A'*9999               349   (+0%)      349
utf-8 '\u8000'+'\x80'*9999            164  (+11%)      182
utf-8 '\u8000'+'\u0100'*9999          164  (+10%)      181
utf-8 '\U00010000'*10000              152  (+11%)      168
utf-8 '\U00010000'+'A'*9999           313   (+0%)      313
utf-8 '\U00010000'+'\x80'*9999        161  (+11%)      179
utf-8 '\U00010000'+'\u0100'*9999      161  (+11%)      179
utf-8 '\U00010000'+'\u8000'*9999      160  (+11%)      177
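
For anyone who wants to compare their own builds, here is a rough sketch of how figures like the ones above might be gathered. It is not the script used for this report. The helper names (best_decode_time, format_change) and the repeat counts are made up for illustration, and it reports seconds per call rather than whatever unit the tables use. The percentage formula is inferred from the numbers in the tables.

```python
import timeit


def format_change(vanilla, patched):
    """Change of `patched` relative to `vanilla`, formatted like '+47%'.

    Formula inferred from the tables: 382 -> 562 is shown as +47%.
    """
    return '%+.0f%%' % (100.0 * (patched - vanilla) / vanilla)


def best_decode_time(text, number=200, repeat=5):
    """Best observed time in seconds for one bytes.decode('utf-8') call.

    `number` and `repeat` are arbitrary defaults, not values from the report.
    """
    data = text.encode('utf-8')
    timer = timeit.Timer(lambda: data.decode('utf-8'))
    return min(timer.repeat(repeat=repeat, number=number)) / number


if __name__ == '__main__':
    # A few of the test strings from the first column of the tables.
    cases = [
        ("'A'*10000", 'A' * 10000),
        ("'\\x80'*10000", '\x80' * 10000),
        ("'\\u0100'*10000", '\u0100' * 10000),
        ("'\\u8000'+'A'*9999", '\u8000' + 'A' * 9999),
        ("'\\U00010000'+'\\x80'*9999", '\U00010000' + '\x80' * 9999),
    ]
    for label, text in cases:
        seconds = best_decode_time(text)
        print('utf-8 %-32s %.2f us per decode' % (label, seconds * 1e6))
```

Running the script once under each interpreter build and feeding the two results for a row to format_change would give a percentage in the same style as the middle column.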
----------
components: Interpreter Core, Unicode
files: decode_utf8_signed_byte.patch
keywords: patch
messages: 161652
nosy: Arfrever, ezio.melotti, haypo, janssen, jcea, loewis, mark.dickinson, ned.deily, pitrou, python-dev, ronaldoussoren, storchaka
priority: normal
severity: normal
status: open
title: Even faster UTF-8 decoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25717/decode_utf8_signed_byte.patch
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14923>
_______________________________________