[New-bugs-announce] [issue14923] Even faster UTF-8 decoding

Serhiy Storchaka report at bugs.python.org
Sat May 26 11:11:07 CEST 2012


New submission from Serhiy Storchaka <storchaka at gmail.com>:

Strange as it may seem, a simple trick makes UTF-8 decoding even faster.

Here are the benchmark results. The percentage in parentheses is the change of the patched build relative to vanilla.

On 32-bit Linux, AMD Athlon 64 X2:

                                          vanilla      patched

utf-8     'A'*10000                       2061 (+3%)   2115
utf-8     '\x80'*10000                    383 (-7%)    355
utf-8       '\x80'+'A'*9999               1273 (+1%)   1290
utf-8     '\u0100'*10000                  382 (+47%)   562
utf-8       '\u0100'+'A'*9999             1239 (+1%)   1253
utf-8       '\u0100'+'\x80'*9999          383 (+47%)   562
utf-8     '\u8000'*10000                  434 (-6%)    409
utf-8       '\u8000'+'A'*9999             1245 (+1%)   1256
utf-8       '\u8000'+'\x80'*9999          382 (+47%)   560
utf-8       '\u8000'+'\u0100'*9999        383 (+44%)   553
utf-8     '\U00010000'*10000              358 (+4%)    373
utf-8       '\U00010000'+'A'*9999         1171 (+0%)   1176
utf-8       '\U00010000'+'\x80'*9999      381 (+44%)   548
utf-8       '\U00010000'+'\u0100'*9999    381 (+44%)   548
utf-8       '\U00010000'+'\u8000'*9999    404 (+0%)    406

On 32-bit Linux, Intel Atom N570:

                                          vanilla      patched

utf-8     'A'*10000                       623 (+0%)    626
utf-8     '\x80'*10000                    145 (+15%)   167
utf-8       '\x80'+'A'*9999               354 (+2%)    362
utf-8     '\u0100'*10000                  164 (+10%)   181
utf-8       '\u0100'+'A'*9999             343 (-0%)    342
utf-8       '\u0100'+'\x80'*9999          164 (+11%)   182
utf-8     '\u8000'*10000                  175 (+5%)    183
utf-8       '\u8000'+'A'*9999             349 (+0%)    349
utf-8       '\u8000'+'\x80'*9999          164 (+11%)   182
utf-8       '\u8000'+'\u0100'*9999        164 (+10%)   181
utf-8     '\U00010000'*10000              152 (+11%)   168
utf-8       '\U00010000'+'A'*9999         313 (+0%)    313
utf-8       '\U00010000'+'\x80'*9999      161 (+11%)   179
utf-8       '\U00010000'+'\u0100'*9999    161 (+11%)   179
utf-8       '\U00010000'+'\u8000'*9999    160 (+11%)   177
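
For anyone who wants to try similar measurements, here is a minimal sketch of a decoding benchmark over the same test strings. It is not the script used to produce the tables above; the helper name decode_rate, the repeat/number defaults, and the choice of MB/s as the reported unit are assumptions made for illustration.

```python
import timeit

# Test strings copied from the first column of the tables above.
CASES = [
    (r"'A'*10000", 'A' * 10000),
    (r"'\x80'*10000", '\x80' * 10000),
    (r"'\x80'+'A'*9999", '\x80' + 'A' * 9999),
    (r"'\u0100'*10000", '\u0100' * 10000),
    (r"'\u0100'+'A'*9999", '\u0100' + 'A' * 9999),
    (r"'\u0100'+'\x80'*9999", '\u0100' + '\x80' * 9999),
    (r"'\u8000'*10000", '\u8000' * 10000),
    (r"'\u8000'+'A'*9999", '\u8000' + 'A' * 9999),
    (r"'\u8000'+'\x80'*9999", '\u8000' + '\x80' * 9999),
    (r"'\u8000'+'\u0100'*9999", '\u8000' + '\u0100' * 9999),
    (r"'\U00010000'*10000", '\U00010000' * 10000),
    (r"'\U00010000'+'A'*9999", '\U00010000' + 'A' * 9999),
    (r"'\U00010000'+'\x80'*9999", '\U00010000' + '\x80' * 9999),
    (r"'\U00010000'+'\u0100'*9999", '\U00010000' + '\u0100' * 9999),
    (r"'\U00010000'+'\u8000'*9999", '\U00010000' + '\u8000' * 9999),
]


def decode_rate(text, repeat=3, number=200):
    """Return the best observed UTF-8 decoding rate in MB of input per second.

    Hypothetical helper: the unit and the timing parameters are assumptions,
    not taken from the original report.
    """
    data = text.encode('utf-8')
    # Take the fastest of several runs to reduce noise from other processes.
    best = min(timeit.repeat(lambda: data.decode('utf-8'),
                             repeat=repeat, number=number))
    return len(data) * number / best / 1e6


if __name__ == '__main__':
    for label, text in CASES:
        print('utf-8  %-30s %8.0f' % (label, decode_rate(text)))
```

Running the same script under a vanilla and a patched interpreter build and comparing the two columns gives a table of the same shape as the ones above.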

----------
components: Interpreter Core, Unicode
files: decode_utf8_signed_byte.patch
keywords: patch
messages: 161652
nosy: Arfrever, ezio.melotti, haypo, janssen, jcea, loewis, mark.dickinson, ned.deily, pitrou, python-dev, ronaldoussoren, storchaka
priority: normal
severity: normal
status: open
title: Even faster UTF-8 decoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25717/decode_utf8_signed_byte.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue14923>
_______________________________________

