[New-bugs-announce] [issue15027] Faster UTF-32 encoding

Serhiy Storchaka report at bugs.python.org
Thu Jun 7 15:57:31 CEST 2012

New submission from Serhiy Storchaka <storchaka at gmail.com>:

As a companion to issue14625, here is a patch that speeds up UTF-32 encoding severalfold. In addition, it fixes an unsafe integer overflow check.
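For readers unfamiliar with the operation being optimized, here is a minimal pure-Python sketch of UTF-32-LE encoding, together with an overflow-safe way to compute the output size. This is an illustration only, not the code from encode-utf32.patch: the names `output_size`, `encode_utf32le` and `MAX_SIZE` are hypothetical, and the assumption that the check concerns the size of the output buffer is mine. Python integers do not overflow, so the size check only mirrors the pattern one would use in fixed-width C arithmetic (divide the limit first, then compare, instead of multiplying and testing the result afterwards).

```python
import struct

# Hypothetical stand-in for a C-level size limit such as PY_SSIZE_T_MAX
# on a 32-bit build. The real limit depends on the platform.
MAX_SIZE = 2**31 - 1


def output_size(n_chars, bytes_per_char=4, max_size=MAX_SIZE):
    """Return n_chars * bytes_per_char, rejecting results above max_size."""
    # Divide the limit first: in fixed-width arithmetic this comparison
    # cannot overflow, whereas "multiply, then test" can wrap around.
    if n_chars > max_size // bytes_per_char:
        raise OverflowError("encoded string too large")
    return n_chars * bytes_per_char


def encode_utf32le(text):
    """Reference encoder: every code point becomes 4 little-endian bytes."""
    out = bytearray(output_size(len(text)))
    for i, ch in enumerate(text):
        # "<I" packs an unsigned 32-bit integer in little-endian order.
        struct.pack_into("<I", out, 4 * i, ord(ch))
    return bytes(out)
```

On Python 3.3 or later, `encode_utf32le(s)` should agree with `s.encode('utf-32-le')` for strings without lone surrogates; using `">I"` instead would give the big-endian variant.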

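Here are the benchmark results. The benchmark tools are in the https://bitbucket.org/storchaka/cpython-stuff repository. The percentage in parentheses after each figure is the gain of the patched build over that version (higher figures are better).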

On 32-bit Linux, AMD Athlon 64 X2 4600+ @ 2.4GHz:

Py2.7        Py3.2        Py3.3        patched

541 (+1032%) 541 (+1032%) 844 (+626%)  6125   encode  utf-32le  'A'*10000
543 (+1056%) 541 (+1060%) 844 (+643%)  6275   encode  utf-32le  '\x80'*10000
544 (+1010%) 542 (+1014%) 843 (+616%)  6037   encode  utf-32le    '\x80'+'A'*9999
541 (+799%)  542 (+797%)  764 (+537%)  4864   encode  utf-32le  '\u0100'*10000
544 (+781%)  542 (+784%)  767 (+525%)  4793   encode  utf-32le    '\u0100'+'A'*9999
544 (+789%)  542 (+792%)  766 (+531%)  4834   encode  utf-32le    '\u0100'+'\x80'*9999
542 (+799%)  541 (+801%)  764 (+538%)  4874   encode  utf-32le  '\u8000'*10000
544 (+779%)  542 (+782%)  767 (+523%)  4780   encode  utf-32le    '\u8000'+'A'*9999
544 (+793%)  542 (+796%)  766 (+534%)  4859   encode  utf-32le    '\u8000'+'\x80'*9999
544 (+819%)  542 (+823%)  766 (+553%)  5001   encode  utf-32le    '\u8000'+'\u0100'*9999
430 (+867%)  427 (+874%)  860 (+383%)  4157   encode  utf-32le  '\U00010000'*10000
543 (+655%)  543 (+655%)  861 (+376%)  4101   encode  utf-32le    '\U00010000'+'A'*9999
543 (+658%)  543 (+658%)  861 (+378%)  4116   encode  utf-32le    '\U00010000'+'\x80'*9999
543 (+670%)  543 (+670%)  859 (+387%)  4180   encode  utf-32le    '\U00010000'+'\u0100'*9999
543 (+666%)  543 (+666%)  860 (+383%)  4158   encode  utf-32le    '\U00010000'+'\u8000'*9999

541 (+880%)  543 (+876%)  844 (+528%)  5300   encode  utf-32be  'A'*10000
541 (+872%)  542 (+870%)  844 (+523%)  5256   encode  utf-32be  '\x80'*10000
544 (+843%)  542 (+846%)  843 (+509%)  5130   encode  utf-32be    '\x80'+'A'*9999
541 (+363%)  542 (+362%)  764 (+228%)  2505   encode  utf-32be  '\u0100'*10000
544 (+366%)  542 (+368%)  766 (+231%)  2534   encode  utf-32be    '\u0100'+'A'*9999
544 (+363%)  542 (+365%)  766 (+229%)  2519   encode  utf-32be    '\u0100'+'\x80'*9999
542 (+363%)  541 (+364%)  764 (+228%)  2509   encode  utf-32be  '\u8000'*10000
544 (+366%)  542 (+368%)  766 (+231%)  2534   encode  utf-32be    '\u8000'+'A'*9999
544 (+363%)  542 (+364%)  766 (+229%)  2517   encode  utf-32be    '\u8000'+'\x80'*9999
544 (+372%)  542 (+374%)  766 (+235%)  2568   encode  utf-32be    '\u8000'+'\u0100'*9999
430 (+428%)  427 (+432%)  860 (+164%)  2270   encode  utf-32be  '\U00010000'*10000
543 (+317%)  541 (+318%)  861 (+163%)  2262   encode  utf-32be    '\U00010000'+'A'*9999
543 (+320%)  541 (+321%)  861 (+165%)  2279   encode  utf-32be    '\U00010000'+'\x80'*9999
543 (+322%)  541 (+323%)  859 (+167%)  2290   encode  utf-32be    '\U00010000'+'\u0100'*9999
543 (+322%)  541 (+324%)  860 (+167%)  2292   encode  utf-32be    '\U00010000'+'\u8000'*9999
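The tables above can be approximated with a short script. The following is a rough sketch using only the standard `timeit` module, not the benchmark tool from the repository mentioned earlier. The helper names `bench` and `gain` are hypothetical, and the result is reported in seconds per call rather than in the units of the tables. The `gain` helper assumes that each percentage is the patched figure relative to the figure beside it, which matches the first row (6125 against 541 gives +1032%).

```python
import timeit

# A few of the test strings from the tables above.
CASES = [
    ("'A'*10000", "A" * 10000),
    ("'\\u0100'*10000", "\u0100" * 10000),
    ("'\\U00010000'*10000", "\U00010000" * 10000),
]


def bench(text, encoding, repeat=3, number=200):
    """Return the best observed time, in seconds, of one text.encode(encoding)."""
    timings = timeit.repeat(lambda: text.encode(encoding),
                            repeat=repeat, number=number)
    return min(timings) / number


def gain(patched, baseline):
    """Percentage by which `patched` exceeds `baseline`, rounded to an integer."""
    return round((patched / baseline - 1) * 100)


if __name__ == "__main__":
    for encoding in ("utf-32le", "utf-32be"):
        for label, text in CASES:
            seconds = bench(text, encoding)
            print("%12.3f us  encode  %s  %s" % (seconds * 1e6, encoding, label))
```

Running the script under two interpreter builds and passing the two throughput figures to `gain` would give a percentage comparable to those in parentheses above.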

components: Interpreter Core, Unicode
files: encode-utf32.patch
keywords: patch
messages: 162474
nosy: Arfrever, asvetlov, ezio.melotti, haypo, pitrou, storchaka
priority: normal
severity: normal
status: open
title: Faster UTF-32 encoding
type: performance
versions: Python 3.3
Added file: http://bugs.python.org/file25857/encode-utf32.patch

Python tracker <report at bugs.python.org>
