[New-bugs-announce] [issue16334] Faster unicode-escape and raw-unicode-escape codecs

Serhiy Storchaka report at bugs.python.org
Sat Oct 27 00:48:25 CEST 2012


New submission from Serhiy Storchaka:

The proposed patch optimizes the unicode-escape and raw-unicode-escape codecs.  The codecs are still slower than in 3.2, but much faster than in 3.3.  Further speedup is possible with the use of stringlib, but I think that this is enough.  The code is unified and simplified (251 insertions, 345 deletions).
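
For readers less familiar with the two codecs, here is a small illustration of how they differ on the kinds of characters used in the benchmark below. It only shows the behaviour of the codecs as exposed through str.encode() and bytes.decode(); it says nothing about how the patch is implemented.

```python
# One character from each range exercised in the benchmark:
# ASCII, Latin-1, BMP and non-BMP.
text = 'A\x80\u0100\U00010000'

# unicode-escape escapes everything outside printable ASCII.
escaped = text.encode('unicode-escape')
print(escaped)        # b'A\\x80\\u0100\\U00010000'

# raw-unicode-escape passes Latin-1 characters through as single bytes
# and only escapes code points above U+00FF.
raw_escaped = text.encode('raw-unicode-escape')
print(raw_escaped)    # b'A\x80\\u0100\\U00010000'

# Both codecs round-trip the text.
assert escaped.decode('unicode-escape') == text
assert raw_escaped.decode('raw-unicode-escape') == text

# Only unicode-escape interprets backslash escapes such as \n when decoding;
# raw-unicode-escape leaves them as a backslash followed by a letter.
newline = b'\\n'.decode('unicode-escape')
not_newline = b'\\n'.decode('raw-unicode-escape')
assert newline == '\n'
assert not_newline == '\\n'
```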

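Benchmark results (on AMD Athlon 64 X2 4600+; higher is faster, and the percentage in parentheses is the change of Py3.4+patch relative to that version):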

Py2.7        Py3.2        Py3.3        Py3.4+patch

193 (+11%)   325 (-34%)   66 (+224%)   214    decode  unicode-escape  'A'*10000
138 (+72%)   241 (-1%)    154 (+55%)   238    decode  unicode-escape  '\x80'*10000
193 (+10%)   323 (-34%)   72 (+194%)   212    decode  unicode-escape    '\x80'+'A'*9999
160 (+59%)   273 (-7%)    169 (+51%)   255    decode  unicode-escape  '\u0100'*10000
193 (-7%)    324 (-44%)   61 (+195%)   180    decode  unicode-escape    '\u0100'+'A'*9999
138 (+67%)   242 (-5%)    135 (+71%)   231    decode  unicode-escape    '\u0100'+'\x80'*9999
160 (+59%)   274 (-7%)    169 (+51%)   255    decode  unicode-escape  '\u8000'*10000
193 (-7%)    323 (-44%)   61 (+195%)   180    decode  unicode-escape    '\u8000'+'A'*9999
138 (+67%)   242 (-5%)    135 (+71%)   231    decode  unicode-escape    '\u8000'+'\x80'*9999
160 (+60%)   276 (-7%)    169 (+51%)   256    decode  unicode-escape    '\u8000'+'\u0100'*9999
178 (+42%)   275 (-8%)    177 (+43%)   253    decode  unicode-escape  '\U00010000'*10000
192 (+30%)   323 (-23%)   61 (+310%)   250    decode  unicode-escape    '\U00010000'+'A'*9999
139 (+35%)   243 (-23%)   119 (+57%)   187    decode  unicode-escape    '\U00010000'+'\x80'*9999
161 (+38%)   273 (-19%)   150 (+48%)   222    decode  unicode-escape    '\U00010000'+'\u0100'*9999
161 (+38%)   273 (-19%)   150 (+48%)   222    decode  unicode-escape    '\U00010000'+'\u8000'*9999

558 (-62%)   427 (-50%)   82 (+161%)   214    decode  raw-unicode-escape  'A'*10000
560 (-62%)   425 (-50%)   75 (+183%)   212    decode  raw-unicode-escape  '\x80'*10000
558 (-62%)   425 (-50%)   75 (+183%)   212    decode  raw-unicode-escape    '\x80'+'A'*9999
178 (+75%)   235 (+32%)   108 (+188%)  311    decode  raw-unicode-escape  '\u0100'*10000
555 (-62%)   424 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u0100'+'A'*9999
559 (-62%)   424 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u0100'+'\x80'*9999
179 (+74%)   235 (+32%)   108 (+188%)  311    decode  raw-unicode-escape  '\u8000'*10000
555 (-62%)   424 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u8000'+'A'*9999
558 (-62%)   425 (-50%)   61 (+248%)   212    decode  raw-unicode-escape    '\u8000'+'\x80'*9999
178 (+75%)   235 (+32%)   108 (+188%)  311    decode  raw-unicode-escape    '\u8000'+'\u0100'*9999
200 (+18%)   249 (-5%)    132 (+79%)   236    decode  raw-unicode-escape  '\U00010000'*10000
554 (-58%)   423 (-46%)   61 (+277%)   230    decode  raw-unicode-escape    '\U00010000'+'A'*9999
558 (-59%)   424 (-46%)   61 (+277%)   230    decode  raw-unicode-escape    '\U00010000'+'\x80'*9999
178 (+46%)   235 (+11%)   100 (+160%)  260    decode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
178 (+44%)   235 (+9%)    100 (+157%)  257    decode  raw-unicode-escape    '\U00010000'+'\u8000'*9999


182 (+137%)  215 (+101%)  148 (+192%)  432    encode  unicode-escape  'A'*10000
582 (-10%)   617 (-16%)   470 (+11%)   521    encode  unicode-escape  '\x80'*10000
182 (+131%)  215 (+96%)   148 (+184%)  421    encode  unicode-escape    '\x80'+'A'*9999
624 (-7%)    967 (-40%)   558 (+4%)    579    encode  unicode-escape  '\u0100'*10000
183 (-19%)   215 (-31%)   132 (+12%)   148    encode  unicode-escape    '\u0100'+'A'*9999
584 (-23%)   617 (-27%)   464 (-3%)    451    encode  unicode-escape    '\u0100'+'\x80'*9999
627 (-8%)    968 (-40%)   557 (+4%)    579    encode  unicode-escape  '\u8000'*10000
183 (-19%)   215 (-31%)   148 (+0%)    148    encode  unicode-escape    '\u8000'+'A'*9999
584 (-23%)   617 (-27%)   490 (-8%)    451    encode  unicode-escape    '\u8000'+'\x80'*9999
629 (-8%)    969 (-40%)   555 (+4%)    578    encode  unicode-escape    '\u8000'+'\u0100'*9999
931 (-39%)   939 (-39%)   602 (-5%)    572    encode  unicode-escape  '\U00010000'*10000
183 (+7%)    215 (-9%)    180 (+9%)    196    encode  unicode-escape    '\U00010000'+'A'*9999
584 (-9%)    617 (-13%)   482 (+11%)   534    encode  unicode-escape    '\U00010000'+'\x80'*9999
630 (-14%)   962 (-43%)   565 (-4%)    544    encode  unicode-escape    '\U00010000'+'\u0100'*9999
630 (-14%)   964 (-44%)   565 (-4%)    544    encode  unicode-escape    '\U00010000'+'\u8000'*9999

332 (+1459%) 330 (+1468%) 333 (+1454%) 5175   encode  raw-unicode-escape  'A'*10000
332 (+1589%) 329 (+1604%) 333 (+1584%) 5607   encode  raw-unicode-escape  '\x80'*10000
336 (+1569%) 334 (+1579%) 333 (+1584%) 5607   encode  raw-unicode-escape    '\x80'+'A'*9999
904 (-38%)   911 (-39%)   557 (+0%)    558    encode  raw-unicode-escape  '\u0100'*10000
336 (+15%)   335 (+16%)   197 (+97%)   388    encode  raw-unicode-escape    '\u0100'+'A'*9999
335 (+16%)   335 (+16%)   197 (+97%)   388    encode  raw-unicode-escape    '\u0100'+'\x80'*9999
904 (-38%)   913 (-39%)   557 (+0%)    558    encode  raw-unicode-escape  '\u8000'*10000
335 (+16%)   335 (+16%)   197 (+96%)   387    encode  raw-unicode-escape    '\u8000'+'A'*9999
335 (+16%)   335 (+16%)   196 (+98%)   388    encode  raw-unicode-escape    '\u8000'+'\x80'*9999
912 (-39%)   909 (-39%)   554 (+1%)    558    encode  raw-unicode-escape    '\u8000'+'\u0100'*9999
966 (-40%)   997 (-42%)   584 (-0%)    583    encode  raw-unicode-escape  '\U00010000'*10000
336 (-42%)   335 (-41%)   213 (-8%)    196    encode  raw-unicode-escape    '\U00010000'+'A'*9999
336 (-42%)   335 (-41%)   213 (-8%)    196    encode  raw-unicode-escape    '\U00010000'+'\x80'*9999
911 (-43%)   911 (-43%)   570 (-8%)    522    encode  raw-unicode-escape    '\U00010000'+'\u0100'*9999
911 (-43%)   913 (-43%)   570 (-8%)    522    encode  raw-unicode-escape    '\U00010000'+'\u8000'*9999
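
A comparable measurement could be sketched roughly as follows. This is not the script that produced the numbers above: the helper name `bench`, the choice of `timeit`, the repeat counts, and the use of the encoded form of each string as decoder input are all assumptions made for illustration, and the output is in seconds per call rather than the units of the table.

```python
import timeit

CODECS = ('unicode-escape', 'raw-unicode-escape')

# A few of the inputs from the table above (not the full set).
SAMPLES = {
    "'A'*10000": 'A' * 10000,
    "'\\x80'*10000": '\x80' * 10000,
    "'\\u0100'*10000": '\u0100' * 10000,
    "'\\U00010000'+'A'*9999": '\U00010000' + 'A' * 9999,
}


def bench(codec, text, direction, repeat=3, number=20):
    """Return the best observed time in seconds for one encode or decode call.

    Hypothetical helper: for decoding, the input is assumed to be the
    encoded form of `text` under the same codec.
    """
    if direction == 'encode':
        func = lambda: text.encode(codec)
    elif direction == 'decode':
        data = text.encode(codec)
        func = lambda: data.decode(codec)
    else:
        raise ValueError("direction must be 'encode' or 'decode'")
    # Take the minimum over several runs to reduce noise.
    return min(timeit.repeat(func, repeat=repeat, number=number)) / number


if __name__ == '__main__':
    for direction in ('decode', 'encode'):
        for codec in CODECS:
            for label, text in SAMPLES.items():
                seconds = bench(codec, text, direction)
                print('%10.1f us  %s  %-18s  %s'
                      % (seconds * 1e6, direction, codec, label))
```

Running the same script under each interpreter version would give per-version columns to compare; absolute values will of course depend on the machine.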

----------
components: Interpreter Core, Unicode
files: faster_unicode_escape.patch
keywords: 3.3regression, patch
messages: 173901
nosy: benjamin.peterson, ezio.melotti, haypo, lemburg, pitrou, serhiy.storchaka
priority: normal
severity: normal
stage: patch review
status: open
title: Faster unicode-escape and raw-unicode-escape codecs
type: performance
versions: Python 3.4
Added file: http://bugs.python.org/file27740/faster_unicode_escape.patch

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue16334>
_______________________________________

