Re: [pypy-dev] Unicode encode/decode speed

Feb. 12, 2013


      2013/2/12 Elefterios Stamatogiannakis <estama@gmail.com>
...
On 11/2/2013 7:39 PM, Amaury Forgeot d'Arc wrote:
...
2013/2/11 Eleytherios Stamatogiannakis <estama@gmail.com>
>
> Which kind of profiler are you using? It is possible that CPython builtin
> functions are not profiled the same way as PyPy's.
lsprofcalltree.py.
From APSW's source code, I think that it uses this API:
(in cursor.c)
    PyUnicode_DecodeUTF8
Maybe lsprofcalltree doesn't profile it?
Indeed. CPU cost is hidden in the cursor method.
Thanks Amaury for looking into this,
Assuming that PyPy's "codecs.utf_8_decode" is slower when used with CFFI
than PyUnicode_DecodeUTF8 is in CPython, is there anything that can be
done in CFFI that would have the same performance as
PyUnicode_DecodeUTF8 (and the same for encode)?
First, codecs.utf_8_decode has nothing to do with CFFI...
Then, do we have evidence that the utf8 codec alone is enough to explain
the difference in performance?

Since your data is only ASCII, it would be interesting to use the ASCII
encoding:
try to replace PyUnicode_DecodeUTF8 by PyUnicode_DecodeASCII
and codecs.utf_8_decode by codecs.ascii_decode
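
On the Python side, the comparison could be sketched roughly as follows. This is only a minimal micro-benchmark under assumptions: the sample bytes, the repeat counts, and the helper names (decode_utf8, decode_ascii, compare) are made up for illustration and are not taken from the real workload or from APSW. Run it under both CPython and PyPy and compare the numbers.

```python
import codecs
import timeit

# Hypothetical sample: ASCII-only bytes standing in for the real text values.
data = b"some ascii-only column value " * 1000


def decode_utf8():
    # codecs.utf_8_decode returns (decoded_text, bytes_consumed)
    return codecs.utf_8_decode(data)[0]


def decode_ascii():
    # Same input, decoded with the ASCII codec instead
    return codecs.ascii_decode(data)[0]


def compare(repeats=5, number=200):
    """Return the best-of-`repeats` time in seconds for each codec."""
    return {
        "utf_8_decode": min(timeit.repeat(decode_utf8, repeat=repeats, number=number)),
        "ascii_decode": min(timeit.repeat(decode_ascii, repeat=repeats, number=number)),
    }


if __name__ == "__main__":
    for name, seconds in sorted(compare().items()):
        print("{}: {:.6f} s".format(name, seconds))
```

If the two timings are close on PyPy, the utf8 codec itself is unlikely to be the whole explanation.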

-- 
Amaury Forgeot d'Arc