[pypy-dev] Unicode encode/decode speed

Mon Feb 11 18:36:23 CET 2013

On 11/02/13 19:14, Amaury Forgeot d'Arc wrote:
>
>
> 2013/2/11 Eleytherios Stamatogiannakis <estama at gmail.com>
>
>     On 11/02/13 18:13, Amaury Forgeot d'Arc wrote:
>...
 >
 > Which kind of profiler are you using? It is possible that CPython builtin
 > functions are not profiled the same way as PyPy's.

lsprofcalltree.py.

From APSW's source code, I think it uses this API (in cursor.c):

PyUnicode_DecodeUTF8

Maybe lsprofcalltree doesn't profile it?
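Whatever the profiler reports, one way to check the decode cost directly is to time it with the standard `timeit` module. This is a minimal sketch, assuming Python 3; the payload is made-up ASCII text standing in for what APSW would decode, and `decode_once`/`time_decode` are hypothetical helper names. In CPython, `bytes.decode('utf-8')` goes through roughly the same UTF-8 decoder that backs `PyUnicode_DecodeUTF8`; on PyPy it exercises PyPy's own decoder, so the same script can be run on both.

```python
import timeit

# Hypothetical test payload: plain ASCII text encoded as UTF-8 bytes.
payload = ("SomeUnicodeString " * 1000).encode("utf-8")


def decode_once(data=payload):
    # Decode the UTF-8 bytes back into a str (the work APSW's
    # PyUnicode_DecodeUTF8 call does at the C level).
    return data.decode("utf-8")


def time_decode(number=10000):
    """Return total wall-clock seconds for `number` decodes of the payload."""
    return timeit.timeit(decode_once, number=number)
```

Calling `time_decode()` under both `python3` and `pypy3` gives a profiler-independent comparison of the decode step alone.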

>
> No, my question was about the number of non-ASCII characters:
>      s = u"SomeUnicodeString"
>      1.0 * len(s.encode('utf8')) / len(s)
> PyPy allocates the StringBuffer upfront, and must realloc to cope with
> multibyte characters.
> For English text, ratio is 1.0; for Greek, it will be close to 2.0.
>
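To make the realloc point concrete, here is a toy model, not PyPy's actual StringBuffer code: the hypothetical `count_reallocs` pre-sizes a buffer to `len(s)` bytes and, as an assumption, doubles it whenever the next UTF-8 sequence does not fit. It only counts growths; the real growth policy inside PyPy may differ.

```python
def count_reallocs(s, growth=2):
    """Count buffer growths when UTF-8-encoding `s` into a buffer
    pre-sized to len(s) bytes. Hypothetical model, not PyPy's code."""
    capacity = max(len(s), 1)  # upfront allocation: one byte per character
    used = 0
    reallocs = 0
    for ch in s:
        needed = len(ch.encode("utf-8"))  # 1 byte for ASCII, 2+ otherwise
        # Assumed policy: multiply capacity by `growth` until the bytes fit.
        while used + needed > capacity:
            capacity *= growth
            reallocs += 1
        used += needed
    return reallocs
```

Under this model pure ASCII input never outgrows the initial buffer, while a string of eight Greek letters (two bytes each) forces one growth.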

All of our tests use only plain English ASCII characters (converted to
unicode), so the ratio is 1.0.
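The ratio check from the quoted snippet can be wrapped in a small helper to confirm this for a given test corpus. A minimal sketch, assuming Python 3, where `/` is already true division, so the `1.0 *` trick needed on Python 2 is unnecessary; `utf8_ratio` is a hypothetical name.

```python
def utf8_ratio(s):
    """Average UTF-8 bytes per character of `s` (1.0 for pure ASCII)."""
    if not s:
        return 1.0  # treat empty input like ASCII to avoid dividing by zero
    return len(s.encode("utf-8")) / len(s)
```

Pure ASCII such as `"SomeUnicodeString"` gives 1.0; Greek letters alone give 2.0, and mixed Greek text with ASCII spaces lands just under 2.0 (for example `"\u03b1\u03b2 \u03b3"` gives 1.75).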

l.