[pypy-dev] Unicode encode/decode speed

Amaury Forgeot d'Arc amauryfa at gmail.com
Mon Feb 11 17:13:58 CET 2013


2013/2/11 Eleytherios Stamatogiannakis <estama at gmail.com>

> Right now we are using PyPy's "codecs.utf_8_encode" and
> "codecs.utf_8_decode" to do this conversion.
>

It's the most direct way to use the utf-8 conversion functions.
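For example, both functions return a (result, length_consumed) tuple, so the
usual pattern is roughly this (small illustrative snippet, not your exact code):

    import codecs

    text = u"caf\u00e9"
    encoded, _ = codecs.utf_8_encode(text)     # encoded == b'caf\xc3\xa9'
    decoded, _ = codecs.utf_8_decode(encoded)  # decoded == u'caf\xe9'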


> Is there a faster way to do these conversions (encoding, decoding) in
> PyPy? Does CPython do something more clever than PyPy, like storing
> unicode strings with all-ASCII content in an ASCII representation?
>

Over the years, utf-8 conversion has been heavily optimized in CPython:
short buffers are allocated on the stack, reads are aligned, there is a quick
check for ascii-only content (data & 0x80808080)...
All things that PyPy does not do.
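That quick check amounts to something like the following (a rough Python
sketch of the idea only; CPython does it in C over whole machine words):

    import struct

    def looks_ascii(data):
        # Scan 4 bytes at a time: ascii-only means no byte has its high bit set.
        n = len(data) - len(data) % 4
        for i in range(0, n, 4):
            (word,) = struct.unpack_from("<I", data, i)
            if word & 0x80808080:
                return False
        # The trailing 0-3 bytes are checked one by one.
        return all(b < 0x80 for b in bytearray(data[n:]))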

But I tried some "timeit" runs, and PyPy is often faster than CPython, and
never much slower.
Do your strings have many non-ascii characters?
What's the len(utf8)/len(unicode) ratio?
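To get that ratio, and to time the conversions yourself, something like this
works (made-up sample text, substitute your own data):

    import codecs, timeit

    text = u"mostly ascii text with a few accents: \u00e9\u00e8" * 1000
    data = codecs.utf_8_encode(text)[0]

    print(float(len(data)) / len(text))   # len(utf8)/len(unicode) ratio

    print(timeit.timeit("codecs.utf_8_encode(text)",
                        "from __main__ import codecs, text", number=10000))
    print(timeit.timeit("codecs.utf_8_decode(data)",
                        "from __main__ import codecs, data", number=10000))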


-- 
Amaury Forgeot d'Arc

