2013/2/11 Eleytherios Stamatogiannakis <estama@gmail.com>:
Right now we are using PyPy's "codecs.utf_8_encode" and "codecs.utf_8_decode" to do this conversion.
It's the most direct way to use the utf-8 conversion functions.
Is there a faster way to do these conversions (encoding, decoding) in PyPy? Does CPython do something more clever than PyPy, like storing unicode objects whose content is entirely ASCII in an ASCII representation?
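For readers following along, here is a minimal sketch of the two calls mentioned above. The sample string is made up for illustration; both functions return a tuple of the converted object and the number of input units consumed.

```python
import codecs

# Illustrative sample text (not from the original message).
text = u"caf\u00e9"

# utf_8_encode returns (encoded bytes, number of characters consumed).
encoded, chars_consumed = codecs.utf_8_encode(text)

# utf_8_decode returns (decoded text, number of bytes consumed).
decoded, bytes_consumed = codecs.utf_8_decode(encoded)

print(repr(encoded), chars_consumed)
print(repr(decoded), bytes_consumed)
```

Calling these directly skips the codec lookup that text.encode("utf-8") performs by name, which is presumably why they were described as the most direct route.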
Over the years, UTF-8 conversions have been heavily optimized in CPython:
allocating short buffers on the stack, using aligned reads, doing a quick check for ASCII-only content (data & 0x80808080)...
These are all things that PyPy does not do.
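To illustrate the ASCII quick check, here is a rough sketch of the idea written in Python 3. It only shows the logic: the real optimization lives in C, works on aligned machine words, and this version is not expected to be fast. The function name is_ascii_fast is hypothetical.

```python
# Mask with the high bit set in each of four bytes.
ASCII_MASK = 0x80808080

def is_ascii_fast(data):
    """Return True if every byte in `data` is below 0x80."""
    n = (len(data) // 4) * 4
    # Test four bytes at a time: any set high bit means non-ASCII.
    for i in range(0, n, 4):
        word = int.from_bytes(data[i:i + 4], "little")
        if word & ASCII_MASK:
            return False
    # Check the remaining 0 to 3 trailing bytes one by one.
    for byte in data[n:]:
        if byte & 0x80:
            return False
    return True

print(is_ascii_fast(b"hello world"))
print(is_ascii_fast(u"caf\u00e9".encode("utf-8")))
```

When the check succeeds, a decoder can copy the bytes straight through without running the multi-byte UTF-8 state machine at all.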