<br><div class="gmail_quote">2013/2/11 Eleytherios Stamatogiannakis <span dir="ltr"><<a href="mailto:estama@gmail.com" target="_blank">estama@gmail.com</a>></span><br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div id=":2kc">Right now we are using PyPy's "codecs.utf_8_encode" and "codecs.utf_8_decode" to do this conversion.<br></div></blockquote><div><br></div><div>It's the most direct way to use the utf-8 conversion functions.</div>
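<div>For reference, a minimal sketch of calling the two functions named above. It assumes Python 3 syntax, where the thread's "unicode" type is called "str"; the sample string is made up for illustration. Both functions return a tuple rather than a bare result:</div>

```python
import codecs

# Hypothetical sample: 4 code points, the last one non-ASCII.
text = "caf\u00e9"

# utf_8_encode returns (encoded bytes, number of characters consumed).
encoded, chars_consumed = codecs.utf_8_encode(text)

# utf_8_decode returns (decoded text, number of bytes consumed).
decoded, bytes_consumed = codecs.utf_8_decode(encoded)
```

<div>The second tuple element differs between the two directions because the non-ASCII character takes two bytes in UTF-8.</div>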
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div id=":2kc">
Is there a faster way to do these conversions (encoding, decoding) in PyPy? Does CPython do something more clever than PyPy, like storing unicodes with full ASCII char content, in an ASCII representation?</div></blockquote>
<div><br></div><div>Over the years, UTF-8 conversions have been heavily optimized in CPython:</div><div>allocating short buffers on the stack, using aligned reads, a quick check for ASCII-only content (data & 0x80808080)...</div>
</div><div>PyPy does none of these things.</div><div><br></div>But I tried some "timeit" runs, and PyPy is often faster than CPython, and never much slower.<div>Do your strings have many non-ASCII characters?</div>
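<div>To illustrate the quick check for ASCII-only content mentioned above, here is a rough sketch of the idea in Python. It is not CPython's actual code (which is written in C); the function name is hypothetical, and in pure Python this brings no speedup. It only shows how masking a 32-bit word with 0x80808080 tests four bytes at once:</div>

```python
import struct

# Top bit of each of the four bytes in a 32-bit word.
HIGH_BITS = 0x80808080


def is_ascii_wordwise(data: bytes) -> bool:
    """Return True if every byte in data is below 0x80."""
    whole = len(data) - len(data) % 4
    # Check four bytes per step: any set high bit means a non-ASCII byte.
    for offset in range(0, whole, 4):
        (word,) = struct.unpack_from("<I", data, offset)
        if word & HIGH_BITS:
            return False
    # Check the 0 to 3 leftover bytes one at a time.
    return all(byte < 0x80 for byte in data[whole:])
```

<div>When the check succeeds, a decoder can copy the bytes directly instead of parsing multi-byte sequences.</div>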
<div>What's the len(utf8)/len(unicode) ratio?<br><div><br clear="all"><div><br></div>-- <br>Amaury Forgeot d'Arc
</div></div>
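<div>One possible way to answer both questions (the len(utf8)/len(unicode) ratio and the "timeit" comparison) is sketched below. It assumes Python 3 syntax; the helper name and the two sample strings are made up for illustration, so substitute your real data. Run the same script under CPython and PyPy to compare:</div>

```python
import timeit


def utf8_ratio(text: str) -> float:
    """Encoded byte length divided by the number of code points."""
    if not text:
        return 1.0
    return len(text.encode("utf-8")) / len(text)


# Hypothetical samples: one pure ASCII, one mostly two-byte characters.
samples = {
    "ascii": "hello world " * 1000,
    "greek": "\u03ba\u03b1\u03bb\u03b7\u03bc\u03ad\u03c1\u03b1 " * 1000,
}

for name, text in samples.items():
    encoded = text.encode("utf-8")
    enc_time = timeit.timeit(lambda: text.encode("utf-8"), number=1000)
    dec_time = timeit.timeit(lambda: encoded.decode("utf-8"), number=1000)
    print(f"{name}: ratio={utf8_ratio(text):.2f} "
          f"encode={enc_time:.4f}s decode={dec_time:.4f}s")
```

<div>A ratio close to 1 means the data is mostly ASCII; higher ratios mean more multi-byte characters, which is where the two interpreters are most likely to differ.</div>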