[pypy-dev] Unicode encode/decode speed

Sun Feb 17 10:55:07 CET 2013

On Sun, Feb 17, 2013 at 11:43 AM, Armin Rigo <arigo at tunes.org> wrote:
> Hi,
>
> On Tue, Feb 12, 2013 at 7:14 PM, Eleytherios Stamatogiannakis
> <estama at gmail.com> wrote:
>> Also we are looking into adding a special ffi.string_decode_UTF8 in CFFI's
>> backend to reduce the number of calls that are needed to go from utf8_char*
>> to PyPy's unicode.
>
> A first note: I'm wondering why you need to convert from
> utf-8-that-contains-only-ascii, to unicode, and back.  What is the
> point of having unicode strings in the first place?  Can't you just
> pass around plain non-unicode strings in your whole program?
>
> If not, then indeed, it would make (a bit of) sense to have ways to
> convert directly between "char *" and unicode strings, in both
> directions, assuming utf-8.  This could be done with an API like:
>
> ffi.encode_utf8(unicode_string) -> new_char*_cdata
> ffi.encode_utf8(unicode_string, target_char*_cdata, maximum_length)
> ffi.decode_utf8(char*_cdata, [length]) -> unicode_string
>
> Alternatively, we could accept unicode strings whenever a "char*" is
> expected and encode it to utf-8, but that sounds a bit too magical.
>
>
> A bientôt,
>
> Armin.
> _______________________________________________
> pypy-dev mailing list
> pypy-dev at python.org
> http://mail.python.org/mailman/listinfo/pypy-dev

We should add rffi.charp2unicode too.
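For illustration, here is a minimal sketch of the semantics of the three forms Armin proposes in the quoted message: new buffer, write into a target with a maximum length, and decode with an optional length. ffi.encode_utf8 and ffi.decode_utf8 are only proposals here, so the helpers below are hypothetical. The sketch makes these assumptions:

- ctypes buffers stand in for CFFI char* cdata, so the code runs without CFFI installed.
- The target form writes a trailing NUL.
- The target form returns the number of UTF-8 bytes written.
- The target form refuses to write more than maximum_length bytes.

With CFFI itself, roughly the same could be assembled from ffi.new("char[]", ...) and ffi.string(cdata, maxlen).

```python
import ctypes


def encode_utf8(text, target=None, maximum_length=None):
    """Hypothetical stand-in for the proposed ffi.encode_utf8()."""
    data = text.encode("utf-8")
    if target is None:
        # encode_utf8(s) -> new NUL-terminated char buffer (len(data) + 1 bytes).
        return ctypes.create_string_buffer(data)
    if maximum_length is None:
        raise TypeError("maximum_length is required when a target buffer is given")
    # encode_utf8(s, target, maximum_length): copy the bytes plus a trailing
    # NUL into an existing buffer, refusing to write past maximum_length.
    if len(data) + 1 > maximum_length:
        raise ValueError("encoded text plus NUL does not fit in maximum_length bytes")
    ctypes.memmove(target, data + b"\0", len(data) + 1)
    return len(data)  # assumption: report the number of UTF-8 bytes written


def decode_utf8(cdata, length=None):
    """Hypothetical stand-in for the proposed ffi.decode_utf8()."""
    if length is None:
        raw = ctypes.string_at(cdata)  # read up to the first NUL byte
    else:
        raw = ctypes.string_at(cdata, length)  # read exactly `length` bytes
    return raw.decode("utf-8")


# Usage: write into a preallocated 16-byte buffer, then decode a prefix.
out = ctypes.create_string_buffer(16)
n = encode_utf8("ascii only", out, 16)
print(n, decode_utf8(out, 5))  # 10 ascii
```

Note that this Python-level version still goes through an intermediate bytes object in both directions. Presumably the speed gain from a native helper in CFFI's backend would come from skipping that extra copy and the extra calls.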