[pypy-dev] Unicode encode/decode speed
Armin Rigo
arigo at tunes.org
Sun Feb 17 10:43:45 CET 2013
Hi,
On Tue, Feb 12, 2013 at 7:14 PM, Eleytherios Stamatogiannakis
<estama at gmail.com> wrote:
> Also we are looking into adding a special ffi.string_decode_UTF8 in CFFI's
> backend to reduce the number of calls that are needed to go from utf8_char*
> to PyPy's unicode.
A first note: I'm wondering why you need to convert from
utf-8-that-contains-only-ascii, to unicode, and back. What is the
point of having unicode strings in the first place? Can't you just
pass plain non-unicode strings around your complete program?
If not, then indeed, it would make (a bit of) sense to have ways to
convert directly between "char *" and unicode strings, in both
directions, assuming utf-8. This could be done with an API like:
    ffi.encode_utf8(unicode_string) -> new_char*_cdata
    ffi.encode_utf8(unicode_string, target_char*_cdata, maximum_length)
    ffi.decode_utf8(char*_cdata, [length]) -> unicode_string
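To make the intended semantics concrete, here is a rough sketch of the three calls. It is not CFFI code: it uses the standard library's ctypes buffers as a stand-in for "char *" cdata, and the function names (encode_utf8, encode_utf8_into, decode_utf8) as well as the choice to raise ValueError on overflow and to return the byte count are assumptions made for illustration only.

```python
import ctypes


def encode_utf8(text):
    """Allocate a new NUL-terminated char buffer holding `text` as UTF-8."""
    # create_string_buffer(bytes) sizes the buffer to len(bytes) + 1
    # and appends the terminating NUL byte.
    return ctypes.create_string_buffer(text.encode("utf-8"))


def encode_utf8_into(text, target, maximum_length):
    """Encode `text` into an existing buffer of `maximum_length` bytes.

    Returns the number of bytes written, not counting the NUL.
    Raising on overflow is an assumption; the proposal does not say.
    """
    data = text.encode("utf-8")
    if len(data) + 1 > maximum_length:
        raise ValueError("encoded string does not fit in target buffer")
    ctypes.memmove(target, data + b"\x00", len(data) + 1)
    return len(data)


def decode_utf8(cdata, length=None):
    """Decode UTF-8 bytes from a char buffer into a unicode string."""
    if length is None:
        raw = ctypes.string_at(cdata)          # read up to the first NUL
    else:
        raw = ctypes.string_at(cdata, length)  # read exactly `length` bytes
    return raw.decode("utf-8")


if __name__ == "__main__":
    buf = encode_utf8(u"A bient\u00f4t")
    print(decode_utf8(buf))
```

The point of a dedicated call is that the encode or decode step and the copy to or from C memory happen together, instead of going through an intermediate byte string in user code as this sketch does.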
Alternatively, we could accept unicode strings whenever a "char*" is
expected and encode it to utf-8, but that sounds a bit too magical.
A bientôt,
Armin.