[pypy-dev] CFFI and UTF8

Thu Jul 31 17:39:18 CEST 2014

Hi,

On 31 July 2014 16:47, Eleytherios Stamatogiannakis <estama at gmail.com> wrote:
> Wouldn't it be faster to have a ffi.stringUTF8 for the case where we know
> the input is in UTF8?

It seems the truth is the opposite of what you expect.  Right now,
`ffi.string(p).decode('utf-8')` does two copies, whereas in the
proposed UTF8 future of PyPy the same expression might be done with
only one copy (because, with `s = ffi.string(p)`, `s` and
`s.decode('utf-8')` could share the same byte string).
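
To make the two copies concrete, here is a minimal sketch.  The sample
bytes and the variable names are made up for illustration, and the
`ctypes` branch is only a stand-in so the snippet still runs where the
`cffi` module is not installed:

```python
# Sketch only: build a NUL-terminated UTF-8 C string, then copy and decode it.
data = u"caf\u00e9".encode("utf-8")    # hypothetical sample bytes

try:
    from cffi import FFI               # CFFI in ABI mode, no compiler needed
    ffi = FFI()
    p = ffi.new("char[]", data)        # char[] owned by CFFI, NUL-terminated
    raw = ffi.string(p)                # copy 1: C memory -> Python byte string
except ImportError:
    import ctypes                      # stand-in when cffi is not installed
    p = ctypes.create_string_buffer(data)
    raw = ctypes.string_at(p)          # the same kind of copy, done by ctypes

text = raw.decode("utf-8")             # copy 2 (today): byte string -> unicode
```

The second step is the one that could become free if unicode strings
were stored internally as UTF8.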

It doesn't mean the idea of `ffi.stringUTF8()` is necessarily bad, but
it should be a CFFI discussion instead of a PyPy one.  I'm "-0" on the
idea, as it adds more complexity to the API for just a minor performance
gain (particularly one that disappears in the UTF8 future of PyPy).

> Ideally we could also have a ffi.stringUTF8const, which knowing that the
> char* is const (won't be changed by the C side), won't do a copy at all?

That's not possible: a PyPy string object cannot point directly to raw
memory, but must contain its own data, just like a CPython
(byte)string object.
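
A small sketch of what "contains its own data" means in practice.  It
uses the standard `ctypes` module as a stand-in so that it runs without
CFFI installed; the buffer contents are made up, and I'd expect
`ffi.string()` to behave the same way:

```python
import ctypes

buf = ctypes.create_string_buffer(b"hello")  # raw C memory holding "hello\0"
s = ctypes.string_at(buf)                    # Python byte string: its own copy

buf[0] = b"j"                                # the C side changes the memory
print(buf.value)                             # b'jello'
print(s)                                     # b'hello' (the copy is unaffected)
```

The string object never aliases the C buffer, which is why even a
`const char *` still costs one copy.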

See you soon,

Armin.