[Python-3000] How will unicode get used?
Jim Jewett
jimjjewett at gmail.com
Wed Sep 20 22:59:22 CEST 2006
On 9/20/06, Guido van Rossum <guido at python.org> wrote:
> On 9/20/06, Adam Olsen <rhamph at gmail.com> wrote:
> > On 9/20/06, Guido van Rossum <guido at python.org> wrote:
> > > Let me cut this short. The external string API in Py3k should not
> > > change or only very marginally so (like removing rarely used useless
> > > APIs or adding a few new conveniences).
...
> I thought we were discussing the Python API.
I don't think anyone has proposed much change to strings *as seen from
Python*. At most, there has been an implicit suggestion that the
bytes.decode().encode() dance be shortened.
> C code will likely have the same access to unicode objects as it has in 2.x.
Can C code still assume that

    (1) the data buffer will always be available for any sort of
        direct manipulation (including mutation)
    (2) in a specific canonical encoding
    (3) directly from the memory layout, without calling a "prepare"
        or "recode" or "encode" method first?
Today, that canonical encoding is a compile-time choice, and any
specific choice causes integration hassles.
Unless the choice matches the system default for text, it also
requires many decode/encode round trips that might otherwise be
avoided.
The proposed changes mostly boil down to removing the third
assumption, and agreeing that some implementations might delay the
decode-to-canonical-format until it is needed.
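To make "delay the decode until it is needed" concrete, here is a minimal sketch in plain C. It is not CPython code: LazyText, lazy_init, lazy_as_canonical and lazy_free are hypothetical names, and it assumes the raw bytes are Latin-1 and the canonical form is 16-bit code units. The point is only that callers go through an accessor instead of reading the buffer straight out of the struct.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical toy type; NOT the layout of CPython's PyUnicodeObject. */
typedef struct {
    const char *raw;          /* bytes as received, assumed Latin-1 */
    size_t      raw_len;      /* number of bytes in raw */
    uint16_t   *canon;        /* canonical 16-bit buffer, NULL until requested */
    int         decode_count; /* how many times the decode actually ran */
} LazyText;

/* Store the raw bytes only; no decoding happens here. */
void lazy_init(LazyText *t, const char *raw, size_t raw_len)
{
    assert(t != NULL && raw != NULL);
    t->raw = raw;
    t->raw_len = raw_len;
    t->canon = NULL;
    t->decode_count = 0;
}

/* Return the canonical buffer, decoding on first use.
 * Returns NULL if memory cannot be allocated. */
const uint16_t *lazy_as_canonical(LazyText *t)
{
    assert(t != NULL);
    if (t->canon == NULL) {
        uint16_t *buf = malloc((t->raw_len + 1) * sizeof *buf);
        if (buf == NULL)
            return NULL;
        /* Latin-1 maps byte value N directly to code point N. */
        for (size_t i = 0; i < t->raw_len; i++)
            buf[i] = (unsigned char)t->raw[i];
        buf[t->raw_len] = 0;  /* terminating code unit */
        t->canon = buf;
        t->decode_count++;
    }
    return t->canon;
}

/* Release the canonical buffer, if one was ever created. */
void lazy_free(LazyText *t)
{
    assert(t != NULL);
    free(t->canon);
    t->canon = NULL;
}
```

Code that never asks for the canonical form never pays for the decode. Code that reaches into the struct for canon directly would see NULL, which is exactly why assumption (3) has to go.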
Rough Summary of new C API restrictions:
Replace

    ((PyStringObject *)string)->ob_sval      /* supported today */

with

    PyString_AsString(string)                /* already recommended */

or replace

    ((PyUnicodeObject *)string)->str         /* supported today */

and

    ((PyUnicodeObject *)string)->defenc      /* supported today */

with

    PyUnicode_AsEncodedString(PyObject *unicode,    /* already recommended */
                              const char *encoding,
                              const char *errors)

and

    PyUnicode_AsAnyString(PyObject *unicode,    /* new */
                          char **encoding,      /* return the actual encoding */
                          const char *errors)
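The call shape proposed for PyUnicode_AsAnyString (hand back whatever bytes are already stored, and report their encoding through an out-parameter) can be modelled in plain C. This is only a sketch under assumptions: ToyText, toy_as_any_string and toy_needs_recode are hypothetical names, nothing here is a CPython API, and the out-parameter is const char ** plus an added length, where the proposal has char ** and an errors argument.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical toy type standing in for a unicode object;
 * NOT CPython's memory layout. */
typedef struct {
    const char *bytes;     /* data in whatever encoding it arrived in */
    size_t      length;    /* number of bytes */
    const char *encoding;  /* name of that encoding, e.g. "utf-8" */
} ToyText;

/* Return the stored bytes without recoding, and report the encoding
 * and byte length through the out-parameters. */
const char *toy_as_any_string(const ToyText *text,
                              const char **encoding,
                              size_t *length)
{
    assert(text != NULL && encoding != NULL && length != NULL);
    *encoding = text->encoding;
    *length = text->length;
    return text->bytes;
}

/* Return 1 if the caller must recode to get `wanted`,
 * 0 if the stored bytes can be used as they are. */
int toy_needs_recode(const ToyText *text, const char *wanted)
{
    const char *actual;
    size_t length;
    (void)toy_as_any_string(text, &actual, &length);
    return strcmp(actual, wanted) != 0;
}
```

A caller that wants UTF-8 and is handed UTF-8 skips the decode/encode round trip entirely; only a mismatch forces a recode.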
Also note that some macros would need to become functions. The most
prominent is

    PyUnicode_AS_DATA(string)                /* supports mutation */
-jJ