[Python-3000] How will unicode get used?
Guido van Rossum
guido at python.org
Wed Sep 20 20:32:04 CEST 2006
On 9/20/06, Adam Olsen <rhamph at gmail.com> wrote:
> On 9/20/06, Guido van Rossum <guido at python.org> wrote:
> > On 9/20/06, Adam Olsen <rhamph at gmail.com> wrote:
> > > Before we can decide on the internal representation of our unicode
> > > objects, we need to decide on their external interface. My thoughts
> > > so far:
> >
> > Let me cut this short. The external string API in Py3k should not
> > change or only very marginally so (like removing rarely used useless
> > APIs or adding a few new conveniences). The plan is to keep the 2.x
> > API that is supported (in 2.x) by both str and unicode, but merge the
> > two string types into one. Anything else could be done just as easily
> > before or after Py3k.
>
> Thanks, but one thing remains unclear: is the indexing intended to
> represent bytes, code points, or code units?
I don't see what's unclear -- the existing unicode object does what it does.
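The three units in question can give three different lengths for the same text. Below is a minimal sketch in present-day Python 3, assuming CPython 3.3 or later (PEP 393), where `str` indexes code points on every platform. The helper name `lengths` and the sample string are illustrative only and do not come from the thread.

```python
def lengths(s: str) -> dict:
    """Measure one string as UTF-8 bytes, code points, and UTF-16 code units."""
    return {
        "bytes (UTF-8)": len(s.encode("utf-8")),
        # In Python 3.3+, len() on str counts code points.
        "code points": len(s),
        # utf-16-le writes no BOM, and each code unit is exactly 2 bytes.
        "code units (UTF-16)": len(s.encode("utf-16-le")) // 2,
    }

# 'a', 'e' with acute accent, and U+1D11E MUSICAL SYMBOL G CLEF (outside the BMP)
text = "a\u00e9\U0001D11E"
print(lengths(text))  # {'bytes (UTF-8)': 7, 'code points': 3, 'code units (UTF-16)': 4}
```

Only the character outside the Basic Multilingual Plane makes the code-point and code-unit counts disagree, because UTF-16 needs two code units (a surrogate pair) for it.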
> Note that C code
> operating on UTF-16 would use code units for slicing of UTF-16, which
> splits surrogate pairs.
I thought we were discussing the Python API.
C code will likely have the same access to unicode objects as it has in 2.x.
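The surrogate-splitting problem raised above can be shown directly by working on UTF-16 code units instead of characters. Below is a minimal sketch, again assuming present-day Python 3. The helpers `utf16_code_units`, `is_high_surrogate` and `is_low_surrogate` are hypothetical names chosen for illustration and are not part of any Python or C API.

```python
def utf16_code_units(s: str) -> list[int]:
    """Return the UTF-16 code units of s (little-endian, no BOM)."""
    data = s.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]

def is_high_surrogate(unit: int) -> bool:
    # High (leading) surrogates occupy U+D800..U+DBFF.
    return 0xD800 <= unit <= 0xDBFF

def is_low_surrogate(unit: int) -> bool:
    # Low (trailing) surrogates occupy U+DC00..U+DFFF.
    return 0xDC00 <= unit <= 0xDFFF

clef = "\U0001D11E"             # a single code point outside the BMP
units = utf16_code_units(clef)
print([hex(u) for u in units])  # ['0xd834', '0xdd1e']

# Slicing by code units keeps only the first half of the pair.
first_half = units[:1]
print(is_high_surrogate(first_half[0]))  # True

# A lone high surrogate is not valid UTF-16 text, so strict decoding fails.
try:
    first_half[0].to_bytes(2, "little").decode("utf-16-le")
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)
```

A code-unit slice that ends between the two halves leaves a dangling surrogate; code that must avoid this has to check for a high surrogate at the cut point and move the boundary.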
> As far as I can tell, CPython on Windows uses UTF-16 with code units.
> Perhaps not intentionally, but by default (not throwing an error on
> surrogates).
This is intentional, to be compatible with the rest of that platform.
Jython and IronPython do this too, I believe.
--
--Guido van Rossum (home page: http://www.python.org/~guido/)