[Python-3000] How will unicode get used?

Brett Cannon brett at python.org
Wed Sep 20 20:30:28 CEST 2006


On 9/20/06, Adam Olsen <rhamph at gmail.com> wrote:
>
> On 9/20/06, Guido van Rossum <guido at python.org> wrote:
> > On 9/20/06, Adam Olsen <rhamph at gmail.com> wrote:
> > > Before we can decide on the internal representation of our unicode
> > > objects, we need to decide on their external interface.  My thoughts
> > > so far:
> >
> > Let me cut this short. The external string API in Py3k should not
> > change or only very marginally so (like removing rarely used useless
> > APIs or adding a few new conveniences). The plan is to keep the 2.x
> > API that is supported (in 2.x) by both str and unicode, but merge the
> > two string types into one. Anything else could be done just as easily
> > before or after Py3k.
>
> Thanks, but one thing remains unclear: is the indexing intended to
> represent bytes, code points, or code units?  Note that C code
> operating on UTF-16 would use code units for slicing of UTF-16, which
> splits surrogate pairs.


Assuming my Unicode lingo is right and a code point represents a
letter/character/digraph/whatever, then indexing will be by code point.  In
one of my rare attempts at channeling Guido, I *really* doubt he wants to
expose the technical details of Unicode to the point where people need to
realize that UTF-8 takes two bytes to represent "ö".  If you want that kind
of exposure, use the bytes type.  Otherwise, assume that strings will be used
by people who are ignorant of Unicode and thus want something that works the
way they are used to from working in ASCII.
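
To make the three candidate units concrete, here is a minimal sketch of how
they differ for the same text.  It assumes a Python where len() on a string
counts code points (true of Python 3.3 and later; older narrow builds would
report UTF-16 code units instead).  The helper name count_units is made up
for illustration and is not part of any proposed API.

```python
def count_units(text):
    """Return (code points, UTF-16 code units, UTF-8 bytes) for text."""
    code_points = len(text)
    # Each UTF-16 code unit is two bytes; "utf-16-le" avoids adding a BOM.
    utf16_units = len(text.encode("utf-16-le")) // 2
    utf8_bytes = len(text.encode("utf-8"))
    return code_points, utf16_units, utf8_bytes


# U+00F6 (ö): one code point, one UTF-16 code unit, two UTF-8 bytes.
print(count_units("\u00f6"))

# U+1D11E (musical G clef) lies outside the BMP, so UTF-16 needs a
# surrogate pair; indexing by code unit could split that pair.
print(count_units("\U0001D11E"))
```

Indexing by code point keeps both strings at length 1, which matches what
someone used to ASCII would expect; the byte and code-unit views only show
up after an explicit encode.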

-Brett