[I18n-sig] How does Python Unicode treat surrogates?
Guido van Rossum
guido@digicool.com
Mon, 25 Jun 2001 13:42:29 -0400
> So what has been implemented is UCS-2, not UTF-16, and certainly not
> Unicode. Better to document u"" string literals as UCS-2, and not
> Unicode.
I'm sorry, but I don't see why it's UCS-2 any more or less than
UTF-16. That's like arguing whether 8-bit strings contain ASCII or
UTF-8. That's up to the application; the data type can be used for
either.
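
To illustrate the point, here is a minimal sketch in present-day Python 3
(not the 2001 implementation). The sample data and the helper names
as_ucs2 and as_utf16 are made up for this example. The same list of
16-bit values is read once unit by unit and once as UTF-16:

```python
import struct

# Hypothetical sample data: "a" followed by U+10000 stored as a surrogate pair.
units = [0x0061, 0xD800, 0xDC00]

def as_ucs2(units):
    """Read each 16-bit unit as one item; surrogates are left alone."""
    return [hex(u) for u in units]

def as_utf16(units):
    """Read the same units as UTF-16, so surrogate pairs are combined."""
    raw = struct.pack("<%dH" % len(units), *units)
    return [hex(ord(ch)) for ch in raw.decode("utf-16-le")]

print(as_ucs2(units))   # three items, one per 16-bit unit
print(as_utf16(units))  # two items, the pair has become U+10000
```

The stored values are identical in both cases; only the reading applied
by the application differs.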
> > It may change *eventually* -- when we switch to UCS-4 for the internal
> > representation. Until then, the API will deal in 16-bit values that
> > may or may not be "characters".
>
> You don't need to switch to UCS-4 internally to implement what I'm
> suggesting.
But unless I misunderstand what it *is* that you are suggesting, the
O(1) indexing property can't be retained with your suggestion, and
that's out of the question.
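
A rough sketch of the cost, assuming the suggestion amounts to indexing
by code point over 16-bit storage (that reading is my assumption). The
function code_point_at and the sample data are hypothetical, written in
present-day Python 3:

```python
def is_high(u):
    return 0xD800 <= u <= 0xDBFF

def is_low(u):
    return 0xDC00 <= u <= 0xDFFF

def code_point_at(units, index):
    """Return the index-th code point by walking from the start: O(n)."""
    pos = 0
    count = 0
    while pos < len(units):
        u = units[pos]
        if is_high(u) and pos + 1 < len(units) and is_low(units[pos + 1]):
            # A valid surrogate pair occupies two units but is one code point.
            cp = 0x10000 + ((u - 0xD800) << 10) + (units[pos + 1] - 0xDC00)
            step = 2
        else:
            cp = u
            step = 1
        if count == index:
            return cp
        count += 1
        pos += step
    raise IndexError("code point index out of range")

# Hypothetical sample data: "a", U+10000 as a surrogate pair, "b".
units = [0x0061, 0xD800, 0xDC00, 0x0062]

print(hex(units[2]))                  # plain O(1) lookup of a 16-bit unit
print(hex(code_point_at(units, 2)))   # has to scan past the pair first
```

Because an earlier pair shifts every later position, the offset of the
n-th code point can't be computed without scanning or keeping an extra
index alongside the data.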
> > I'd say that ideally the choice to have a 2 or 4 byte internal
> > representation (or no Unicode support at all, for some platforms like
> > PalmOS!) should be a configuration choice.
>
> I don't think it should be a configuration choice. That leads to
> incompatibilities between people's scripts. It's bad enough already
> with some things working with threaded versions of python and some not
> (e.g., Zope requires threading, but mod_python doesn't work if it's
> turned on).
That turned out to be a myth, actually. mod_python works fine with
threads on most platforms.
Anyway, code that specifically doesn't work when a particular feature
is turned *on* is rare. Code that *requires* a specific feature is
common, of course, and I would think that Python's Unicode type is
useful as it is for applications that don't need the newer planes.
> BTW, Palm recently joined the Unicode Consortium, and Symbian has
> Unicode support.
>
> > Right now the implementation doesn't allow that choice at all, which
> > should be remedied -- maybe you can help by submitting patches?
>
> Touché.
:-)
--Guido van Rossum (home page: http://www.python.org/~guido/)