Re: [Python-Dev] Support for "wide" Unicode characters

June 29, 2001


      ...
> I'd suggest not using the term "character" in this PEP at all;
> this is also what Mark Davis recommends in his paper on Unicode.

I like this idea!  I know that I *still* have a hard time not thinking
"C 'char' datatype, i.e. an 8-bit byte" when I read "character"...
...
> Why not make the codec used by Python to convert Unicode
> literals to Unicode strings an option, just like the default
> encoding?
> That way we could have one version of the unicode-escape codec
> which supports surrogates and one which doesn't.

Smart idea, but how practical is this?  Can you spec this out a bit more?
...
> +1 on removing knowledge about surrogates from the Unicode
> implementation core (it's also the easiest: there is none :-)

Except for \U currently -- or is that not part of the implementation core?
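
To make the \U point concrete: when strings are stored as 16-bit code
units, an escape naming a code point above U+FFFF has to become two
surrogate code units. A minimal sketch of that arithmetic follows; the
function name to_surrogate_pair is hypothetical and this is not the
codec's actual source, only the standard UTF-16 mapping.

```python
def to_surrogate_pair(code_point):
    """Split a supplementary code point into a UTF-16 (high, low) pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is outside the supplementary range")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits -> low surrogate
    return high, low


if __name__ == "__main__":
    high, low = to_surrogate_pair(0x10000)
    print(hex(high), hex(low))
```

Any code that emits such pairs necessarily knows about surrogates, which
is why \U handling is the exception raised here.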
...
> We should provide a new module which provides a few handy
> utilities though: functions which provide code point-,
> character-, word- and line-based indexing into Unicode
> strings.

But its design is outside the scope of this PEP, I'd say.
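
For a sense of what code point-based indexing could mean, here is a rough
sketch that walks a sequence of 16-bit code units and combines surrogate
pairs. The names code_points and code_point_at are hypothetical, the
input is assumed to be a list of integers, and this is not a proposed
design for the module.

```python
def code_points(units):
    """Yield code points from a sequence of 16-bit code units.

    A high surrogate followed by a low surrogate is combined into one
    code point; an unpaired surrogate is passed through unchanged.
    """
    i = 0
    n = len(units)
    while i < n:
        unit = units[i]
        if (0xD800 <= unit <= 0xDBFF and i + 1 < n
                and 0xDC00 <= units[i + 1] <= 0xDFFF):
            yield 0x10000 + ((unit - 0xD800) << 10) + (units[i + 1] - 0xDC00)
            i += 2
        else:
            yield unit
            i += 1


def code_point_at(units, index):
    """Return the code point at a code point index (linear scan)."""
    for position, code_point in enumerate(code_points(units)):
        if position == index:
            return code_point
    raise IndexError("code point index out of range")
```

Note that indexing by code point costs a linear scan here, whereas
indexing by code unit is constant time; that trade-off is one reason to
keep such utilities out of the core.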

--Guido van Rossum (home page: http://www.python.org/~guido/)