
"M.-A. Lemburg" wrote:
...
I'd suggest not using the term "character" in this PEP at all; this is also what Mark Davis recommends in his paper on Unicode.
That's fine, but Python does have a concept of a character, and I'm going to use the term "character" when discussing it.
Also, a link to the Unicode glossary would be a good thing.
Funny how these little PEPs grow...
... Why not make the codec used by Python to convert Unicode literals to Unicode strings an option, just like the default encoding?
That way we could have a version of the unicode-escape codec which supports surrogates and one which doesn't.
Adding more and more knobs to tweak just makes Python code non-portable from one machine to another.
ISSUE: Should Python allow the construction of characters that do not correspond to Unicode characters? Unassigned Unicode characters should obviously be legal (because they could be assigned at any time). But code points above TOPCHAR are guaranteed never to be used by Unicode. Should we allow access to them anyhow?
I wouldn't count on that last point ;-)
Please note that you are mixing terms: you don't construct characters, you construct code points. Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide.
unichr() does not construct code points. It constructs 1-char Python Unicode strings...also known as Python Unicode characters.
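To make that concrete, here is a minimal sketch in present-day Python 3 (an assumption; this thread predates it), where unichr() is spelled chr(). The helper name make_char is hypothetical, and TOPCHAR is taken to be 0x10FFFF. It only shows which integers yield a 1-char string; it says nothing about how a narrow build would behave.

```python
import sys

# Assumption: Python 3.3 or later, where chr() replaces unichr() and every
# build covers the full code point range.
TOPCHAR = 0x10FFFF  # name borrowed from the PEP discussion


def make_char(code_point):
    """Return a 1-char string for code_point, or None if chr() rejects it."""
    try:
        return chr(code_point)
    except ValueError:
        return None


a = make_char(0x41)                # an ordinary assigned character
astral = make_char(0x10000)        # first code point beyond the BMP
lone_surrogate = make_char(0xD800) # accepted, although it is not a character
too_big = make_char(TOPCHAR + 1)   # rejected: above the Unicode range
```

Note that the lone surrogate is accepted as a 1-char string, so the "should we allow it" question in the ISSUE is about policy, not about what the type can hold.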
... Whether the concatenation of these code points makes a valid Unicode character string is an issue which applications and codecs have to decide.
The concatenation of true code points would *always* make a valid Unicode string, right? It's code units that cannot be blindly concatenated.
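A rough illustration of the code unit problem, again assuming Python 3. The helper names utf16_units and is_well_formed are hypothetical; the surrogate ranges (0xD800-0xDBFF high, 0xDC00-0xDFFF low) are the ones UTF-16 defines.

```python
def utf16_units(text):
    """Return the UTF-16 code units of text as a list of integers."""
    # "surrogatepass" lets lone surrogates through instead of raising.
    data = text.encode("utf-16-le", "surrogatepass")
    return [int.from_bytes(data[i:i + 2], "little")
            for i in range(0, len(data), 2)]


def is_well_formed(units):
    """Return True if every surrogate in units is part of a proper pair."""
    i = 0
    while i < len(units):
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # A high surrogate must be followed by a low surrogate.
            if i + 1 >= len(units) or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # A low surrogate with no preceding high surrogate.
            return False
        else:
            i += 1
    return True


s = "a" + chr(0x10000) + "b"   # three code points
units = utf16_units(s)          # four code units; the middle two form a pair
head, tail = units[:2], units[2:]  # a blind split lands inside the pair
```

Here units is well formed, but head ends in an unpaired high surrogate and tail begins with an unpaired low one, so neither half is valid UTF-16 on its own.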
... We should provide a new module which provides a few handy utilities though: functions which provide code point-, character-, word- and line-based indexing into Unicode strings.
Okay, I'll add: It has been proposed that there should be a module for working with UTF-16 strings in narrow Python builds through some sort of abstraction that handles surrogates for you. If someone wants to implement that, it will be another PEP.
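For what it's worth, the core of such an abstraction could be quite small. The following is only a sketch under my own assumptions, not a proposed API: the names code_points and code_point_at are made up, the input is a plain sequence of UTF-16 code units given as integers, and the code is written for Python 3.

```python
from itertools import islice


def code_points(units):
    """Yield code points from an iterable of UTF-16 code units."""
    it = iter(units)
    for u in it:
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: consume the low surrogate that must follow.
            low = next(it, None)
            if low is None or not (0xDC00 <= low <= 0xDFFF):
                raise ValueError("unpaired high surrogate")
            yield 0x10000 + ((u - 0xD800) << 10) + (low - 0xDC00)
        elif 0xDC00 <= u <= 0xDFFF:
            raise ValueError("unpaired low surrogate")
        else:
            yield u


def code_point_at(units, index):
    """Return the code point at a code point-based index (linear time)."""
    for cp in islice(code_points(units), index, index + 1):
        return cp
    raise IndexError("code point index out of range")


units = [0x61, 0xD800, 0xDC00, 0x62]   # "a", U+10000, "b" in UTF-16
decoded = list(code_points(units))
second = code_point_at(units, 1)
```

Indexing this way is linear in the length of the string, which is one of the trade-offs a real module would have to decide on.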