[Python-3000] String comparison
Rauli Ruohonen
rauli.ruohonen at gmail.com
Sat Jun 9 23:01:57 CEST 2007
On 6/9/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> Rauli Ruohonen writes:
> > The ones it absolutely prohibits in interchange are surrogates.
>
> Excuse me? Surrogates are code points with a specific interpretation
> if it is "purported that the stream is in UTF-16". Otherwise, Unicode
> 4.0 explicitly says that there is nothing illegal about an isolated
> surrogate (p.75, where an example is given of how such a surrogate
> might occur).
I meant interchange, not strings. Anything is allowed in strings.
Chapter 2 (not normative, but clear) explains on page 26:
    Restricted interchange. [...]
    - Surrogate code points cannot be conformantly interchanged using
      Unicode encoding forms. [...]
    - Noncharacter code points are reserved for internal use, such as for
      sentinel values. They should never be interchanged. [...]
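
As a rough illustration of the string/interchange distinction, here is a
minimal sketch in present-day Python 3. Two assumptions: this behavior
postdates this thread, and the helper name can_interchange is made up for
the example. A lone surrogate is accepted inside a str, but the UTF-8 codec
refuses to encode it. A noncharacter such as U+FFFE is encoded without
complaint, so the codec does not enforce the "should never be interchanged"
rule.

```python
# Sketch only: shows current CPython 3 behavior, not anything proposed
# in this thread. can_interchange is a hypothetical helper name.

def can_interchange(s, encoding="utf-8"):
    """Return True if s can be encoded with the given codec."""
    try:
        s.encode(encoding)
    except UnicodeEncodeError:
        return False
    return True

# A lone (unpaired) surrogate is a perfectly legal str of length 1.
lone = "\ud800"
print(len(lone))                  # 1

print(can_interchange("abc"))     # True
print(can_interchange(lone))      # False: surrogates are rejected by UTF-8
print(can_interchange("\ufffe"))  # True: noncharacters are not rejected
```

The default strict error handler is what produces the rejection here; other
error handlers exist and behave differently.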
> My point was precisely that I don't object to this implementation. I
> want Unicode-ly-correct behavior to be a goal of the language, the
> community disagrees, and Guido disagrees. That's that.
My understanding is that it is a goal, but practicality beats purity.
I think the only disagreement is on what's practical.