[Python-3000] String comparison
Bill Janssen
janssen at parc.com
Thu Jun 7 04:19:56 CEST 2007
I wrote:
> Guido wrote:
> > So let me explain it. I see two different sequences of code points:
> > 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308',
> > 'w', 'i', 's' on the other. Never mind that Unicode has semantics that
> > claim they are equivalent. They are two different sequences of code
> > points.
>
> If they were sequences of integers, or sequences of bytes, I'd agree
> with you. But they are explicitly sequences of characters, not
> sequences of codepoints. There should be one internal normalized form
> for strings.
I meant to say that *strings* are explicitly sequences of characters,
not codepoints. So both sequences of codepoints should collapse to
the same *string* when they are turned into a string. While the two
sequences of codepoints should not compare equal, the strings formed
from them should compare equal.
I also believe that the literal form '\u0308' should generate a compile
error. It's a valid Unicode codepoint, sure, but not a valid string.
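To make "not a valid string" concrete, here is a rough sketch of the kind of check a compiler or constructor could apply. It treats a string as ill-formed if it begins with a combining mark. The helper name starts_with_combining_mark is hypothetical, and testing for a non-zero canonical combining class is an assumption about where to draw the line (it misses a few marks whose combining class is 0).

```python
import unicodedata

def starts_with_combining_mark(s):
    """Hypothetical check: True if s is non-empty and its first code point
    has a non-zero canonical combining class (so it has no base character)."""
    return bool(s) and unicodedata.combining(s[0]) != 0

# A lone COMBINING DIAERESIS has nothing to attach to.
assert starts_with_combining_mark('\u0308')
# The same mark following a base character is fine.
assert not starts_with_combining_mark('o\u0308')
```

Whether such a string should be rejected at compile time, at construction time, or not at all is the design question under discussion; the sketch only shows that the condition is cheap to detect.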
In other words, I'd expect this comparison to be true:
    string((ord('L'), 0xF6, ord('w'), ord('i'), ord('s'))) ==
        string((ord('L'), ord('o'), 0x308, ord('w'), ord('i'), ord('s')))
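As a minimal sketch of that behaviour, assuming NFC as the single internal normalized form (nothing above names a particular form, so that choice is an assumption), the comparison can be emulated with the standard unicodedata module. The string() constructor here is a hypothetical helper, not an existing builtin.

```python
import unicodedata

def string(codepoints):
    """Hypothetical constructor: build a str from code points and
    normalize it to NFC (assumed here as the internal form)."""
    raw = "".join(chr(cp) for cp in codepoints)
    return unicodedata.normalize("NFC", raw)

precomposed = (ord('L'), 0xF6, ord('w'), ord('i'), ord('s'))
decomposed = (ord('L'), ord('o'), 0x308, ord('w'), ord('i'), ord('s'))

# The two sequences of code points are different sequences...
assert precomposed != decomposed
# ...but the strings formed from them compare equal after normalization.
assert string(precomposed) == string(decomposed)
```

Without the normalization step, "".join(chr(cp) for cp in ...) on the two tuples gives strings that compare unequal, which is the behaviour Guido describes.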
Bill