[Python-3000] String comparison

Bill Janssen janssen at parc.com
Thu Jun 7 04:19:56 CEST 2007


I wrote:
> Guido wrote:
> > So let me explain it. I see two different sequences of code points:
> > 'L', '\u00F6', 'w', 'i', 's' on the one hand, and 'L', 'o', '\u0308',
> > 'w', 'i', 's' on the other. Never mind that Unicode has semantics that
> > claim they are equivalent. They are two different sequences of code
> > points.
> 
> If they were sequences of integers, or sequences of bytes, I'd agree
> with you.  But they are explicitly sequences of characters, not
> sequences of codepoints.  There should be one internal normalized form
> for strings.

I meant to say that *strings* are explicitly sequences of characters,
not codepoints.  So both sequences of codepoints should collapse to
the same *string* when they are turned into a string.  While the two
sequences of codepoints should not compare equal, the strings formed
from them should compare equal.

I also believe that the literal form '\u0308' should generate a compile
error.  U+0308 (COMBINING DIAERESIS) is a valid Unicode codepoint, sure,
but on its own, with no base character to combine with, it is not a
valid string.

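To put the equality claim concretely: given a string() constructor that
takes a sequence of codepoints, this should be true:

 string((ord('L'), 0xF6, ord('w'), ord('i'), ord('s'))) ==
 string((ord('L'), ord('o'), 0x308, ord('w'), ord('i'), ord('s')))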
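
A minimal sketch of that comparison using the standard unicodedata
module.  The helper name normalized_string is hypothetical, and the choice
of NFC is an assumption: nothing above says which normalized form the
internal representation should use.

```python
import unicodedata


def normalized_string(codepoints):
    """Hypothetical helper: build a str from code points, then normalize it.

    NFC is an assumed choice of normal form for this sketch.
    """
    raw = "".join(chr(cp) for cp in codepoints)
    return unicodedata.normalize("NFC", raw)


# 'L', U+00F6 (o with diaeresis, precomposed), 'w', 'i', 's'
precomposed = (ord('L'), 0xF6, ord('w'), ord('i'), ord('s'))
# 'L', 'o', U+0308 (combining diaeresis), 'w', 'i', 's'
decomposed = (ord('L'), ord('o'), 0x308, ord('w'), ord('i'), ord('s'))

# As sequences of integers, the two are different.
sequences_equal = precomposed == decomposed

# As normalized strings, they compare equal.
strings_equal = normalized_string(precomposed) == normalized_string(decomposed)

print(sequences_equal, strings_equal)  # prints: False True
```

The two tuples stay unequal as sequences of codepoints, while the strings
built from them compare equal once both pass through the same normal form.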

Bill

