[Python-3000] String comparison

Fri Jun 15 08:13:29 CEST 2007

> - chr(i) returns a len-1 or len-2 string for all i in range(0, 0x110000) and
>   ord(chr(i)) == i for all i in range(0, 0x110000)

This would contradict an explicit decision in PEP 261. I don't quite
remember the rationale for that; however, the PEP mentions that ord()
should be symmetric with chr().

Whether it would be acceptable to allow selected length-two strings
in ord, I don't know.
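
For illustration, the arithmetic behind such a length-two string looks
roughly like the sketch below. The helper names to_surrogate_pair and
from_surrogate_pair are made up for this example; they are not part of
any patch or of the standard library.

```python
def to_surrogate_pair(cp):
    """Split a non-BMP code point into its two UTF-16 code units."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point outside the supplementary planes")
    offset = cp - 0x10000
    high = 0xD800 + (offset >> 10)    # upper 10 bits
    low = 0xDC00 + (offset & 0x3FF)   # lower 10 bits
    return high, low


def from_surrogate_pair(high, low):
    """Combine a high/low surrogate pair back into one code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a valid surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


high, low = to_surrogate_pair(0x10100)
print(hex(high), hex(low))                  # 0xd800 0xdd00
print(hex(from_surrogate_pair(high, low)))  # 0x10100
```

An ord() that accepted selected length-two strings would have to apply
the second function and reject everything that is not a valid pair.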

> - unicodedata.name(chr(i)) returns the same result for all i on both UCS-2
>   and UCS-4 builds (and same for bidirectional(), category(), combining(),
>   decimal(), decomposition(), digit(), east_asian_width(), mirrored() and
>   numeric() in unicodedata)

There is a patch on SF requesting such a change for .lookup. I think
this should be done in 2.6, not 3.0. It doesn't have the ord/unichr
issue, so I think the same concerns don't apply.

> - return len-1 or len-2 strings on unicodedata.lookup(), instead of always
>   len-1 strings (e.g. unicodedata.lookup('AEGEAN WORD SEPARATOR LINE')
>   returns '\u0100' on UCS-2 builds, but '\U00010100' on UCS-4 builds)

See the patch on SF.
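
To see what the quoted example is about, a quick check along these
lines should do. This is only a sketch, and it assumes an interpreter
whose strings can hold the full code point (a UCS-4 build):

```python
import unicodedata

# Assumption: strings hold full code points, so the result is one
# character rather than a surrogate pair.
ch = unicodedata.lookup("AEGEAN WORD SEPARATOR LINE")
print(len(ch), hex(ord(ch)))
```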

> - unicodedata.normalize(s) interprets its input as UTF-16 on UCS-2 builds

Definitely; somebody would have to write the code.
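
Semantically, I would expect something like the following sketch. It
is written in Python rather than C, the name normalize_utf16 is
hypothetical, and it assumes an interpreter with the "surrogatepass"
error handler:

```python
import unicodedata


def normalize_utf16(form, s):
    """Normalize s after treating surrogate pairs as single code points."""
    # Round-trip through UTF-16 so that each high/low surrogate pair is
    # merged into the code point it encodes. A lone surrogate makes the
    # decode step raise UnicodeDecodeError.
    merged = s.encode("utf-16-le", "surrogatepass").decode("utf-16-le")
    return unicodedata.normalize(form, merged)


# The pair D800 DD00 is read as the single code point U+10100.
print(ascii(normalize_utf16("NFC", "\ud800\udd00")))
```

The real code would work on the internal buffer directly instead of
making a copy.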

> - use ValueError instead of TypeError in the above when passed an
>   inappropriate string, e.g. ord('aa')

I'm not sure about this one. The TypeError is deliberate currently.
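
The current split between the two exceptions can be observed with a
small check like this one (classify is just a helper made up for the
example):

```python
def classify(func, arg):
    """Return the name of the exception func(arg) raises, or 'ok'."""
    try:
        func(arg)
    except TypeError:
        return "TypeError"
    except ValueError:
        return "ValueError"
    return "ok"


print(classify(ord, "aa"))      # wrong string length: TypeError
print(classify(chr, 0x110000))  # integer out of range: ValueError
```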

Regards,
Martin