[Python-3000] String comparison

Thu Jun 14 15:51:09 CEST 2007

On 6/13/07, Stephen J. Turnbull <stephen at xemacs.org> wrote:
> except that people will sneak in some UTF-16 behavior where it seems useful.

How about sneaking these into py3k-struni:

- chr(i) returns a len-1 or len-2 string for all i in range(0, 0x110000) and
  ord(chr(i)) == i for all i in range(0, 0x110000)

- unicodedata.name(chr(i)) returns the same result for all i on both UCS-2
  and UCS-4 builds (and same for bidirectional(), category(), combining(),
  decimal(), decomposition(), digit(), east_asian_width(), mirrored() and
  numeric() in unicodedata)

- unicodedata.lookup() returns len-1 or len-2 strings, instead of always
  len-1 strings (e.g. unicodedata.lookup('AEGEAN WORD SEPARATOR LINE')
  currently returns the truncated '\u0100' on UCS-2 builds, but
  '\U00010100' on UCS-4 builds)

- unicodedata.normalize(form, s) interprets s as UTF-16 on UCS-2 builds

- use ValueError instead of TypeError in the above when passed an
  inappropriate string, e.g. ord('aa')
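
To make the first and last items concrete, here is a rough sketch of the
surrogate arithmetic they imply. It models a UCS-2 string as a list of
16-bit code units, so it runs on any build. The names chr_utf16 and
ord_utf16 are made up for illustration and are not existing functions.

```python
def chr_utf16(i):
    """Return code point i as a list of one or two UTF-16 code units."""
    if not 0 <= i < 0x110000:
        raise ValueError("code point not in range(0x110000)")
    if i < 0x10000:
        # BMP code point (lone surrogates included): one code unit.
        return [i]
    # Supplementary code point: split the 20-bit offset into a pair.
    offset = i - 0x10000
    return [0xD800 + (offset >> 10), 0xDC00 + (offset & 0x3FF)]


def ord_utf16(units):
    """Inverse of chr_utf16; ValueError for anything else."""
    if len(units) == 1:
        return units[0]
    if (len(units) == 2
            and 0xD800 <= units[0] < 0xDC00
            and 0xDC00 <= units[1] < 0xE000):
        return 0x10000 + ((units[0] - 0xD800) << 10) + (units[1] - 0xDC00)
    # Hypothetical counterpart of ord('aa') raising ValueError.
    raise ValueError("expected one code unit or one surrogate pair")
```

With this, chr_utf16(0x10100) gives the pair [0xD800, 0xDD00], and
ord_utf16() maps that pair back to 0x10100, while a two-unit input that
is not a surrogate pair is rejected with ValueError.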

Any chance?