break unichr instead of fix ord?
rurpy at yahoo.com
Tue Aug 25 15:45:49 EDT 2009
In Python 2.5 on Windows I could do [*1]:
# Create a unicode character outside of the BMP.
>>> a = u'\U00010040'
# On Windows it is represented as a surrogate pair.
>>> len(a)
2
>>> a[0],a[1]
(u'\ud800', u'\udc40')
# Create the same character with the unichr() function.
>>> a = unichr (65600)
>>> a[0],a[1]
(u'\ud800', u'\udc40')
# Although the unichr() function works fine, its
# inverse, ord(), doesn't.
>>> ord (a)
TypeError: ord() expected a character, but string of length 2 found
In Python 2.6, unichr() was "fixed" (using the word
loosely) so that it too now fails for characters outside
the BMP.
>>> a = unichr (65600)
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
Why was this done rather than changing ord() to accept a
surrogate pair?
Doesn't this effectively make unichr() and ord() useless
on Windows for all but a subset of Unicode characters?
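
For illustration, here is a minimal sketch of the surrogate-pair arithmetic that a pair-aware ord() would need, together with its inverse. The names pair_ord and pair_unichr are hypothetical helpers, not part of Python. The sketch assumes UTF-16 surrogate pairs as shown in the session above. pair_unichr always returns a pair for code points above the BMP, even on a wide build.

```python
try:
    _unichr = unichr  # Python 2
except NameError:
    _unichr = chr     # Python 3 has no unichr(); chr() plays that role


def pair_ord(s):
    """Hypothetical ord() that also accepts a UTF-16 surrogate pair."""
    if len(s) == 2:
        hi, lo = ord(s[0]), ord(s[1])
        # High surrogates are D800..DBFF, low surrogates are DC00..DFFF.
        if 0xD800 <= hi <= 0xDBFF and 0xDC00 <= lo <= 0xDFFF:
            return 0x10000 + ((hi - 0xD800) << 10) + (lo - 0xDC00)
    # Anything else is left to the built-in, including its TypeError.
    return ord(s)


def pair_unichr(cp):
    """Hypothetical unichr() that returns a surrogate pair above the BMP."""
    if not 0 <= cp <= 0x10FFFF:
        raise ValueError("code point out of range: %r" % (cp,))
    if cp < 0x10000:
        return _unichr(cp)
    cp -= 0x10000
    # Top 10 bits go in the high surrogate, bottom 10 bits in the low one.
    return _unichr(0xD800 + (cp >> 10)) + _unichr(0xDC00 + (cp & 0x3FF))


a = pair_unichr(65600)
print(repr(a))
print(pair_ord(a))
```

With these helpers, pair_ord(pair_unichr(n)) gives back n for any n in range(0x110000), which is the round trip that the built-in pair does not provide on a narrow build.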