a simple unicode question
Nobody
nobody at nowhere.com
Wed Oct 21 12:35:11 EDT 2009
On Wed, 21 Oct 2009 05:16:56 -0400, Chris Jones wrote:
>> > Where are the literals (i.e. u'\N{DEGREE SIGN}') defined?
>>
>> You can get them from the unicodedata module, e.g.:
>>
>> import unicodedata
>> for i in xrange(0x10000):
>>     n = unicodedata.name(unichr(i), None)
>>     if n is not None:
>>         print i, n
>
> Python rocks!
>
> Just curious, why did you choose to set the upper boundary at 0xffff?
Characters outside the 16-bit range (the Basic Multilingual Plane, or BMP)
aren't supported on all builds. They aren't supported on most Windows
builds, as Windows uses 16-bit Unicode extensively:
Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit (Intel)] on win32
>>> unichr(0x10000)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: unichr() arg not in range(0x10000) (narrow Python build)
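If you'd rather not hard-code the upper bound, sys.maxunicode reports the
largest code point the running build supports (0xFFFF on a narrow build,
0x10FFFF on a wide build and on Python 3.3 or later). A rough sketch, written
for Python 3, where chr() replaces unichr(); the helper name
named_code_points is made up for illustration:

```python
import sys
import unicodedata

def named_code_points(limit=None):
    """Yield (code point, name) for every named character below limit.

    named_code_points is a hypothetical helper, not part of any library.
    With no limit given, scan everything the build can represent.
    """
    if limit is None:
        limit = sys.maxunicode + 1
    for i in range(limit):
        # The second argument is the default returned for unnamed characters.
        name = unicodedata.name(chr(i), None)
        if name is not None:
            yield i, name

# Print a small slice; drop the argument to scan the whole range.
for i, name in named_code_points(0x30):
    print(i, name)
```

Scanning the full range takes noticeably longer than scanning the BMP alone,
so passing an explicit limit is still useful when you only care about the
first 0x10000 code points.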
Note that narrow builds do understand names outside of the BMP, and
generate surrogate pairs for them:
>>> u'\N{LINEAR B SYLLABLE B008 A}'
u'\U00010000'
>>> len(_)
2
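The length of 2 comes from the UTF-16 surrogate encoding: the code point is
reduced by 0x10000 and the remaining 20 bits are split across two 16-bit
units. A small sketch of that arithmetic in Python 3 (to_surrogates is a
made-up name, not something from the standard library):

```python
def to_surrogates(cp):
    """Split a code point above 0xFFFF into a UTF-16 (high, low) pair.

    to_surrogates is a hypothetical helper used only for illustration.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = cp - 0x10000               # 20 bits remain after the subtraction
    high = 0xD800 + (offset >> 10)      # top 10 bits go in the high surrogate
    low = 0xDC00 + (offset & 0x3FF)     # bottom 10 bits go in the low surrogate
    return high, low

print([hex(unit) for unit in to_surrogates(0x10000)])  # ['0xd800', '0xdc00']
```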
Whether or not using surrogates in this context is a good idea is open to
debate. What's the advantage of a multi-wchar string over a multi-byte
string?