a simple unicode question
Chris Jones
cjns1989 at gmail.com
Thu Oct 22 05:43:58 EDT 2009
On Wed, Oct 21, 2009 at 12:35:11PM EDT, Nobody wrote:
[..]
> Characters outside the 16-bit range aren't supported on all builds.
> They won't be supported on most Windows builds, as Windows uses 16-bit
> Unicode extensively:
I knew nothing about UTF-16 & friends before this thread.
Best part of Unicode is that there are multiple encodings, right? ;-)
Moot point on xterm anyway, since you'd be hard pressed to find a
decent terminal font that covers anything outside the BMP.
> Python 2.5.1 (r251:54863, Apr 18 2007, 08:51:08) [MSC v.1310 32 bit
> (Intel)] on win32
> >>> unichr(0x10000)
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ValueError: unichr() arg not in range(0x10000) (narrow Python build)
>
> Note that narrow builds do understand names outside of the BMP, and
> generate surrogate pairs for them:
>
> >>> u'\N{LINEAR B SYLLABLE B008 A}'
> u'\U00010000'
> >>> len(_)
> 2
>
> Whether or not using surrogates in this context is a good idea is open to
> debate. What's the advantage of a multi-wchar string over a multi-byte
> string?
I don't understand this last remark, but since I'm only a GNU/Linux
hobbyist, I guess it doesn't make much difference.
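For anyone else puzzling over the quoted len(_) == 2 result, here is a
rough sketch of the surrogate-pair arithmetic that UTF-16 defines. It
assumes Python 3, where unichr() is spelled chr() and the narrow/wide
build distinction no longer applies, so the 16-bit units are counted by
encoding to UTF-16 explicitly. The helper name to_surrogate_pair is made
up for illustration; it is not part of the standard library.

```python
def to_surrogate_pair(code_point):
    """Split a code point above U+FFFF into a UTF-16 (high, low) pair.

    Hypothetical helper for illustration only.
    """
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not outside the BMP")
    offset = code_point - 0x10000        # leaves a 20-bit value
    high = 0xD800 + (offset >> 10)       # top 10 bits -> high surrogate
    low = 0xDC00 + (offset & 0x3FF)      # bottom 10 bits -> low surrogate
    return high, low


ch = "\N{LINEAR B SYLLABLE B008 A}"      # U+10000, first code point past the BMP
pair = to_surrogate_pair(ord(ch))

# The same character as a "multi-wchar" string (UTF-16 code units)
# and as a "multi-byte" string (UTF-8 bytes).
utf16_units = len(ch.encode("utf-16-be")) // 2
utf8_bytes = len(ch.encode("utf-8"))

print([hex(unit) for unit in pair], utf16_units, utf8_bytes)
```

Either way the character takes more than one storage unit, two 16-bit
units in UTF-16 or four bytes in UTF-8, which seems to be the point of
the question about multi-wchar versus multi-byte strings.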
Thanks for the code snippet and comments.
CJ