[Tutor] unichr() question
Terry Carroll
carroll at tjc.com
Thu Oct 16 22:54:05 EDT 2003
On Tue, 14 Oct 2003, Ezequiel, Justin wrote:
> PythonWin 2.2.2
> Windows XP
>
> >>> long('1D4AA', 16)
> 119978L
> >>> unichr(long('1D4AA', 16))
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
> ValueError: unichr() arg not in range(0x10000) (narrow Python build)
> >>> x = eval("u'\\U000%s'" % '1D4AA')
> >>> x
> u'\U0001d4aa'
> >>> for c in x:
> ...     print ord(c)
> ...
> 55349
> 56490
> >>> unichr(55349) + unichr(56490)
> u'\U0001d4aa'
> >>>
>
> How do I convert strings such as '1D4AA' to unicode without using eval()?
Justin, I see you haven't gotten any responses on this yet. I don't know
an answer, but I ran into something similar on some of the Unihan
characters. Fortunately for me, I found I could just ignore any that were
over x'FFFF'; it doesn't sound like you can.
I looked into it for a while and determined that it depends on how your
Python was built. A "narrow build" supports Unicode characters only up to
x'FFFF' directly; a "wide build" supports x'10000' and higher as well. As
far as I can tell, it depends on whether whoever built your Python
specified "--enable-unicode=ucs4" to get the wide build.
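If you want to check which kind of build you have, here is a rough sketch. It relies on sys.maxunicode, which reports the largest code point the interpreter stores as a single character; the name is_wide_build is just my own label for illustration.

```python
import sys

# sys.maxunicode is 0xFFFF on a narrow build and 0x10FFFF on a wide build.
# (Python 3.3 and later always report 0x10FFFF.)
is_wide_build = sys.maxunicode > 0xFFFF

if is_wide_build:
    print("wide build")
else:
    print("narrow build")
```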
I'm a Windows user, too, and dependent on the Activestate build, which is
narrow. In the end, I decided to just avoid the higher Unicode values,
which didn't matter for me. If you have a way of getting a "wide build" I
suspect this would do the trick for you.
There's more information in PEP 261,
http://www.python.org/peps/pep-0261.html : I think this is the last word
on it.
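For what it's worth, the two values 55349 and 56490 in your session are a UTF-16 surrogate pair, and they can be worked out with plain arithmetic rather than eval(). The following is only a sketch under that assumption; surrogate_pair is a made-up helper name, not anything from the standard library.

```python
def surrogate_pair(code_point):
    """Split a code point above 0xFFFF into UTF-16 (high, low) surrogates."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("code point is not in the range 0x10000-0x10FFFF")
    offset = code_point - 0x10000          # 20 significant bits remain
    high = 0xD800 + (offset >> 10)         # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)        # bottom 10 bits
    return high, low

high, low = surrogate_pair(int('1D4AA', 16))
print(high)
print(low)
```

On a narrow build, unichr(high) + unichr(low) should then give the same string your eval() call produced, as your own session shows for these two numbers. On a wide build the code point can be passed to unichr() directly, so the helper is not needed there.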
Perhaps others better informed on Python internals and Unicode can add
more, but I hope this helps somewhat.
--
Terry Carroll
Santa Clara, CA
carroll at tjc.com