Short questions wrt Python & Unicode
John Machin
sjmachin at lexicon.net
Fri Jun 9 08:59:45 EDT 2006
On 9/06/2006 10:04 PM, KvS wrote:
> 2) How do I get a representation of a unicode object in terms of Unicode
> code points? repr() doesn't do that consistently: it shows some code points
> as \u escapes but turns others into the character itself or a \x escape:
>
>|>>> s=u"\u0040\u0166\u00e6"
>|>>> s
> u'@\u0166\xe6'
|>>> ' '.join('U+%04X' % ord(c) for c in s)
'U+0040 U+0166 U+00E6'
If you'd prefer it more Pythonic than unicode.orgic, adjust the format
string and separator to suit your taste.
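Here is a minimal Python 3 sketch of that one-liner with the format string and separator pulled out as parameters. The function name `codepoints` and its keyword arguments are made up for illustration. In Python 3, `str` is Unicode, so the `u` prefix is optional.

```python
def codepoints(s, fmt="U+%04X", sep=" "):
    """Hypothetical helper: the code points of s, formatted with fmt, joined by sep."""
    # %04X pads to at least four hex digits; code points above U+FFFF
    # simply get more digits (e.g. U+1F600).
    return sep.join(fmt % ord(c) for c in s)


s = "\u0040\u0166\u00e6"
print(codepoints(s))                          # U+0040 U+0166 U+00E6
print(codepoints(s, fmt="\\u%04x", sep=""))   # \u0040\u0166\u00e6
```

The second call gives the "more Pythonic" escape spelling. On a Python 2 narrow build, a character above U+FFFF would come out as two surrogate values rather than one.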
> (does this latter \xe6 have to do with the internal representation of
> unicode objects, maybe with this UCS-2 encoding?)
|>>> u'\xe6' == u'\u00e6' == unichr(0xe6)
True
|>>> hex(ord(u'\u00e6'))
'0xe6'
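The session above shows that `\xe6` and `\u00e6` are the same character. The escape is chosen purely by the size of the code point. Below is a small Python 3 sketch of that rule. It relies on the built-in `ascii()`, which escapes non-ASCII characters much as Python 2's repr did for unicode objects. The helper `escape_form` is hypothetical and is meant only for non-ASCII characters.

```python
def escape_form(c):
    """Hypothetical helper: the backslash escape Python uses for a
    non-ASCII character, picked only by the size of its code point."""
    cp = ord(c)
    if cp < 0x100:
        return "\\x%02x" % cp      # two hex digits suffice
    if cp < 0x10000:
        return "\\u%04x" % cp      # Basic Multilingual Plane
    return "\\U%08x" % cp          # supplementary planes


for c in "\u00e6\u0166\U0001F600":
    # ascii() wraps the escape in quotes; strip them before comparing.
    assert escape_form(c) == ascii(c)[1:-1]
    print(hex(ord(c)), escape_form(c))

print("\xe6" == "\u00e6" == chr(0xe6))  # True: one code point, several spellings
```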
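The \xe6 is just repr's shorthand for a code point below U+0100; it's the
same character as \u00e6. U+nnnnnn is represented internally as the integer
0xnnnnnn, except when it won't fit: on a narrow (UCS-2) build, code points
above U+FFFF are stored as surrogate pairs. But you can pretend that
surrogate pairs don't exist, for the moment :-)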
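For when you stop pretending: on a Python 2 narrow build, `sys.maxunicode` is 0xFFFF. A character such as U+1F600 is then stored as two 16-bit surrogate units. Since Python 3.3 (PEP 393), every `str` holds whole code points and `sys.maxunicode` is 0x10FFFF. Below is a sketch of the standard UTF-16 surrogate arithmetic, cross-checked against the built-in codec. The helper names are hypothetical.

```python
import struct


def to_surrogates(cp):
    """Hypothetical helper: split a code point above U+FFFF into a UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("only supplementary code points need a pair")
    v = cp - 0x10000              # 20-bit offset
    high = 0xD800 + (v >> 10)     # top 10 bits -> high surrogate
    low = 0xDC00 + (v & 0x3FF)    # bottom 10 bits -> low surrogate
    return high, low


def from_surrogates(high, low):
    """Hypothetical helper: recombine a surrogate pair into the original code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


cp = 0x1F600
pair = to_surrogates(cp)
print(["%04X" % u for u in pair])   # ['D83D', 'DE00']

# The UTF-16 codec produces the same two 16-bit units.
assert pair == struct.unpack(">2H", chr(cp).encode("utf-16-be"))
assert from_surrogates(*pair) == cp
```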
Cheers,
John
More information about the Python-list mailing list