unicode wrap unicode object?
Fredrik Lundh
fredrik at pythonware.com
Sat Apr 8 04:42:16 EDT 2006
"ygao" wrote:
> I must use utf-8 for chinese.
yeah, but you shouldn't store it in a *Unicode* string. Unicode strings
are designed to hold things that you've already decoded (that is, your
Chinese text), not the raw UTF-8 bytes.
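To make the difference concrete, here is a minimal sketch. It is not from the original post, the variable names are made up for illustration, and it is written to run on both Python 2.6+ and Python 3.3+. It compares the raw UTF-8 bytes, the properly decoded text, and what you get when the bytes are copied one-to-one into a Unicode string.

```python
# Illustrative sketch; names are hypothetical.
raw = b'\xe9\xab\x98'            # UTF-8 bytes for a single Chinese character

text = raw.decode('utf-8')       # properly decoded: one code point (U+9AD8)
wrong = raw.decode('latin-1')    # each byte copied into the Unicode string as-is

print(len(raw))                  # 3
print(len(text))                 # 1
print(len(wrong))                # 3

# If you already have such a string, you can usually get the bytes back
# by encoding with latin-1, and then decode them as UTF-8.
fixed = wrong.encode('latin-1').decode('utf-8')
print(fixed == text)             # True
```

The latin-1 round trip only works because latin-1 maps each byte to the code point with the same number; it is a repair for data that was stored the wrong way, not something to build on.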
if you store the UTF-8 in an ordinary 8-bit string instead, you can use
the unicode constructor to convert things properly:
    b = "... some utf-8 data ..."

    # turn it into a unicode string
    u = unicode(b, "utf-8")

    # ... do something with it ...

    # turn it back into a utf-8 string
    s = u.encode("utf-8")

    # or use some other encoding
    s = u.encode("big5")
e.g.
    >>> b = '\xe9\xab\x98'
    >>> u = unicode(b, "utf-8")
    >>> u.encode("utf-8")
    '\xe9\xab\x98'
    >>> u.encode("big5")
    '\xb0\xaa'
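The same decode-then-encode pattern can be wrapped in a small helper. This is only a sketch: transcode is a hypothetical name, not a library function, and it uses the .decode() method on the byte string rather than the unicode constructor shown above so that it also runs on Python 3.

```python
def transcode(data, source_encoding, target_encoding):
    """Decode a byte string from one encoding and re-encode it in another.

    Hypothetical helper for illustration only.
    """
    text = data.decode(source_encoding)    # bytes -> Unicode string
    return text.encode(target_encoding)    # Unicode string -> bytes


utf8_bytes = b'\xe9\xab\x98'
big5_bytes = transcode(utf8_bytes, 'utf-8', 'big5')

# converting back gives the original UTF-8 bytes
print(transcode(big5_bytes, 'big5', 'utf-8') == utf8_bytes)   # True
```

Note that encoding raises UnicodeEncodeError if the target encoding cannot represent a character in the text, so a real program would want to decide how to handle that case.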
</F>