unicode wrap unicode object?
Fredrik Lundh
fredrik at pythonware.com
Sat Apr 8 02:26:38 EDT 2006
"ygao" <ygao2004 at gmail.com> wrote:
> >>> import sys
> >>> sys.setdefaultencoding("utf-8")
hmm. what kind of bootleg python is that ?
>>> import sys
>>> sys.setdefaultencoding("utf-8")
Traceback (most recent call last):
  File "<stdin>", line 1, in ?
AttributeError: 'module' object has no attribute 'setdefaultencoding'
(you're not supposed to change the default encoding; site.py removes
sys.setdefaultencoding at startup for exactly that reason. don't
do that; it'll only cause problems in the long run).
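
A minimal sketch of the alternative to changing the default encoding: name the codec explicitly at every conversion. This is written for Python 3 (an assumption; the thread itself uses Python 2), and the variable names are illustrative only.

```python
import sys

# Python 3 has no sys.setdefaultencoding at all, and the default
# encoding is fixed.
has_setter = hasattr(sys, "setdefaultencoding")
default = sys.getdefaultencoding()

# Name the codec at each boundary instead of relying on a default.
text = "\u9ad8"                    # one CJK character
data = text.encode("utf-8")        # text -> bytes, explicit codec
roundtrip = data.decode("utf-8")   # bytes -> text, explicit codec

print(has_setter, default, data, roundtrip == text)
```

With explicit codecs the result does not depend on any interpreter-wide setting, so the code behaves the same on every installation.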
> >>> s='\xe9\xab\x98' # this is a utf-8 string
> >>> ss=U'\xe9\xab\x98'
> >>> s
> '\xe9\xab\x98'
> >>> ss
> u'\xe9\xab\x98'
> >>>
> how do I get ss from s?
> Is there a way to do this?
you have UTF-8 *bytes* in a Unicode text string? sounds like
someone's made a mistake earlier on...
anyway, encoding to iso-8859-1 is, in practice, a null transform: it
maps each unicode character in the range U+0000 to U+00FF to the single
byte with the same value:
>>> s = ss.encode("iso-8859-1")
>>> s
'\xe9\xab\x98'
>>> s.decode("utf-8")
u'\u9ad8'
>>> import unicodedata
>>> unicodedata.name(s.decode("utf-8"))
'CJK UNIFIED IDEOGRAPH-9AD8'
but it's probably better to fix the code that puts UTF-8 data in your
Unicode strings (look for bogus iso-8859-1 conversions)
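
As a rough illustration of how such a bogus conversion produces the string above, and how the iso-8859-1 round trip undoes it, here is a minimal sketch. It assumes Python 3 (the session above is Python 2), and the names raw, mangled, repaired and correct are made up for the example.

```python
# UTF-8 encoding of U+9AD8, as bytes.
raw = b"\xe9\xab\x98"

# The bug: decoding UTF-8 bytes with the wrong codec gives three
# Latin-1 characters instead of one CJK character.
mangled = raw.decode("iso-8859-1")

# The repair: iso-8859-1 maps code points 0-255 back to the same byte
# values, so encoding recovers the original bytes, which can then be
# decoded with the codec they were really written in.
repaired = mangled.encode("iso-8859-1").decode("utf-8")

# The real fix: decode with the right codec in the first place.
correct = raw.decode("utf-8")

print(len(mangled), len(repaired), repaired == correct)
```

The repair only works while every character in the mangled string is below U+0100; if the bad data has passed through any other conversion, the encode step raises UnicodeEncodeError, which is one more reason to fix the source of the problem.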
</F>