newbie with an encoding question, please help
Mister Yu
eryan.yu at gmail.com
Thu Apr 1 06:56:05 EDT 2010
Hi experts,
I'm new to Python. I'm writing crawlers to extract data from some
Chinese websites, and I've run into an encoding problem.
I have a unicode object that looks like this: u'\xd6\xd0\xce\xc4'.
Its contents are encoded in "gb2312", but I have no idea how to convert
it back to utf-8.
The problem is easy to reproduce.
This works:
============================
>>> su = u"中文".encode('gb2312')
>>> su
'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
中文 -> (same as the original string)
============================
But this doesn't. Why?
===========================
>>> su = u'\xd6\xd0\xce\xc4'
>>> su
u'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)
===========================
Thank you.
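
A note on the second example, offered as a sketch rather than a definitive answer. In Python 2, calling .decode() on a unicode object first encodes it implicitly with the default 'ascii' codec, which would explain why the traceback reports a UnicodeEncodeError rather than a decode error. Assuming the code points in su are really raw GB2312 byte values (for example, because the page was decoded as latin-1 somewhere upstream), one way to recover the text is to turn them back into bytes first and then decode. The helper name recover_gb2312 is hypothetical and not part of any library.

```python
# Hypothetical helper for illustration; assumes every code point in the
# input is below 256 and stands for one byte of GB2312-encoded text.
def recover_gb2312(mis_decoded):
    # latin-1 maps code points 0-255 one-to-one onto bytes 0-255,
    # so this step gets the original byte string back unchanged.
    raw = mis_decoded.encode('latin-1')
    # Decode the bytes with the encoding they were actually written in.
    return raw.decode('gb2312')


su = u'\xd6\xd0\xce\xc4'
text = recover_gb2312(su)          # a proper unicode string
utf8_bytes = text.encode('utf-8')  # UTF-8 bytes, e.g. for writing to a file
```

If the crawler has access to the raw response bytes, decoding those directly with 'gb2312' avoids the round trip through latin-1.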