newbie with an encoding question, please help

Mister Yu eryan.yu at gmail.com
Thu Apr 1 06:56:05 EDT 2010


Hi experts,

I'm new to Python. I'm writing crawlers to extract data from some
Chinese websites, and I've run into an encoding problem.

I have a unicode object that looks like this: u'\xd6\xd0\xce\xc4'.
Its contents are really text encoded in "gb2312", but I have no idea
how to convert it back to UTF-8.

The problem is easy to reproduce.

This works:
============================
>>> su = u"中文".encode('gb2312')
>>> su
'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
中文    -> (same as the original string)

============================
But this doesn't. Why?
===========================
>>> su = u'\xd6\xd0\xce\xc4'
>>> su
u'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-3: ordinal not in range(128)
===========================
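
For reference, a minimal sketch of one possible workaround. It assumes
every code point in the unicode object is below 256 and is really one raw
gb2312 byte. In Python 2, calling .decode() on a unicode object first
encodes it implicitly with the ascii codec, which is where the
UnicodeEncodeError above comes from. Encoding with 'latin-1' maps each
code point 0-255 to the byte of the same value, which recovers the
original bytes so they can be decoded properly. The variable names are
illustrative only.

```python
# -*- coding: utf-8 -*-
# Sketch: recover gb2312 text that was wrongly stored as a unicode object.
# Assumes all code points are < 256, i.e. they are really raw bytes.

su = u'\xd6\xd0\xce\xc4'

# latin-1 maps code points 0-255 one-to-one onto bytes 0x00-0xFF,
# so this step gives back the original byte string unchanged.
raw = su.encode('latin-1')

# Now decode the bytes with the encoding they were actually written in.
text = raw.decode('gb2312')

# Finally, encode the real unicode text as UTF-8.
utf8 = text.encode('utf-8')
```

If the crawler can be changed, it is probably cleaner to decode the raw
response bytes as gb2312 in the first place, so that this round trip is
never needed.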

thank you


