newbie with an encoding question, please help

Mister Yu eryan.yu at
Thu Apr 1 12:56:05 CEST 2010

hi experts,

I'm new to Python. I'm writing crawlers to extract data from some
Chinese websites, and I've run into an encoding problem.

I have a unicode object that looks like this: u'\xd6\xd0\xce\xc4'.
The text behind it is encoded in "gb2312", but I have no idea how to
convert it to UTF-8.

It is easy to reproduce.

This works:
>>> su = u"中文".encode('gb2312')
>>> su
'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
中文    -> (same as the original string)

But this doesn't. Why?
>>> su = u'\xd6\xd0\xce\xc4'
>>> su
u'\xd6\xd0\xce\xc4'
>>> print su.decode('gb2312')
Traceback (most recent call last):
  File "<console>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position
0-3: ordinal not in range(128)
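
A likely reading of the traceback: in Python 2, calling .decode() on a
unicode object first encodes it to bytes with the default 'ascii' codec,
and that implicit step is what raises UnicodeEncodeError. The sketch below
is one possible workaround, not a confirmed fix for the crawler. It assumes
the GB2312 bytes were turned into unicode one byte per character (as a
Latin-1 decode would do), so encoding with 'latin-1' should give the
original bytes back. The names raw, text and utf8_bytes are only
illustrative.

```python
# -*- coding: utf-8 -*-
# Assumption: these four code points are really GB2312 bytes that were
# decoded one byte per character somewhere upstream.
su = u'\xd6\xd0\xce\xc4'

# Latin-1 maps code points 0-255 straight back to the same byte values,
# so this recovers the raw GB2312 bytes without touching the ascii codec.
raw = su.encode('latin-1')

# Decode the bytes with the encoding they were actually written in.
text = raw.decode('gb2312')

# Re-encode as UTF-8 for storage or output.
utf8_bytes = text.encode('utf-8')
```

If the assumption holds, text is the two-character string from the first
example. The more robust route is probably to find where the page bytes
first become unicode and decode them as 'gb2312' at that point.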

thank you
