[Tutor] Assistance with UnicodeDecodeError
James Chapman
james at uplinkzero.com
Wed Feb 4 18:01:40 CET 2015
>
> I am trying to scrap text from a website using Python 2.7 in windows 8 and
> i am getting this error "UnicodeDecodeError: 'charmap codec can't encode
> character u'\u2014 in position 11231 character maps to <undefined>"
>
>
For starters, move away from Python 2 unless you have a good reason to use
it. Unicode is built into Python 3, whereas it's an afterthought in Python
2.
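What's happening is that Python is trying to encode the text with a codec
that has no mapping for u'\u2014' (an em dash), most likely your console's
code page (cp850 on my machine), so it throws the exception. You need to
tell Python what encoding to use (not all websites are "utf-8"):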
Code example (using Python 2.7):
>>> u = u'\u2014'
>>> print(u)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "c:\Python27\lib\encodings\cp850.py", line 12, in encode
    return codecs.charmap_encode(input,errors,encoding_map)
UnicodeEncodeError: 'charmap' codec can't encode character u'\u2014' in position 0: character maps to <undefined>
>>> s = u.encode("utf-8")
>>> print(s)
ÔÇö
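For comparison, here is a rough Python 3 sketch of the same round trip: decode the bytes you fetched using the encoding the site declares, then encode for the console with an error handler so an unmappable character cannot raise. The helper names (decode_page, encode_for_console) and the sample bytes are made up for illustration, and it assumes you already know the page's charset (for example from its Content-Type header).

```python
# Python 3 sketch. Helper names and sample data are hypothetical.

def decode_page(raw_bytes, declared_charset=None, fallback="utf-8"):
    """Turn bytes fetched from a website into text (str)."""
    encoding = declared_charset or fallback
    # errors="replace" substitutes U+FFFD for undecodable bytes
    # instead of raising UnicodeDecodeError.
    return raw_bytes.decode(encoding, errors="replace")


def encode_for_console(text, console_encoding):
    """Encode text for a console; unmappable characters become '?'."""
    return text.encode(console_encoding, errors="replace")


# UTF-8 bytes containing U+2014, standing in for a downloaded page.
raw = b"before \xe2\x80\x94 after"

text = decode_page(raw, "utf-8")
print(ascii(text))       # ascii() keeps the output printable on any console

safe = encode_for_console(text, "cp850")
print(safe)              # cp850 has no em dash, so it is replaced

# Decoding the same UTF-8 bytes with the wrong codec gives mojibake.
mojibake = raw.decode("cp850")
print(ascii(mojibake))
```

The last step is what produced the "ÔÇö" above: the three UTF-8 bytes for the em dash were displayed as three separate cp850 characters. Encoding to UTF-8 avoids the exception, but the text only looks right if whatever displays it also uses UTF-8.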
I also strongly suggest you read:
https://docs.python.org/2/howto/unicode.html
There is much cursing to come. Unicode, and especially multi-byte character
string processing, is a nightmare!
Good luck ;-)
James