Converting html character codes to utf-8 text
Peter Otten
__peter__ at web.de
Tue Jun 19 07:14:36 EDT 2012
Johann Spies wrote:
> I am trying the following:
>
> Change data like this:
>
> Bien Donné : agri tourism
>
> to this:
>
> Bien Donné agri tourism
>
> I am using the 'unescape' function published on
> http://effbot.org/zone/re-sub.htm#unescape-html but working through a file
> I get the following error:
>
> UnicodeDecodeError: 'ascii' codec can't decode byte 0xc3 in position 519:
> ordinal not in range(128)
>
> and I do not know how to solve this problem.
>
> Any solution will be very appreciated.
The information you give is not sufficient to give a fix, but my crystal
ball says that the string you pass to unescape() contains an e with acute
encoded in utf-8 and not as an html escape. Instead of
    unescape(mydata)
try
    unescape(mydata.decode("utf-8"))
If that doesn't fix the problem, come back with a self-contained example.
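
To illustrate the decode-first idea, here is a minimal sketch. It is not the effbot unescape() function: it assumes Python 3 and uses the standard library's html.unescape() as a stand-in. The sample bytes and the helper name decode_and_unescape are made up for the example. The point is only that the UTF-8 bytes are turned into text before any character references are resolved, so no implicit ASCII decoding has to happen.

```python
import html


def decode_and_unescape(raw, encoding="utf-8"):
    """Decode bytes to text first, then resolve HTML character references.

    Hypothetical helper for illustration; html.unescape stands in for the
    unescape() function from the effbot page.
    """
    # Only bytes need decoding; text (str) is passed through unchanged.
    text = raw.decode(encoding) if isinstance(raw, bytes) else raw
    return html.unescape(text)


# Assumed sample: an e with acute as raw UTF-8 bytes (0xc3 0xa9), plus a
# numeric character reference for the colon.
sample = b"Bien Donn\xc3\xa9 &#58; agri tourism"
result = decode_and_unescape(sample)
print(result)
```

If the input file is not actually UTF-8, the decode step will fail or produce wrong characters, so the encoding argument has to match the data.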