htmllib.HTMLParser and unicode
Achim Domma
domma at procoders.net
Wed Sep 17 06:06:26 EDT 2003
Hi,
Should htmllib.HTMLParser be able to handle Unicode input? I get the following
traceback:
    self.feed(self.data)
  File "C:\Python23\lib\sgmllib.py", line 94, in feed
    self.goahead(0)
  File "C:\Python23\lib\sgmllib.py", line 183, in goahead
    self.handle_entityref(name)
  File "C:\Python23\lib\sgmllib.py", line 390, in handle_entityref
    self.handle_data(table[name])
  File "C:\Python23\lib\htmllib.py", line 49, in handle_data
    self.savedata = self.savedata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0: ordinal not in range(128)
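The traceback suggests the likely cause. handle_entityref looks the entity name up in the parser's entity table and passes the result to handle_data. In Python 2, htmllib's default table (htmlentitydefs.entitydefs) holds Latin-1 byte strings, so &nbsp; becomes the one-byte str '\xa0'. Appending that to the Unicode savedata makes Python 2 decode the byte string implicitly as ASCII, and that fails on 0xa0. Python 3 has neither htmllib nor implicit coercion, so the sketch below only models that one step. py2_style_concat, try_concat and LATIN1_TABLE are hypothetical names, and the two-entry table is an illustrative subset. It also shows that a table whose values are already Unicode (built from html.entities.name2codepoint) avoids the error.

```python
from html.entities import name2codepoint

# Hypothetical stand-in for Python 2's htmlentitydefs.entitydefs, whose
# values were Latin-1 byte strings; modelled here as a tiny bytes subset.
LATIN1_TABLE = {"nbsp": b"\xa0", "amp": b"&"}

# Unicode alternative: every entity name maps to a one-character str.
UNICODE_TABLE = {name: chr(cp) for name, cp in name2codepoint.items()}


def py2_style_concat(saved, data):
    """Mimic Python 2's `unicode + str`: bytes are implicitly decoded as ASCII."""
    if isinstance(data, bytes):
        data = data.decode("ascii")  # the implicit step that fails on 0xa0
    return saved + data


def try_concat(saved, data):
    """Return the concatenation, or the error message if decoding fails."""
    try:
        return py2_style_concat(saved, data)
    except UnicodeDecodeError as exc:
        return "UnicodeDecodeError: " + str(exc)


print(try_concat("caf\u00e9", LATIN1_TABLE["amp"]))    # ASCII byte: works
print(try_concat("caf\u00e9", LATIN1_TABLE["nbsp"]))   # byte 0xa0: fails
print(try_concat("caf\u00e9", UNICODE_TABLE["nbsp"]))  # str value: works
```

This also explains why only some pages trigger the error: entities whose Latin-1 value is plain ASCII, such as &amp;, concatenate without complaint.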
The input is an HTML page from the web, encoded as UTF-8. I decoded the
string with data.decode('utf8') and passed the result to the feed() method.
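For comparison, here is a minimal sketch of the same task with Python 3's html.parser.HTMLParser, which works on str throughout. With convert_charrefs=True it resolves entities such as &nbsp; to Unicode characters before calling handle_data, so decoding the UTF-8 bytes first, as described above, is the right approach there. TextCollector is a hypothetical subclass for illustration only. In Python 2 the analogous, untested workaround would be to give the htmllib parser an entity table with Unicode values.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect character data; convert_charrefs=True resolves &nbsp; etc."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []  # list of str pieces passed to handle_data

    def handle_data(self, data):
        self.chunks.append(data)


# UTF-8 bytes as they might arrive from the web.
raw = "<p>caf\u00e9&nbsp;cr\u00e8me</p>".encode("utf-8")

parser = TextCollector()
parser.feed(raw.decode("utf-8"))  # feed a str, never undecoded bytes
parser.close()

text = "".join(parser.chunks)
print(repr(text))
```

The resulting text is a single str in which &nbsp; has become U+00A0, with no byte strings involved at any step.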
regards,
Achim