htmllib.HTMLParser and unicode

Achim Domma domma at procoders.net
Wed Sep 17 12:06:26 CEST 2003


Hi,

Should htmllib.HTMLParser be able to handle unicode input? I get the following
traceback:

    self.feed(self.data)
  File "C:\Python23\lib\sgmllib.py", line 94, in feed
    self.goahead(0)
  File "C:\Python23\lib\sgmllib.py", line 183, in goahead
    self.handle_entityref(name)
  File "C:\Python23\lib\sgmllib.py", line 390, in handle_entityref
    self.handle_data(table[name])
  File "C:\Python23\lib\htmllib.py", line 49, in handle_data
    self.savedata = self.savedata + data
UnicodeDecodeError: 'ascii' codec can't decode byte 0xa0 in position 0:
ordinal not in range(128)

The input is an HTML page from the web, encoded as UTF-8. I decoded the
string with data.decode('utf8') and passed the resulting unicode object to
the feed() method.
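
My guess at what is going on, as a minimal sketch rather than a confirmed
diagnosis: the entity table seems to hold byte strings (I am assuming that
&nbsp; maps to the Latin-1 byte 0xa0), and adding such a byte string to
unicode data forces an implicit ASCII decode. The helper append_data below is
hypothetical and only stands in for the "self.savedata + data" line from the
traceback; it makes the implicit decode explicit so the failure can be
reproduced in isolation.

```python
# Sketch: emulate the implicit ASCII decode that happens when a byte
# string is concatenated to a unicode string.

def append_data(saved, data):
    """Append data to saved text, decoding byte strings as ASCII first."""
    if isinstance(data, bytes):
        # Stand-in for the implicit conversion done by unicode + str
        data = data.decode("ascii")
    return saved + data

saved = u"some text"   # unicode, like the result of data.decode('utf8')
entity = b"\xa0"       # ASSUMED byte-string value for &nbsp; in the entity table

try:
    append_data(saved, entity)
    error = None
except UnicodeDecodeError as exc:
    # Same kind of error as in the traceback above
    error = exc

# Decoding the entity value as Latin-1 first avoids the ASCII decode
fixed = append_data(saved, entity.decode("latin-1"))
```

If that assumption holds, one possible workaround would be to override
handle_entityref in a subclass and decode the table value before passing it
on, but I have not verified this against the library source.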

regards,
Achim





