[XML-SIG] HTML parse error
Stefan Behnel
stefan_ml at behnel.de
Mon Feb 22 15:46:27 CET 2010
sharifah ummu kulthum, 22.02.2010 14:24:
> File "grabmy.py", line 63, in get_html
> return BeautifulSoup(content)
> File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1499, in __init__
> File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1230, in __init__
> File "build/bdist.linux-i686/egg/BeautifulSoup.py", line 1263, in _feed
> File "/usr/lib/python2.6/HTMLParser.py", line 108, in feed
> self.goahead(0)
> File "/usr/lib/python2.6/HTMLParser.py", line 148, in goahead
> k = self.parse_starttag(i)
> File "/usr/lib/python2.6/HTMLParser.py", line 226, in parse_starttag
> endpos = self.check_for_whole_start_tag(i)
> File "/usr/lib/python2.6/HTMLParser.py", line 301, in check_for_whole_start_tag
> self.error("malformed start tag")
> File "/usr/lib/python2.6/HTMLParser.py", line 115, in error
> raise HTMLParseError(message, self.getpos())
> HTMLParser.HTMLParseError: malformed start tag, at line 830, column 36
Just noticed this now - you seem to be using BeautifulSoup, likely version
3.1. As the traceback shows, that version parses with HTMLParser, which
does not handle broken HTML well, so use version 3.0.8 instead, or switch
to the tools I indicated.
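To see which piece of markup the parser rejected, the position reported in the traceback (line 830, column 36) can be used to print the surrounding text. This is only a minimal sketch: the helper name show_error_context and its width parameter are made up for illustration, and it assumes the line and column are 1-based, as printed in the error message.

```python
def show_error_context(content, lineno, column, width=30):
    """Return the text around a reported parse position.

    Hypothetical helper. lineno and column are assumed to be 1-based,
    as they appear in the HTMLParseError message.
    """
    lines = content.splitlines()
    line = lines[lineno - 1]                 # convert to 0-based index
    start = max(0, column - 1 - width)       # clamp at start of line
    end = column - 1 + width
    return line[start:end]
```

With the string that was passed to BeautifulSoup in `content`, calling `show_error_context(content, 830, 36)` would return the stretch of markup around the malformed start tag.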
Note that switching tools means that you need to change your code to use
them. Just installing them is not enough.
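As a rough illustration of what "changing your code" could look like, here is a sketch of a get_html function rewritten for a different parser. The tools recommended earlier in the thread are not named in this message, so lxml.html is used here purely as an assumed stand-in. It is a third-party package that has to be installed separately, and the rest of the calling code would also need to be adapted to its tree API.

```python
def get_html(content):
    """Parse possibly broken HTML and return the root element.

    Assumption: lxml.html stands in for the replacement tool; the
    original get_html returned a BeautifulSoup object instead.
    """
    import lxml.html  # third-party; imported here so the dependency is explicit
    return lxml.html.document_fromstring(content)
```

The returned object is an lxml element, not a BeautifulSoup tree, so calls made on the result elsewhere in the script (for example BeautifulSoup's findAll) would have to be replaced with lxml equivalents such as `root.xpath(...)` or `root.text_content()`.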
Stefan