Help w/ HTMLParser lib
Kevin T. Ryan
ktr46 at hotmail.com
Fri May 21 10:51:38 EDT 2004
Thanks to both of you - I will try to incorporate the regexes and I'll check
out tidy. Take care,
Kevin
"Kevin T. Ryan" <kevryan0701 at yahoo.com> wrote in message
news:40ad7619$0$3114$61fed72c at news.rcn.com...
> Hi all -
>
> I'm somewhat new to python (about 1 year), and I'm trying to write a program
> that opens a file like object w/ urllib.urlopen, and then parse the data by
> passing it to a class that subclasses HTMLParser.HTMLParser. On the web
> page, however, there is javascript - and I think that is causing an error
> in parsing the data. Here's the error:
>
> Traceback (most recent call last):
> File "<stdin>", line 1, in ?
> File "html_helper.py", line 30, in parse_data
> p.feed(data)
> File "//usr/lib/python2.2/HTMLParser.py", line 108, in feed
> self.goahead(0)
> File "//usr/lib/python2.2/HTMLParser.py", line 150, in goahead
> k = self.parse_endtag(i)
> File "//usr/lib/python2.2/HTMLParser.py", line 329, in parse_endtag
> self.error("bad end tag: %s" % `rawdata[i:j]`)
> File "//usr/lib/python2.2/HTMLParser.py", line 115, in error
> raise HTMLParseError(message, self.getpos())
> HTMLParser.HTMLParseError: bad end tag: "</scr' + 'ipt>", at line 411, column 7
>
> I've tried to use a try/except clause both w/in my class and w/in a function
> that wraps the class for easy access, but to no avail. The code works on
> other websites, so I know that it's not *completely* off. Any help would
> be greatly appreciated! TIA :)
>
> Kevin
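
For anyone finding this thread later, here is a rough sketch of how the regex
idea could look: strip whole <script> elements out of the page before handing
it to the parser, so a split tag like "</scr' + 'ipt>" never reaches feed().
This is only an illustration under some assumptions. It uses Python 3's
html.parser.HTMLParser rather than the Python 2.2 HTMLParser module from the
traceback, and strip_scripts, LinkCollector, and the body of parse_data are
made-up names for the example (the real html_helper.py was never posted).

```python
import re
from html.parser import HTMLParser

# Matches one whole <script ...>...</script> element, case-insensitively and
# across newlines. The non-greedy ".*?" stops at the first literal closing tag,
# so a split tag such as "</scr' + 'ipt>" inside the JavaScript is skipped over.
SCRIPT_RE = re.compile(r"<script\b[^>]*>.*?</script\s*>",
                       re.IGNORECASE | re.DOTALL)


def strip_scripts(html):
    """Return html with every <script> element removed (hypothetical helper)."""
    return SCRIPT_RE.sub("", html)


class LinkCollector(HTMLParser):
    """Hypothetical parser subclass that just collects href values."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.links.append(value)


def parse_data(html):
    """Strip scripts first, then feed the cleaned text to the parser."""
    p = LinkCollector()
    p.feed(strip_scripts(html))
    p.close()
    return p.links
```

With urllib you would read the page into a string first and pass that string
to parse_data. Note that a regex like this is a blunt tool: if the JavaScript
itself contains an unsplit "</script>" inside a string, the match ends there.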