[Python-Dev] Grzegorz Adam Hankiewicz found a parsing bug in
HTMLParser.
Guido van Rossum
guido@python.org
Sun, 09 Feb 2003 20:12:47 -0500
> MSG-ID of the origin: <mailman.1044810540.18789.python-list@python.org>
Alas, that's not helpful in tracking down the original message.
> A bit of investigation showed that the bug exists because of that line:
>
> <a href="http://ss"title="pe">P</a>
>                  ^^^
Which is blatantly invalid HTML, of course.
> The code responsible for the complaint is the method
> check_for_whole_start_tag() of class HTMLParser, lines 308 to 312:
>
>     if next in ("abcdefghijklmnopqrstuvwxyz=/"
>                 "ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
>         # end of input in or before attribute value, or we have the
>         # '/' from a '/>' ending
>         return -1
>
> I don't want to change this, since I'm sure I'd make HTMLParser
> weaker under some other conditions. Is there anybody who knows the
> code for HTMLParser.py?
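
A rough sketch of why the quoted line would reach that branch. It assumes
the parser only accepts an attribute when whitespace precedes it, so the
scan stops right after "http://ss" and the next character is the "t" of
title. START_TAG_PREFIX, char_after_attributes() and treated_as_incomplete()
are hypothetical names, and the pattern is a simplified stand-in, not the
regular expression HTMLParser.py actually uses.

```python
import re

# Simplified stand-in for the parser's start-tag pattern (an assumption, not
# the library's actual regex): a tag name, then zero or more attributes, each
# of which must be preceded by whitespace.
START_TAG_PREFIX = re.compile(r"""
    <[a-zA-Z][-.a-zA-Z0-9:_]*
    (?:\s+[a-zA-Z_][-.:a-zA-Z0-9_]*
       (?:\s*=\s*(?:'[^']*'|"[^"]*"|[^\s>'"]+))?
    )*
    \s*
""", re.VERBOSE)

# The character set from the quoted check.
LETTERS_ETC = ("abcdefghijklmnopqrstuvwxyz=/"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def char_after_attributes(tag):
    """Return the character that follows the last well-formed attribute."""
    m = START_TAG_PREFIX.match(tag)
    if m is None:
        return None
    return tag[m.end():m.end() + 1]


def treated_as_incomplete(tag):
    """True if the quoted check would return -1 for this start tag.

    The empty string is excluded explicitly, because '' in 'abc' is
    always true in Python.
    """
    nxt = char_after_attributes(tag)
    return nxt is not None and nxt != "" and nxt in LETTERS_ETC


if __name__ == "__main__":
    bad = '<a href="http://ss"title="pe">'
    good = '<a href="http://ss" title="pe">'
    print(repr(char_after_attributes(bad)), treated_as_incomplete(bad))
    print(repr(char_after_attributes(good)), treated_as_incomplete(good))
```

Under these assumptions the tag with the missing space is indistinguishable
from input that was cut off in the middle of an attribute, which would
explain why the check reports "need more data" rather than a malformed tag.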
This isn't really the right forum for this, but I hope you can post
either a bug report or a patch to SourceForge. If you need someone to
help investigate first, the right place to ask is comp.lang.python.
--Guido van Rossum (home page: http://www.python.org/~guido/)