[Python-Dev] Grzegorz Adam Hankiewicz found a parsing bug in HTMLParser.

Guido van Rossum guido@python.org
Sun, 09 Feb 2003 20:12:47 -0500


> MSG-ID of the origin: <mailman.1044810540.18789.python-list@python.org>

Alas, that's not helpful in tracking down the original message.

> A bit of investigation showed that the bug exists because of that line:
> 
>         <a href="http://ss"title="pe">P</a>
>                          ^^^

Which is blatantly invalid HTML, of course.
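
For illustration only, here is a minimal sketch of what makes that line
invalid: HTML requires whitespace between attributes, and here title
follows the closing quote of the href value directly. The helper name
find_missing_attr_space is hypothetical and is not part of HTMLParser.py;
it only checks for this one pattern and is not a general validator.

```python
import re

# Hypothetical helper, not part of HTMLParser: matches a quoted attribute
# value that is followed immediately by a letter (the start of the next
# attribute name) with no whitespace in between.
_MISSING_SPACE = re.compile(r"""=\s*(?:"[^"]*"|'[^']*')(?=[A-Za-z])""")

def find_missing_attr_space(tag):
    """Return the index just after the offending closing quote, or -1."""
    m = _MISSING_SPACE.search(tag)
    return m.end() if m else -1

bad = '<a href="http://ss"title="pe">P</a>'
good = '<a href="http://ss" title="pe">P</a>'

print(find_missing_attr_space(bad))   # index of the 't' in title
print(find_missing_attr_space(good))  # -1, nothing to report
```

A check like this could be run over the input before parsing to locate
the offending spot, without touching the parser itself.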

> The code responsible for that complaint is the method
> check_for_whole_start_tag() of class HTMLParser, lines 308 to 312:
> 
>             if next in ("abcdefghijklmnopqrstuvwxyz=/"
>                         "ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
>                 # end of input in or before attribute value, or we have the
>                 # '/' from a '/>' ending
>                 return -1
> 
> I don't want to change this, since I'm sure I'll make HTMLParser
> weak for some other conditions. Is there anybody who knows the code
> for HTMLParser.py?

This isn't really the right forum for this, but I hope you can post
either a bug report or a patch to SourceForge.  If you need someone to
help investigate first, the right place to ask is comp.lang.python.

--Guido van Rossum (home page: http://www.python.org/~guido/)