[Python-Dev] Grzegorz Adam Hankiewicz found a parsing bug in HTMLParser.

gminick gminick@hacker.pl
Sun, 9 Feb 2003 21:57:38 +0100


Message-ID of the original post: <mailman.1044810540.18789.python-list@python.org>

A bit of investigation showed that the bug is triggered by this line:

        <a href="http://ss"title="pe">P</a>
                         ^^^

The code responsible for the complaint is the check_for_whole_start_tag()
method of class HTMLParser, lines 308 to 312:

            if next in ("abcdefghijklmnopqrstuvwxyz=/"
                        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"):
                # end of input in or before attribute value, or we have the
                # '/' from a '/>' ending
                return -1
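
To illustrate why the missing space matters, here is a minimal sketch. It is
not the code from HTMLParser.py: START_TAG is a simplified stand-in for the
pattern the parser uses to find the end of a start tag, and
classify_start_tag() is a hypothetical helper that mimics the check quoted
above. The assumption is that each attribute must be preceded by whitespace,
so matching stops right after "http://ss" and the next character is the 't'
of title, which is treated as "input not complete yet" rather than as an error.

```python
import re

# Simplified, hypothetical start-tag pattern (an assumption, not a verbatim
# copy of the module's regex): every attribute needs leading whitespace.
START_TAG = re.compile(r"""
  <[a-zA-Z][-.a-zA-Z0-9:_]*        # tag name
  (?:\s+                           # whitespace required before each attribute
     [a-zA-Z_][-.:a-zA-Z0-9_]*     # attribute name
     (?:\s*=\s*                    # optional value
        (?:'[^']*'|"[^"]*"|[^'">\s]+)
     )?
  )*
  \s*                              # trailing whitespace
""", re.VERBOSE)

# Same character set as in the quoted check.
LETTERS_ETC = ("abcdefghijklmnopqrstuvwxyz=/"
               "ABCDEFGHIJKLMNOPQRSTUVWXYZ")


def classify_start_tag(rawdata):
    """Classify the start tag at index 0 of rawdata.

    Returns 'complete', 'incomplete' (the case where the quoted code
    returns -1) or 'malformed'.
    """
    m = START_TAG.match(rawdata)
    if m is None:
        return "malformed"  # does not begin with a start tag at all
    nxt = rawdata[m.end():m.end() + 1]  # first character the pattern rejected
    if nxt == ">":
        return "complete"
    if nxt == "" or nxt in LETTERS_ETC:
        return "incomplete"
    return "malformed"


print(classify_start_tag('<a href="http://ss"title="pe">P</a>'))   # incomplete
print(classify_start_tag('<a href="http://ss" title="pe">P</a>'))  # complete
```

With the space restored between the two attributes, the sketch reports the
tag as complete. Without it, the tag is reported as incomplete even though no
more input is coming.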

I don't want to change this myself, since I'm sure I would make HTMLParser
break on some other input. Is there anybody who knows the code of HTMLParser.py?

PS. ...if there's no one, then I'll take a deeper look at this ;)

-- 
[ ] gminick (at) underground.org.pl  http://gminick.linuxsecurity.pl/ [ ]
[ "I just like morning solitude, because that's when coffee tastes best." ]