[Python-Dev] HTMLParser patches
john paulson
munch@acm.org
Mon, 27 Jan 2003 14:19:24 -0800
I've submitted two patches for HTMLParser.py and
test_htmlparser.py. They fix two problems I hit
while lexing some HTML pages I found in the wild.
1. Allow "," in attributes
A page had the attribute "color=rgb(1,2,3)",
and the parser choked on the ",". I added "," to
the list of characters allowed in unquoted
attribute values.
2. More robust <SCRIPT> processing.
The eBay homepage has unprotected JavaScript
including the line 'vb += "</SCR"+"IPT>"'. The
parser choked on that line. I modified the
source to use a more robust regex for script
and style end tags. A side effect of this is that
any "<!--" .. "-->" within a script/style will
be parsed as a comment. If that behavior is
incorrect, the regex can be modified.
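
Here is a rough sketch of the two ideas. The patterns below are
simplified stand-ins written for illustration, not the regexes in
HTMLParser.py or in the patches, and every name (UNQUOTED_STRICT,
UNQUOTED_WITH_COMMA, ANY_END, SCRIPT_END) is hypothetical.

```python
import re

# Hypothetical character classes for an unquoted attribute value.
# The only difference between them is the "," in the second one.
UNQUOTED_STRICT = re.compile(r'[-a-zA-Z0-9./:;+*%?!&$()_#=~]*')
UNQUOTED_WITH_COMMA = re.compile(r'[-a-zA-Z0-9./,:;+*%?!&$()_#=~]*')

value = "rgb(1,2,3)"
strict = UNQUOTED_STRICT.match(value).group()       # stops at the first ","
relaxed = UNQUOTED_WITH_COMMA.match(value).group()  # consumes the whole value

# Hypothetical end-tag patterns for content inside <SCRIPT>/<STYLE>.
# ANY_END treats any "</" followed by a letter as an end tag;
# SCRIPT_END only accepts a complete script or style end tag.
ANY_END = re.compile(r'</[a-zA-Z]')
SCRIPT_END = re.compile(r'</(script|style)\s*>', re.IGNORECASE)

body = 'vb += "</SCR"+"IPT>";\n</SCRIPT>'
naive = ANY_END.search(body).start()      # lands inside the string literal
robust = SCRIPT_END.search(body).start()  # lands on the real end tag
```

With the stricter end-tag pattern, the "</SCR" fragment inside the
string literal is skipped and only the real "</SCRIPT>" ends the block.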