[Patches] [ python-Patches-755670 ] improve HTMLParser attribute processing regexps
SourceForge.net
noreply@sourceforge.net
Mon, 16 Jun 2003 20:10:57 -0700
Patches item #755670, was opened at 2003-06-17 03:09
Message generated for change (Comment added) made by smroid
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=755670&group_id=5470
Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Steven Rosenthal (smroid)
Assigned to: Nobody/Anonymous (nobody)
Summary: improve HTMLParser attribute processing regexps
Initial Comment:
HTML examples seen in the wild that cause parse errors
in HTMLParser include:
<a width="100%"cellspacing=0>
-- no whitespace between a quoted value and the next attribute name
<a foo=>
-- trailing attribute has no value after the "="
<a href=javascript:popup('/popup/html.html')>
-- unquoted JavaScript value containing embedded quotes
My patch contains improvements to the 'attrfind' and
'locatestarttagend' regexps that allow these examples
to parse.
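
For illustration only, here is a minimal sketch of the kind of lenient
attribute matching described above. It is not the regexp from the attached
patch; the names ATTR and parse_attrs are hypothetical, and the pattern is
only one assumed way of tolerating the three examples.

```python
import re

# Hypothetical lenient attribute pattern (an assumption for illustration,
# NOT the 'attrfind' regexp from the patch).
ATTR = re.compile(
    r"""\s*([a-zA-Z_][-.:a-zA-Z0-9_]*)   # attribute name
        (?:\s*=\s*                       # optional "=" followed by a value
            (?:"([^"]*)"                 # double-quoted value
              |'([^']*)'                 # single-quoted value
              |([^\s>]*)                 # unquoted value, possibly empty
            )
        )?""",
    re.VERBOSE,
)


def parse_attrs(tag_body):
    """Return (name, value) pairs from the text after the tag name.

    The value is None when the attribute has no "=" at all, and the
    empty string when "=" is present but nothing follows it.
    """
    attrs = []
    pos = 0
    while pos < len(tag_body):
        m = ATTR.match(tag_body, pos)
        if m is None or m.end() == pos:
            break  # nothing more that looks like an attribute
        value = None
        for group in (2, 3, 4):
            if m.group(group) is not None:
                value = m.group(group)
                break
        attrs.append((m.group(1).lower(), value))
        pos = m.end()
    return attrs


if __name__ == "__main__":
    for body in ('width="100%"cellspacing=0',
                 'foo=',
                 "href=javascript:popup('/popup/html.html')"):
        print(parse_attrs(body))
```

The leading \s* lets an attribute name follow a closing quote directly, the
unquoted alternative accepts an empty value, and it also accepts quote
characters in the middle of an unquoted value as long as the value does not
begin with one.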
The existing test_htmlparser.py unit test continues to
pass, except for the one test case that expects
<a foo=> to be a parse error.
I commented out that case and added new test cases to
cover the examples above.
----------------------------------------------------------------------
>Comment By: Steven Rosenthal (smroid)
Date: 2003-06-17 03:10
Message:
The base version for HTMLParser.py is 1.11.2.1; the base version for
test_htmlparser.py is 1.8.8.1.
----------------------------------------------------------------------