[Patches] [ python-Patches-755670 ] improve HTMLParser attribute processing regexps
SourceForge.net
noreply@sourceforge.net
Mon, 16 Jun 2003 20:09:17 -0700
Patches item #755670 was opened at 2003-06-17 03:09
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=755670&group_id=5470
Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Steven Rosenthal (smroid)
Assigned to: Nobody/Anonymous (nobody)
Summary: improve HTMLParser attribute processing regexps
Initial Comment:
HTML examples seen in the wild that cause parse errors
in HTMLParser include:
<a width="100%"cellspacing=0>
-- note the lack of a space between the quoted value and the next attribute name
<a foo=>
-- trailing attribute has no value after =
<a href=javascript:popup('/popup/html.html')>
-- unquoted JavaScript value containing embedded quotes
My patch contains improvements to the 'attrfind' and
'locatestarttagend' regexps that allow these examples
to parse.
The existing test_htmlparser.py unit test continues to
pass, except for the one test case where it considers
<a foo=> to be an error.
I commented out that case and added new test cases to
cover the examples above.
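
For illustration, the three cases above can be handled by a tolerant
attribute pattern along the following lines. This is only a minimal sketch:
it is not the 'attrfind' or 'locatestarttagend' regexp from the patch, and
TAG_NAME_RE, ATTR_RE and parse_start_tag are hypothetical names chosen for
this example.

```python
import re

# Hypothetical patterns for illustration; NOT the regexps from the patch.
TAG_NAME_RE = re.compile(r'<([a-zA-Z][-.a-zA-Z0-9:_]*)')
ATTR_RE = re.compile(r'''
    \s*                                    # leading whitespace is optional
    (?P<name>[a-zA-Z_][-.:a-zA-Z0-9_]*)    # attribute name
    (?:
        \s*=\s*
        (?P<value>
            "[^"]*"                        # double-quoted value
          | '[^']*'                        # single-quoted value
          | [^\s>]*                        # unquoted value; may be empty
        )
    )?
''', re.VERBOSE)


def parse_start_tag(tag):
    """Return (tag_name, [(attr_name, attr_value), ...]) for one start tag."""
    m = TAG_NAME_RE.match(tag)
    if m is None:
        raise ValueError('not a start tag: %r' % tag)
    name, pos, attrs = m.group(1).lower(), m.end(), []
    while True:
        m = ATTR_RE.match(tag, pos)
        if m is None:
            break  # reached '>' or something that is not an attribute
        value = m.group('value')
        # Strip one pair of matching surrounding quotes, if present.
        if (value is not None and len(value) >= 2
                and value[0] == value[-1] and value[0] in '"\''):
            value = value[1:-1]
        attrs.append((m.group('name').lower(), value))
        pos = m.end()
    return name, attrs


for example in ('<a width="100%"cellspacing=0>',
                '<a foo=>',
                "<a href=javascript:popup('/popup/html.html')>"):
    print(parse_start_tag(example))
```

In this sketch an attribute with "=" but no value yields an empty string,
while an attribute with no "=" at all yields None. That distinction is a
design choice of the example, not necessarily what the patch does.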
----------------------------------------------------------------------