[Patches] [ python-Patches-755670 ] improve HTMLParser attribute processing regexps

SourceForge.net noreply@sourceforge.net
Mon, 16 Jun 2003 20:09:17 -0700


Patches item #755670, was opened at 2003-06-17 03:09
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=755670&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Steven Rosenthal (smroid)
Assigned to: Nobody/Anonymous (nobody)
Summary: improve HTMLParser attribute processing regexps

Initial Comment:
HTML examples seen in the wild that cause parse errors
in HTMLParser include:

<a width="100%"cellspacing=0>
  -- no whitespace between the quoted value and the next attribute name

<a foo=>
  -- trailing attribute has no value after =

<a href=javascript:popup('/popup/html.html')>
  -- unquoted JavaScript value with embedded quotes and parentheses

My patch improves the 'attrfind' and 'locatestarttagend'
regexps so that these examples parse.
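A minimal sketch of the kind of attribute pattern described above, assuming the input is a single, complete start tag. The names 'ATTR_RE', 'TAGNAME_RE' and 'parse_start_tag' are hypothetical, and the regex is a simplified illustration, not the 'attrfind' or 'locatestarttagend' regexps from the patch. It (1) makes whitespace between attributes optional, (2) makes the value after '=' optional, and (3) accepts any run of characters other than whitespace and '>' as an unquoted value, quotes included.

```python
import re

# Hypothetical, simplified attribute pattern for illustration only;
# it is NOT the attrfind regex from the patch.
ATTR_RE = re.compile(r"""
    \s*                                  # whitespace between attributes is optional
    ([a-zA-Z_][-.:a-zA-Z_0-9]*)          # group 1: attribute name
    (?:\s*(=)\s*                         # group 2: optional '=' ...
       ('[^']*'|"[^"]*"|[^\s"'>][^\s>]*)?  # group 3: ... and an optional value
    )?
    """, re.VERBOSE)

TAGNAME_RE = re.compile(r"[a-zA-Z][-.:a-zA-Z_0-9]*")


def parse_start_tag(tag):
    """Split one start tag such as '<a foo=bar>' into (name, [(attr, value), ...]).

    value is None for a bare attribute ('<input checked>') and '' for an
    '=' with nothing after it ('<a foo=>').
    """
    if not (tag.startswith("<") and tag.endswith(">")):
        raise ValueError("not a start tag: %r" % tag)
    body = tag[1:-1]
    m = TAGNAME_RE.match(body)
    if m is None:
        raise ValueError("missing tag name: %r" % tag)
    name, pos = m.group().lower(), m.end()
    attrs = []
    while pos < len(body):
        m = ATTR_RE.match(body, pos)
        if m is None:
            break  # trailing whitespace or unparseable junk: stop here
        attr, eq, value = m.group(1, 2, 3)
        if eq is None:
            value = None          # no '=' at all
        elif value is None:
            value = ""            # '=' followed by nothing
        elif value[0] in "'\"":
            value = value[1:-1]   # strip the surrounding quotes
        attrs.append((attr.lower(), value))
        pos = m.end()
    return name, attrs


if __name__ == "__main__":
    for sample in ('<a width="100%"cellspacing=0>',
                   '<a foo=>',
                   "<a href=javascript:popup('/popup/html.html')>"):
        print(parse_start_tag(sample))
```

Returning '' for 'foo=' but None for a bare attribute is a choice made in this sketch so the two cases stay distinguishable; the unquoted-value branch is what lets the JavaScript example through, since it only stops at whitespace or '>'.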

The existing test_htmlparser.py unit tests still pass,
except for the one case that expects <a foo=> to be
an error.

I commented out that case and added new test cases to
cover the examples above.
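A quick way to re-check the three examples against a current interpreter, assuming Python 3, where the module is named 'html.parser'. 'AttrCollector' and 'start_tags' are hypothetical helper names; the sketch only uses the standard 'feed', 'close' and 'handle_starttag' hooks and does not reproduce the test cases from the patch.

```python
from html.parser import HTMLParser


class AttrCollector(HTMLParser):
    """Record (tag, attrs) for every start tag the parser reports."""

    def __init__(self):
        super().__init__()
        self.starttags = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs with names lowercased
        self.starttags.append((tag, attrs))


def start_tags(markup):
    """Feed one string through a fresh parser and return its start tags."""
    parser = AttrCollector()
    parser.feed(markup)
    parser.close()
    return parser.starttags


if __name__ == "__main__":
    for sample in ('<a width="100%"cellspacing=0>',
                   '<a foo=>',
                   "<a href=javascript:popup('/popup/html.html')>"):
        print(sample, "->", start_tags(sample))
```

Collecting the pairs in a list rather than printing from the handler makes the helper easy to drop into a unittest assertion.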

