[Patches] [ python-Patches-755670 ] improve HTMLParser attribute processing regexps

SourceForge.net noreply@sourceforge.net
Mon, 16 Jun 2003 20:10:57 -0700


Patches item #755670, was opened at 2003-06-17 03:09
Message generated for change (Comment added) made by smroid
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=755670&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Steven Rosenthal (smroid)
Assigned to: Nobody/Anonymous (nobody)
Summary: improve HTMLParser attribute processing regexps

Initial Comment:
HTML examples seen in the wild that cause parse errors
in HTMLParser include:

<a width="100%"cellspacing=0>
  -- note the missing space between the quoted value and
     the next attribute name

<a foo=>
  -- trailing attribute has no value after =

<a href=javascript:popup('/popup/html.html')>
  -- unquoted javascript fragment with embedded quotes

My patch contains improvements to the 'attrfind' and
'locatestarttagend' regexps that allow these examples
to parse.
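
For illustration only, here is a minimal sketch of an attribute regexp that
tolerates the three examples above. It is not the regexp from the attached
patch, and it does not reproduce 'attrfind' or 'locatestarttagend'; the names
ATTR_RE, TAG_NAME_RE and parse_start_tag are hypothetical.

```python
import re

# Hypothetical tolerant attribute pattern (an assumption, not the patch).
ATTR_RE = re.compile(
    r"""
    \s*                              # whitespace may be absent after a quoted value
    (?P<name>[a-zA-Z_][-.:\w]*)      # attribute name
    (?:
        \s*=\s*                      # separator between name and value
        (?P<value>
            "[^"]*"                  # double-quoted value
          | '[^']*'                  # single-quoted value
          | [^\s>]*                  # bare value: may be empty or contain quotes
        )
    )?
    """,
    re.VERBOSE,
)

TAG_NAME_RE = re.compile(r"<([a-zA-Z][-.\w]*)")


def parse_start_tag(text):
    """Return (tag, [(name, value), ...]) for a single start tag string."""
    m = TAG_NAME_RE.match(text)
    if m is None:
        raise ValueError("not a start tag: %r" % text)
    tag, pos, attrs = m.group(1).lower(), m.end(), []
    while True:
        m = ATTR_RE.match(text, pos)
        if m is None:
            break
        value = m.group("value")  # None when the attribute has no '='
        # Strip one pair of matching surrounding quotes, if present.
        if value is not None and len(value) >= 2 \
                and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        attrs.append((m.group("name").lower(), value))
        pos = m.end()
    rest = text[pos:].lstrip()
    if not (rest.startswith(">") or rest.startswith("/>")):
        raise ValueError("malformed start tag: %r" % text)
    return tag, attrs


for sample in ('<a width="100%"cellspacing=0>',
               '<a foo=>',
               "<a href=javascript:popup('/popup/html.html')>"):
    print(parse_start_tag(sample))
```

The sketch has two design points. The leading whitespace before an attribute
name is optional. The bare-value alternative accepts any run of characters
other than whitespace and '>', including an empty run.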

The existing test_htmlparser.py unit tests continue to
pass, except for the one test case that expects
<a foo=> to be an error.

I commented out that case and added new test cases to
cover the examples above.
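
As a rough sketch of what such test cases might look like, the snippet below
feeds the three examples to a small event collector and checks that each one
yields a start tag for 'a'. It makes two assumptions: it uses the Python 3
html.parser module (the renamed successor of HTMLParser.py) rather than the
2.3 module this patch targets, and the names StartTagCollector and
collect_start_tags are hypothetical, not taken from test_htmlparser.py.

```python
from html.parser import HTMLParser


class StartTagCollector(HTMLParser):
    """Record every start tag the parser reports, with its attributes."""

    def __init__(self):
        super().__init__()
        self.start_tags = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append((tag, attrs))


def collect_start_tags(markup):
    """Parse markup and return the list of (tag, attrs) start-tag events."""
    parser = StartTagCollector()
    parser.feed(markup)
    parser.close()
    return parser.start_tags


SAMPLES = [
    '<a width="100%"cellspacing=0>',
    '<a foo=>',
    "<a href=javascript:popup('/popup/html.html')>",
]

for sample in SAMPLES:
    events = collect_start_tags(sample)
    # Only the tag name is checked here; the attribute lists are printed
    # for inspection rather than asserted.
    assert [tag for tag, _ in events] == ['a'], sample
    print(sample, '->', events)
```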


----------------------------------------------------------------------

>Comment By: Steven Rosenthal (smroid)
Date: 2003-06-17 03:10

Message:
Logged In: YES 
user_id=159908

Base version for HTMLParser.py is 1.11.2.1; base version for
test_htmlparser.py is 1.8.8.1


----------------------------------------------------------------------
