[Patches] [ python-Patches-670664 ] HTMLParser.py - more robust <SCRIPT> parsing

SourceForge.net noreply@sourceforge.net
Tue, 28 Jan 2003 14:35:00 -0800


Patches item #670664, was opened at 2003-01-19 14:07
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=305470&aid=670664&group_id=5470

Category: Library (Lib)
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: j paulson (fantoozler)
Assigned to: Fred L. Drake, Jr. (fdrake)
Summary: HTMLParser.py - more robust <SCRIPT> parsing

Initial Comment:
http://www.ebay.com contains a script element of the form

<SCRIPT>
...
   vbscript += "</SCR"+"IPT> \n";
...
</SCRIPT>

whose contents are not enclosed in "<!-- ... -->" comments.  The 
parser choked on that line, reporting a malformed end tag.

The changes are:

  interesting_cdata is now a dict mapping each start tag to
    an re matching the corresponding end tag, a "<!--", or \Z

  HTMLParser.set_cdata_mode takes an extra argument, 
    the start tag
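
For readers without the patch file at hand, the two changes above can be
sketched roughly as follows. This is only an illustration under stated
assumptions: the exact patterns in the patch are not quoted in this
report, so the regular expression, the helper _cdata_pattern and the
class CDataModeSketch are hypothetical names, and the comment opener is
taken to be "<!--".

```python
import re


def _cdata_pattern(tag):
    # Hypothetical pattern: match the end tag for `tag`, a comment
    # opener, or the end of the buffer. The real patch may differ.
    return re.compile(r'</%s\s*>|<!--|\Z' % re.escape(tag), re.IGNORECASE)


# Dict keyed by start tag, as described in the report.
interesting_cdata = {tag: _cdata_pattern(tag) for tag in ("script", "style")}


class CDataModeSketch:
    """Stand-in for the parser; only the CDATA bookkeeping is shown."""

    def __init__(self):
        self.interesting = None

    def set_cdata_mode(self, tag):
        # The extra argument selects the pattern for this element.
        self.interesting = interesting_cdata[tag.lower()]

    def next_interesting(self, rawdata, pos=0):
        # \Z guarantees a match, so search() never returns None here.
        return self.interesting.search(rawdata, pos).start()


# The eBay-style line: the split "</SCR"+"IPT>" is not a real end tag.
body = 'vbscript += "</SCR"+"IPT> \\n";\n</SCRIPT>'
sketch = CDataModeSketch()
sketch.set_cdata_mode("SCRIPT")
offset = sketch.next_interesting(body)
print(repr(body[offset:]))
```

Because the pattern names the element explicitly, the "</SCR" fragment
inside the string literal is skipped and the search stops only at the
real closing tag.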


----------------------------------------------------------------------

>Comment By: j paulson (fantoozler)
Date: 2003-01-28 22:35

Message:
Logged In: YES 
user_id=690612

With the patch applied, you will get the sequence:
  handle_starttag("script")
  handle_comment("some-javascript-code")
  handle_endtag("script")

whereas before the sequence was:
  handle_starttag("script")
  handle_data("<!-- ... some-javascript-code ... //-->")
  handle_endtag("script")
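
One way to check which of the two sequences a given parser produces is
to record the callbacks. The sketch below uses html.parser from a
current Python 3 standard library (the successor of HTMLParser.py), not
the patched module from this report; EventRecorder and sample are names
made up for the illustration. On such a parser the script body is
expected to arrive through handle_data, i.e. the "before" sequence.

```python
from html.parser import HTMLParser


class EventRecorder(HTMLParser):
    """Collects (kind, payload) pairs for every callback of interest."""

    def __init__(self):
        super().__init__()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("starttag", tag))

    def handle_endtag(self, tag):
        self.events.append(("endtag", tag))

    def handle_data(self, data):
        self.events.append(("data", data))

    def handle_comment(self, data):
        self.events.append(("comment", data))


# The commented-out script idiom discussed in this thread.
sample = (
    '<SCRIPT language="JavaScript"><!-- //\n'
    'some-javascript-code\n'
    '// --></SCRIPT>'
)

recorder = EventRecorder()
recorder.feed(sample)
recorder.close()

for kind, payload in recorder.events:
    print(kind, repr(payload))
```

A "comment" entry between the start and end tag would indicate the
patched behaviour; a "data" entry indicates the original one.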

----------------------------------------------------------------------

Comment By: Fred L. Drake, Jr. (fdrake)
Date: 2003-01-28 22:24

Message:
Logged In: YES 
user_id=3066

From python-dev:

John Paulson wrote:
>     [...]  A side-effect of this is that
>     any "<!--" .. "-->" within a script/style will
>     be parsed as a comment.  If that behavior is
>     incorrect, the regex can be modified.

Jerry Williams wrote:
Does this mean that the following won't work:

  <SCRIPT language="JavaScript">
    <!-- //
    some-javascript-code
    // -->
  </SCRIPT>

That could be a problem, since this is commonly used
to support browsers that don't understand <SCRIPT>.

See:
http://mail.python.org/pipermail/python-dev/2003-January/032482.html

----------------------------------------------------------------------

Comment By: j paulson (fantoozler)
Date: 2003-01-25 03:58

Message:
Logged In: YES 
user_id=690612

Found regression test, used it, found error, fixed it.

----------------------------------------------------------------------
