[ python-Bugs-1752919 ] Exception in HTMLParser for special JavaScript code

SourceForge.net noreply at sourceforge.net
Thu Jul 12 21:28:02 CEST 2007


Bugs item #1752919, was opened at 2007-07-12 22:28
Message generated for change (Tracker Item Submitted) made by Item Submitter
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1752919&group_id=5470

Please note that this message will contain a full copy of the comment thread,
including the initial issue submission, for this request,
not just the latest update.
Category: Python Library
Group: Python 2.4
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Eugine Kosenko (eugine_kosenko)
Assigned to: Nobody/Anonymous (nobody)
Summary: Exception in HTMLParser for special JavaScript code

Initial Comment:
import HTMLParser

p = HTMLParser.HTMLParser()
p.feed("""
<script>
<!--
bmD.write('</sc'+'ript>');
//-->
</script>
""")

Traceback (most recent call last):
  File "<stdin>", line 4, in ?
  File "/usr/lib/python2.4/HTMLParser.py", line 108, in feed
    self.goahead(0)
  File "/usr/lib/python2.4/HTMLParser.py", line 150, in goahead
    k = self.parse_endtag(i)
  File "/usr/lib/python2.4/HTMLParser.py", line 314, in parse_endtag
    self.error("bad end tag: %r" % (rawdata[i:j],))
  File "/usr/lib/python2.4/HTMLParser.py", line 115, in error
    raise HTMLParseError(message, self.getpos())
HTMLParser.HTMLParseError: bad end tag: "</sc'+'ript>", at line 4, column 12

The JavaScript code is protected by an HTML comment, so HTMLParser should skip it entirely and parsing should succeed.
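The expected behaviour can be sketched as follows. This is only an illustration, assuming an interpreter whose HTMLParser hands the content of a script element to handle_data() as opaque text (recent Python 3 releases are believed to do so; the Python 2.4 module in this report does not). The names ScriptCollector, SAMPLE and script_text are hypothetical and used only for this example.

```python
try:
    from html.parser import HTMLParser      # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser       # Python 2 module name

# The same input as in the report.
SAMPLE = """
<script>
<!--
bmD.write('</sc'+'ript>');
//-->
</script>
"""


class ScriptCollector(HTMLParser):
    """Hypothetical helper: records the text found inside <script> elements."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_script = False
        self.script_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            self.in_script = True

    def handle_endtag(self, tag):
        if tag == "script":
            self.in_script = False

    def handle_data(self, data):
        # Script content should arrive here untouched, never as tags.
        if self.in_script:
            self.script_chunks.append(data)


collector = ScriptCollector()
collector.feed(SAMPLE)
collector.close()
script_text = "".join(collector.script_chunks)
```

On such an interpreter no exception is raised, and script_text holds the whole script body, including the fragment </sc'+'ript> that the 2.4 parser rejects.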

Instead, the JavaScript code is parsed as part of the HTML page, and the fragment </sc'+'ript> is reported as an incorrect end tag. If the actual end tag </script> is moved up to directly follow the start tag <script>, the code is parsed without errors.
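Since an empty script element is parsed without errors, one possible workaround is to empty every script body before feeding the page to the parser. The sketch below is a rough illustration under stated assumptions, not a fix for HTMLParser itself: the helper empty_script_bodies and the pattern SCRIPT_BODY are hypothetical names, and the regular expression is deliberately naive (for instance, it does not handle a '>' inside an attribute value of the start tag).

```python
import re

try:
    from html.parser import HTMLParser      # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser       # Python 2 module name

# Hypothetical pattern: a <script ...> start tag, the shortest possible body,
# and the first real </script> end tag that follows it.
SCRIPT_BODY = re.compile(r"(<script\b[^>]*>).*?(</script\s*>)",
                         re.IGNORECASE | re.DOTALL)


def empty_script_bodies(html):
    """Return html with the content of every script element removed."""
    return SCRIPT_BODY.sub(r"\1\2", html)


page = """
<script>
<!--
bmD.write('</sc'+'ript>');
//-->
</script>
"""

cleaned = empty_script_bodies(page)

parser = HTMLParser()
parser.feed(cleaned)    # only "<script></script>" is left to parse
parser.close()
```

The trade-off is that removing the script bodies changes the line and column positions reported by getpos() for everything after the first script element, so this only suits callers that do not rely on positions.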

Although the code may look artificial, it is often used in real site counters to prevent them from being blocked.

----------------------------------------------------------------------
