[New-bugs-announce] [issue6662] HTMLParser.HTMLParser doesn't handle malformed charrefs
Dave Day
report at bugs.python.org
Fri Aug 7 03:25:10 CEST 2009
New submission from Dave Day <dayveday at gmail.com>:
When HTMLParser.HTMLParser encounters a malformed charref (for example
&#bad;), it no longer parses the HTML that follows correctly.
For example, given the input:
<p>&#bad;</p>
the parser recognises the start tag "p" but treats everything after it,
including the closing </p>, as data.
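To make "malformed" concrete, here is a minimal sketch of a check for well-formed numeric character references. The pattern and the helper name is_wellformed_charref are illustrative assumptions, not the regular expression HTMLParser uses internally.

```python
import re

# Illustrative pattern (an assumption, not HTMLParser's own): a numeric
# character reference is "&#" followed by decimal digits, or "&#x" followed
# by hexadecimal digits, and is terminated by ";".
NUMERIC_CHARREF = re.compile(r'&#(?:[0-9]+|[xX][0-9a-fA-F]+);\Z')


def is_wellformed_charref(text):
    """Return True if the whole string is a well-formed numeric charref."""
    return NUMERIC_CHARREF.match(text) is not None
```

Under this definition &#62; and &#x3E; are well formed, while &#bad; is not, because "bad" is neither a decimal number nor an "x"-prefixed hexadecimal number.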
To reproduce:
import HTMLParser

# Print every event the parser reports.
class MyParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        print 'Start "%s"' % tag

    def handle_endtag(self, tag):
        print 'End "%s"' % tag

    def handle_charref(self, ref):
        print 'Charref "%s"' % ref

    def handle_data(self, data):
        print 'Data "%s"' % data

parser = MyParser()
parser.feed('<p>&#bad;</p>')
parser.close()
Expected output:
Start "p"
Data "&#bad;"
End "p"
Actual output:
Start "p"
Data "&#bad;</p>"
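The expected behaviour can also be written as a small regression check. This is only a sketch: EventCollector, collect_events and EXPECTED are hypothetical names, the import fallback assumes the module is html.parser on Python 3 and HTMLParser on Python 2, and adjacent data chunks are merged on the assumption that the parser may deliver text in more than one piece.

```python
try:
    from html.parser import HTMLParser  # Python 3 module name
except ImportError:
    from HTMLParser import HTMLParser   # Python 2 module name


class EventCollector(HTMLParser):
    """Record parser events as (kind, value) tuples instead of printing."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(('start', tag))

    def handle_endtag(self, tag):
        self.events.append(('end', tag))

    def handle_charref(self, ref):
        self.events.append(('charref', ref))

    def handle_data(self, data):
        # Merge consecutive data chunks so the comparison does not depend
        # on how the parser happens to split the text.
        if self.events and self.events[-1][0] == 'data':
            self.events[-1] = ('data', self.events[-1][1] + data)
        else:
            self.events.append(('data', data))


def collect_events(html):
    """Feed html to a fresh EventCollector and return the recorded events."""
    parser = EventCollector()
    parser.feed(html)
    parser.close()
    return parser.events


# The event sequence this report expects for the malformed input.
EXPECTED = [('start', 'p'), ('data', '&#bad;'), ('end', 'p')]
```

On an affected version, collect_events('<p>&#bad;</p>') would not equal EXPECTED, because the end tag is swallowed into the data as shown in the actual output above.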
----------
components: Library (Lib)
messages: 91392
nosy: dayveday
severity: normal
status: open
title: HTMLParser.HTMLParser doesn't handle malformed charrefs
type: behavior
versions: Python 2.4, Python 2.5
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue6662>
_______________________________________