[issue13273] HTMLParser improperly handling open tags when strict is False

Christopher Allen-Poole report at bugs.python.org
Thu Oct 27 09:56:02 CEST 2011


New submission from Christopher Allen-Poole <christopherw at allen-poole.com>:

This is encountered when extending html.parser.HTMLParser and running with strict set to False.

Expected behavior:
When '''<div style=""    ><b>The <a href="some_url">rain</a> <br /> in <span>Spain</span></b></div>''' is passed to the feed method, div, b, a, br, and span should all be passed to the handle_starttag method.

Actual behavior:
The handle_data method receives the values <div style=""    >,<b>,<a href="some_url">,<br />,<span> in addition to the regular text.
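A minimal reproduction sketch of the above follows. The names TagRecorder, SAMPLE and record are hypothetical and are not part of the report. On Python 3.2, calling record(SAMPLE, strict=False) selects the tolerant mode described here; later Python versions no longer accept that argument, so the call below omits it.

```python
from html.parser import HTMLParser


class TagRecorder(HTMLParser):
    """Hypothetical helper: records what reaches each handler."""

    def __init__(self, **kwargs):
        # kwargs lets a Python 3.2 user pass strict=False (assumption:
        # that is the mode the report refers to).
        super().__init__(**kwargs)
        self.start_tags = []
        self.data = []

    def handle_starttag(self, tag, attrs):
        self.start_tags.append(tag)

    def handle_data(self, data):
        self.data.append(data)


# The markup quoted in the report.
SAMPLE = ('<div style=""    ><b>The <a href="some_url">rain</a> '
          '<br /> in <span>Spain</span></b></div>')


def record(html, **parser_kwargs):
    """Feed html to a fresh TagRecorder and return (start_tags, data)."""
    parser = TagRecorder(**parser_kwargs)
    parser.feed(html)
    parser.close()
    return parser.start_tags, parser.data


start_tags, data = record(SAMPLE)
print(start_tags)  # expected behavior: div, b, a, br, span
print(data)        # buggy behavior: raw tag text such as '<b>' shows up here
```

If the bug is present, start_tags is missing entries and data contains the raw tag markup instead.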

This can be fixed by changing this (inside the parse_starttag method):

m = hparse.attrfind_tolerant.search(rawdata, k)

to

m = hparse.attrfind_tolerant.match(rawdata, k)
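The general difference between the two calls can be sketched as follows. The pattern attr below is a simplified, hypothetical stand-in and not the real attrfind_tolerant, and this only illustrates how search and match treat the start position; the exact code path inside parse_starttag is not spelled out in this report.

```python
import re

# Hypothetical, simplified attribute pattern (NOT attrfind_tolerant).
attr = re.compile(r'\s*([a-zA-Z_][-.:a-zA-Z0-9_]*)\s*=\s*"([^"]*)"')

rawdata = '<b>The <a href="some_url">rain</a>'
k = 2  # index just past "<b", where attributes of the b tag would begin

# search() scans forward from k, so it can pick up an attribute that
# belongs to a later tag.
found = attr.search(rawdata, k)

# match() only succeeds if the pattern matches exactly at position k.
anchored = attr.match(rawdata, k)

print(found.group(1) if found else None)
print(anchored)
```

Here search returns the href attribute of the following a tag even though the b tag has no attributes, while match returns None because nothing attribute-like starts at position k.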

----------
components: Library (Lib)
messages: 146479
nosy: Christopher.Allen-Poole
priority: normal
severity: normal
status: open
title: HTMLParser improperly handling open tags when strict is False
type: behavior
versions: Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue13273>
_______________________________________

