[New-bugs-announce] [issue26084] HTMLParser mishandles last attribute in self-closing tag

Tom Anderl report at bugs.python.org
Mon Jan 11 15:48:45 EST 2016


New submission from Tom Anderl:

When HTMLParser encounters a start tag that has:
  1. an unquoted attribute value on the final attribute
  2. the optional '/' character marking the start tag as self-closing
  3. no space between the final attribute and the '/' character

the '/' character is attached to the attribute value and the element is not treated as self-closing.  The following script illustrates this:

===============================================================================

import HTMLParser

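# Begin Monkeypatch
# Uncomment to replace the module-level attrfind pattern with one whose
# unquoted-value alternative, [^/>\s]*, stops at '/'.
#import re
#HTMLParser.attrfind = re.compile(
#    r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
#    r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^/>\s]*))?(?:\s|/(?!>))*')
# End Monkeypatch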

class MyHTMLParser(HTMLParser.HTMLParser):
    def handle_starttag(self, tag, attrs):
        print('got starttag: {0} with attributes {1}'.format(tag, attrs))

    def handle_endtag(self, tag):
        print('got endtag: {0}'.format(tag))

MyHTMLParser().feed('<img height=1.0 width=2.0/>')

==============================================================================

Running the above code yields the output:

    got starttag: img with attributes [('height', '1.0'), ('width', '2.0/')]

Note the trailing '/' on the 'width' attribute.  If I uncomment the monkey patch, the script then yields:

    got starttag: img with attributes [('height', '1.0'), ('width', '2.0')]
    got endtag: img

Note that the trailing '/' is gone, and an endtag event was generated.
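
The difference can also be seen by applying the patterns directly with the re module, without going through HTMLParser.  The sketch below is only an illustration: PATCHED is the pattern from the monkeypatch above, while UNPATCHED_LIKE is an assumed stand-in for the unpatched pattern (it differs only in letting an unquoted value contain '/'), and scan_attrs is a hypothetical helper, not part of the library.

```python
import re

# Pattern quoted in the monkeypatch: an unquoted value stops at '/'.
PATCHED = re.compile(
    r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
    r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^/>\s]*))?(?:\s|/(?!>))*')

# ASSUMPTION: stand-in for the unpatched pattern.  It differs from PATCHED
# only in that the unquoted-value class [^>\s]* also accepts '/'.
UNPATCHED_LIKE = re.compile(
    r'((?<=[\'"\s/])[^\s/>][^\s/=>]*)(\s*=+\s*'
    r'(\'[^\']*\'|"[^"]*"|(?![\'"])[^>\s]*))?(?:\s|/(?!>))*')


def scan_attrs(pattern, tag_text):
    """Hypothetical helper: return (attrs, rest), where attrs is the list of
    (name, value) pairs the pattern finds and rest is the text left over
    after the last attribute match."""
    attrs = []
    end = 0
    for m in pattern.finditer(tag_text):
        attrs.append((m.group(1), m.group(3)))
        end = m.end()
    return attrs, tag_text[end:]


TAG = '<img height=1.0 width=2.0/>'

print(scan_attrs(UNPATCHED_LIKE, TAG))
# ([('height', '1.0'), ('width', '2.0/')], '>')

print(scan_attrs(PATCHED, TAG))
# ([('height', '1.0'), ('width', '2.0')], '/>')
```

With the stand-in pattern the '/' ends up inside the value and only '>' is left after the attributes; with the patched pattern the value is clean and '/>' is left over, which is what allows the tag to be recognized as self-closing.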

----------
components: Library (Lib)
messages: 258013
nosy: Tom Anderl
priority: normal
severity: normal
status: open
title: HTMLParser mishandles last attribute in self-closing tag
type: behavior
versions: Python 2.7

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue26084>
_______________________________________