[issue670664] HTMLParser.py - more robust SCRIPT tag parsing
Éric Araujo
report at bugs.python.org
Wed Jul 27 17:12:07 CEST 2011
Éric Araujo <merwok at netwok.org> added the comment:
Ezio wrote:
>>> myhp.feed('<script><p>foo</p></script>')
data: '<p>foo' # where's the </p>?
http://www.w3.org/TR/html4/types#type-cdata says:
Although the STYLE and SCRIPT elements use CDATA for their data
model, for these elements, CDATA must be handled differently by user
agents. Markup and entities must be treated as raw text and passed to
the application as is. The first occurrence of the character sequence
"</" (end-tag open delimiter) is treated as terminating the end of
the element's content. In valid documents, this would be the end tag
for the element.
So I think the example is invalid (the "</" inside the script content should be escaped), and that HTMLParser is not buggy.
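
To make the quoted HTML 4 rule concrete, here is a minimal sketch of it in isolation. The helper name split_html4_cdata is hypothetical and is not part of HTMLParser; it only illustrates "the first occurrence of '</' ends the content" and does not claim to reproduce how HTMLParser is implemented.

```python
def split_html4_cdata(content):
    """Split the text that follows a <script> start tag at the first '</'.

    Hypothetical helper illustrating the HTML 4 CDATA rule quoted above.
    Returns (data, rest): the element content and whatever follows it.
    """
    end = content.find("</")
    if end == -1:
        # No end-tag open delimiter at all: everything is content.
        return content, ""
    return content[:end], content[end:]


# Ezio's example: the first "</" belongs to </p>, so the content stops there.
data, rest = split_html4_cdata("<p>foo</p></script>")
print("data:", repr(data))
print("rest:", repr(rest))

# With the delimiter escaped inside the script (here as "<\/"), the first
# "</" is the one in </script>, so the whole script body is kept as data.
escaped_data, escaped_rest = split_html4_cdata(
    r"document.write('<p>foo<\/p>')</script>"
)
print("data:", repr(escaped_data))
print("rest:", repr(escaped_rest))
```

Under this reading of the spec, the unescaped example yields '<p>foo' as data, which matches the output Ezio reported.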
----------
versions: +Python 3.3 -Python 3.1
_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue670664>
_______________________________________