[XML-SIG] [ pyxml-Bugs-409605 ] reader.HtmlLib ignores optional starttag
noreply@sourceforge.net
Sun, 18 Mar 2001 15:03:08 -0800
Bugs item #409605, was updated on 2001-03-18 15:03
You can respond by visiting:
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=409605&group_id=6473
Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: reader.HtmlLib ignores optional starttag
Initial Comment:
Given the document
good_html = """
<html>
<P>I prefer (all things being equal)
regularity/orthogonality and logical
syntax/semantics in a language because there is less to
have to remember.
(Of course I <em>know</em> all things are NEVER really
equal!)
<P CLASS=source>Guido van Rossum, 6 Dec 91
<P>The details of that silly code are irrelevant.
<P CLASS=source>Tim Peters, 4 Mar 92
& < > é ö
</html>
"""
the reader should imply the <body> start tag when it sees the
first P element. Instead, it drops the P element, because P
is not allowed directly inside the html element.
The document is nevertheless valid, so the reader should
build the P elements into the tree. To reproduce the error,
run
from xml.dom.ext.reader import HtmlLib
b = HtmlLib.FromHtml(good_html)
print b.firstChild.firstChild
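
For illustration only, the expected behaviour can be sketched with the standard library's html.parser module (Python 3). This is not HtmlLib's code or a proposed patch; the names Node, ImplyBodyParser and HEAD_ELEMENTS are hypothetical, and the set of head-only elements is an assumption made to keep the sketch short. It implies <body> when body content appears directly inside <html>, and lets a new P close an open P.

```python
from html.parser import HTMLParser

# Assumption: elements treated as head content. Anything else found
# directly inside <html> is taken to imply a <body> start tag.
HEAD_ELEMENTS = {"head", "title", "meta", "link", "base", "style", "script"}


class Node:
    """Minimal tree node (hypothetical; stands in for a DOM element)."""

    def __init__(self, tag, parent=None):
        self.tag = tag
        self.parent = parent
        self.children = []


class ImplyBodyParser(HTMLParser):
    """Builds an element-only tree, implying <body> where it is omitted."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def _open(self, tag):
        node = Node(tag, self.current)
        self.current.children.append(node)
        self.current = node

    def handle_starttag(self, tag, attrs):
        # html.parser passes tag names in lower case.
        if (self.current.tag == "html"
                and tag != "body"
                and tag not in HEAD_ELEMENTS):
            # Body content directly inside <html>: imply <body>
            # instead of dropping the element.
            self._open("body")
        if tag == "p" and self.current.tag == "p":
            # The P end tag is optional; a new P closes an open one.
            # Simplification: only handles a P that is the current node.
            self.current = self.current.parent
        self._open(tag)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name, if there is one;
        # stray end tags are ignored.
        node = self.current
        while node is not self.root and node.tag != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


def parse(text):
    """Parse text and return the document root node."""
    parser = ImplyBodyParser()
    parser.feed(text)
    parser.close()
    return parser.root
```

With a document shaped like good_html, parse(...).children[0] is the html node, its only child is the implied body node, and the P elements are children of that body. Text nodes and attributes are left out of the sketch on purpose.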