[XML-SIG] [ pyxml-Bugs-409605 ] reader.HtmlLib ignores optional starttag

noreply@sourceforge.net
Sun, 18 Mar 2001 15:03:08 -0800


Bugs item #409605, was updated on 2001-03-18 15:03
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=106473&aid=409605&group_id=6473

Category: None
Group: None
Status: Open
Priority: 5
Submitted By: Martin v. Löwis (loewis)
Assigned to: Nobody/Anonymous (nobody)
Summary: reader.HtmlLib ignores optional starttag

Initial Comment:
Given the document

good_html = """
<html>
<P>I prefer (all things being equal)
regularity/orthogonality and logical
syntax/semantics in a language because there is less to
have to remember.
(Of course I <em>know</em> all things are NEVER really
equal!)
<P CLASS=source>Guido van Rossum, 6 Dec 91
<P>The details of that silly code are irrelevant.
<P CLASS=source>Tim Peters, 4 Mar 92
&amp; &lt; &gt; &eacute; &ouml; &nbsp;
</html>
"""

the reader should infer the omitted <body> start tag when
it sees the first P element. Instead, it drops the P
element, because P is not allowed directly inside the
html element.

Still, the document is valid, so the reader should
build the P elements into the tree. To see the error,
run:

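from xml.dom.ext.reader import HtmlLib
# Parse the document defined above into a DOM tree.
b = HtmlLib.FromHtml(good_html)
# Print the first child of the document's first child.
print b.firstChild.firstChild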
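
For illustration, the expected handling of the omitted body start tag might look roughly like the sketch below. It is not the HtmlLib implementation: it uses html.parser from the Python 3 standard library instead of PyXML, and Node, ImplyingBuilder and HEAD_CONTENT are hypothetical names introduced only for this example. It assumes a small subset of the HTML 4 content models and does not close inline elements that are still open when a new P starts.

```python
from html.parser import HTMLParser

# Hypothetical names throughout; this is not PyXML's HtmlLib code.
# Assumption: these are the only elements that belong in the head.
HEAD_CONTENT = {"title", "meta", "link", "style", "script", "base"}


class Node:
    """Minimal element node: a name, a parent and a list of children."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


class ImplyingBuilder(HTMLParser):
    """Builds a tiny element tree, opening body when its start tag is omitted."""

    def __init__(self):
        super().__init__()
        self.root = Node("#document")
        self.current = self.root

    def handle_starttag(self, tag, attrs):
        if tag == "p" and self.current.name == "p":
            # The P end tag is optional as well: a new P closes the open one.
            self.current = self.current.parent
        if self.current.name == "html" and tag not in HEAD_CONTENT | {"head", "body"}:
            # Body content directly inside html: imply the body start tag
            # instead of dropping the element.
            self.current = Node("body", self.current)
        self.current = Node(tag, self.current)

    def handle_endtag(self, tag):
        # Close up to the nearest open element with this name, if there is one.
        node = self.current
        while node is not self.root and node.name != tag:
            node = node.parent
        if node is not self.root:
            self.current = node.parent


builder = ImplyingBuilder()
builder.feed("<html><P>first<P CLASS=source>second</html>")
body = builder.root.children[0].children[0]
print(body.name, [child.name for child in body.children])
```

With this approach, both P elements should end up as children of an implied body element rather than being discarded.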
