[XML-SIG] xml.dom.ext.reader.HtmlLib

Alexandre Fayolle Alexandre.Fayolle@logilab.fr
Tue, 17 Jul 2001 11:02:03 +0200 (CEST)


Hello,

I was hunting for a bug in Narval, and ended up in
xml.dom.ext.reader.HtmlLib. I would like some feedback on this to know
whether this is indeed a bug, a documentation issue, or just me daydreaming
that all APIs should do what I'd like them to, instead of what the coder
meant.

When I use xml.dom.ext.reader.Sax2, if I pass an ownerDocument to the
reader when reading the data, I'll get back a DocumentFragment, belonging
to the same document. 
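
To make the contract I have in mind concrete, here is a minimal sketch
using the standard library's xml.dom.minidom rather than the PyXML
readers. The helper name read_fragment is hypothetical, and it uses
importNode purely for illustration (a real reader would build the nodes
directly on the owner document):

```python
from xml.dom import minidom


def read_fragment(source, owner_doc):
    """Hypothetical helper (not a PyXML API): parse `source` and return a
    DocumentFragment owned by `owner_doc`, without touching its children."""
    parsed = minidom.parseString(source)
    fragment = owner_doc.createDocumentFragment()
    # Copy the parsed root element into the owner document's node space.
    fragment.appendChild(owner_doc.importNode(parsed.documentElement, True))
    parsed.unlink()  # the temporary document is no longer needed
    return fragment


# An owner document that already has some content.
owner = minidom.parseString("<memory><fact>existing</fact></memory>")
frag = read_fragment("<p>hello</p>", owner)
```

After the call, frag and its children belong to owner, and owner still
holds the content it had before.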

With HtmlLib's reader, this is not the case: the owner document I pass
in gets emptied. Cf. lines 42-46:
        if doc:
            while doc.firstChild:
                # Empty out the document
                node = doc.removeChild(doc.firstChild)
                ReleaseNode(node)

The first (minor) thing is that this assumes I'm using a 4DOM document,
since it uses ReleaseNode. The second (important) thing is that I'm quite
annoyed that the document gets emptied: in the case at hand, it already
had some content, and I was merely passing it in to be sure that the right
DOM implementation would be used, and to avoid an expensive call to
importNode.
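
The effect of that loop on a document which already has content can be
sketched as follows. This is again xml.dom.minidom, not 4DOM, so
unlink() stands in for ReleaseNode, and the function name
empty_document is made up for the example:

```python
from xml.dom import minidom


def empty_document(doc):
    # Same loop shape as the lines quoted above; unlink() is minidom's
    # cleanup call, used here in place of 4DOM's ReleaseNode.
    while doc.firstChild:
        node = doc.removeChild(doc.firstChild)
        node.unlink()


doc = minidom.parseString("<memory><fact>existing</fact></memory>")
children_before = len(doc.childNodes)
empty_document(doc)
children_after = len(doc.childNodes)
```

Whatever the caller had stored in the document is gone once the loop
has run, which is what bit me here.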

As a side note, Sgmlop.HtmlParser uses non-NS methods to build its
DOM. Is this intended?

I'll be glad to work on some patches, hopefully in time for PyXML 0.6.6,
once the correct behaviour has been agreed on.

Cheers,

Alexandre Fayolle
-- 
LOGILAB, Paris (France).
http://www.logilab.com   http://www.logilab.fr  http://www.logilab.org
Narval, the first software agent available as free software (GPL).