[XML-SIG] DOM normalize() broken? entity refs lost?

Jeff.Johnson@icn.siemens.com
Tue, 27 Apr 1999 17:31:55 -0400


Entity references, and any other node types handled by
xml.dom.writer.Walker.doOtherNode(), are thrown away when a document is written
with XmlWriter or its subclass HtmlWriter.  XmlWriter does not define
.doOtherNode(), so nothing gets written for those nodes.  I noticed it when
bullets, registration marks, and apostrophes started disappearing from my HTML
files.  I haven't tried to write the code for XmlWriter.doOtherNode() yet; maybe
you gurus could do it much better than I can... :)
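
A minimal sketch of the failure mode described above, to make the dispatch
problem concrete.  Everything in it is a hypothetical stand-in: the class names,
the do_text()/do_other_node() hooks, and the (kind, value) node tuples are
invented for illustration and are not the real xml.dom.writer API.  It only
assumes that the walker falls back to a do-nothing hook for node types the
writer does not handle.

```python
# Hypothetical stand-ins: these classes only mimic the dispatch pattern
# described in the post; they are NOT the real xml.dom.writer classes.
TEXT, ENTITY_REF = "text", "entity_ref"


class Walker:
    """Visits (kind, value) nodes and dispatches on the node kind."""

    def walk(self, nodes):
        for kind, value in nodes:
            if kind == TEXT:
                self.do_text(value)
            else:
                self.do_other_node(kind, value)

    def do_text(self, value):
        pass

    def do_other_node(self, kind, value):
        # Default hook does nothing, so unhandled nodes vanish silently.
        pass


class LossyWriter(Walker):
    """Handles text only, like a writer that never overrides the hook."""

    def __init__(self):
        self.out = []

    def do_text(self, value):
        self.out.append(value)


class FixedWriter(LossyWriter):
    """Also writes entity references back out as &name;."""

    def do_other_node(self, kind, value):
        if kind == ENTITY_REF:
            self.out.append("&%s;" % value)


nodes = [(TEXT, "Registered entity: "), (ENTITY_REF, "reg")]

lossy = LossyWriter()
lossy.walk(nodes)
lossy_output = "".join(lossy.out)

fixed = FixedWriter()
fixed.walk(nodes)
fixed_output = "".join(fixed.out)

print(repr(lossy_output))
print(repr(fixed_output))
```

The lossy writer drops the entity reference without any error, which matches
the symptom; overriding the fallback hook is one plausible shape for a fix.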

Last week I asked how to find simple strings in adjacent text nodes and was
advised to use Element.normalize().  I tried it and, unless I'm doing it wrong,
it doesn't seem to work.
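
For comparison, here is a small sketch of what normalize() does in the DOM
model, using the standard library's xml.dom.minidom as a stand-in.  That is an
assumption: it is not the xml.dom package used in this post, whose behaviour
may differ.  The point it illustrates is that normalize() merges adjacent Text
nodes but leaves line breaks and runs of spaces inside the text alone, so
whitespace has to be collapsed separately.

```python
from xml.dom.minidom import parseString

doc = parseString("<P>Text on multiple</P>")
p = doc.documentElement

# Append a second Text node so the element holds two adjacent text nodes.
p.appendChild(doc.createTextNode("\nlines and with extra white         space"))
count_before = len(p.childNodes)

# normalize() joins the adjacent Text nodes into a single node.
p.normalize()
count_after = len(p.childNodes)
merged = p.firstChild.data

# Whitespace inside the merged text is untouched; collapse it by hand.
collapsed = " ".join(merged.split())

print(count_before, count_after)
print(repr(merged))
print(repr(collapsed))
```

If the parser already delivers each paragraph as a single text node, then
normalize() has nothing to merge and the output will look unchanged.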

I've included a test script that demonstrates both problems:

#============== SCRIPT STARTS HERE ===========================
import sys, os
from xml.dom.utils import FileReader
from xml.dom.writer import HtmlWriter
from StringIO import StringIO

html = """
<HTML>
<!-- Comments blah blah blah -->
<HEAD>
<TITLE>test</TITLE>
</HEAD>
<BODY >
<P>Registered entity gets thrown away: &reg;</P>
<P>Text on multiple
lines and with extra white         space in the
raw HTML doesn't change when dom.get_documentElement().normalize() is called.
</P>
</BODY>
</HTML>
"""

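# Parse the HTML string into a DOM tree.
fr = FileReader()
dom = fr.readStream(StringIO(html), 'HTML')
# Problem 2: normalize() seems to leave the multi-line text unchanged.
dom.get_documentElement().normalize()
# Problem 1: the &reg; entity reference is missing from the written output.
w = HtmlWriter()
w.write(dom)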