[XML-SIG] DOM normalize() broken? entity refs lost?
Tue, 27 Apr 1999 17:31:55 -0400
Entity references and any other node types handled by
xml.dom.writer.Walker.doOtherNode() are thrown away when written to a file using
XmlWriter or its subclass HtmlWriter. XmlWriter does not define .doOtherNode(),
so nothing gets written. I noticed it when bullets, registration marks, and
apostrophes started disappearing from my HTML files. I haven't tried to write
the code for XmlWriter.doOtherNode() yet; maybe you gurus could do it much
better than I can... :)
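To make the failure mode concrete, here is a minimal sketch. It is not the actual xml.dom.writer code: SketchWriter, KeepingWriter, and do_other_node() are hypothetical names, it runs on the standard-library xml.dom.minidom rather than the package used in this post, and a processing instruction stands in for the entity reference because minidom's parser does not produce entity reference nodes. It only shows how a walker with a do-nothing catch-all handler silently drops nodes, and how a fallback that serializes the node keeps them.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


class SketchWriter:
    """Hypothetical walker: known node types are written, the rest go to do_other_node()."""

    def __init__(self):
        self.out = []

    def walk(self, node):
        if node.nodeType == Node.ELEMENT_NODE:
            self.out.append("<%s>" % node.tagName)
            for child in node.childNodes:
                self.walk(child)
            self.out.append("</%s>" % node.tagName)
        elif node.nodeType == Node.TEXT_NODE:
            # Sketch only: no escaping of special characters here.
            self.out.append(node.data)
        else:
            self.do_other_node(node)

    def do_other_node(self, node):
        # No-op default: anything routed here vanishes from the output.
        pass


class KeepingWriter(SketchWriter):
    def do_other_node(self, node):
        # Fall back to the node's own serialization instead of dropping it.
        self.out.append(node.toxml())


# The processing instruction is the stand-in for an "other" node.
doc = parseString("<P>before<?keep me?>after</P>")

lossy = SketchWriter()
lossy.walk(doc.documentElement)
lossy_text = "".join(lossy.out)

kept = KeepingWriter()
kept.walk(doc.documentElement)
kept_text = "".join(kept.out)

print(lossy_text)
print(kept_text)
```

A real XmlWriter.doOtherNode() would presumably need to format each node type itself (for an entity reference, an ampersand, the name, and a semicolon); the toxml() fallback is just the shortest thing that works in this sketch.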
Last week I asked how to find simple strings in adjacent text nodes and was
advised to use Element.normalize(). I tried it and, unless I'm doing it wrong,
it doesn't seem to work.
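For reference, this is a small sketch of what normalize() is specified to do, written against the standard-library xml.dom.minidom as an assumption (it is not the xml.dom package used in this post, so it says nothing about whether that implementation is broken). Per the DOM specification, normalize() merges adjacent text nodes; it is not expected to collapse whitespace inside a text node.

```python
from xml.dom.minidom import Document

doc = Document()
p = doc.createElement("P")
doc.appendChild(p)

# Build "simple string" out of two adjacent text nodes.
p.appendChild(doc.createTextNode("simple "))
p.appendChild(doc.createTextNode("string"))
count_before = len(p.childNodes)

p.normalize()
count_after = len(p.childNodes)
merged = p.firstChild.data

# Whitespace inside a single text node is left alone by normalize().
q = doc.createElement("P")
q.appendChild(doc.createTextNode("extra   white\n   space"))
q.normalize()
untouched = q.firstChild.data

print(count_before, count_after, repr(merged), repr(untouched))
```

If the goal is to search for a string across what used to be several text nodes, the merged node's data can be searched directly after normalize(); collapsing runs of whitespace would be a separate step.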
I've included a test script that demonstrates both problems:
#============== SCRIPT STARTS HERE ===========================
import sys, os
from xml.dom.utils import FileReader
from xml.dom.writer import HtmlWriter
from StringIO import StringIO
html = """
<!-- Comments blah blah blah -->
<P>Registered entity gets thrown away: &reg;</P>
<P>Text on multiple
lines and with extra white space in the
raw HTML doesn't change when dom.get_documentElement().normalize() is called.</P>
"""
fr = FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = HtmlWriter()