[XML-SIG] unicode entity refs

Jeff.Johnson@icn.siemens.com
Fri, 30 Apr 1999 10:13:16 -0400

Sorry to be a pest, but I never got a response to the following email and was
hoping someone could explain why Unicode entity refs disappear in PyDom.

After I write this I'll start looking at the SAX code; maybe I have to install
error handlers?  Any suggestions?


---------------------- Forwarded by Jeff Johnson/Service/ICN on 04/30/99 10:07
AM ---------------------------

Jeff Johnson
04/28/99 01:21 PM

To:   akuchlin@cnri.reston.va.us
cc:   xml-sig@python.org
Subject:  Re: [XML-SIG] DOM normalize() broken? entity refs lost?  (Document
      link not converted)

Thanks for the entity reference fix, Andrew.  It now saves "&reg;" but it still
loses things like "’".  I think this is Unicode generated by the RTF-to-HTML
filter I'm using.  I can change the filter's character translation table to
convert RTF "quoteright" to "'" instead of "’", but I'm still curious where
the entity ref is going.  I put some debug statements in
HtmlBuilder.handle_entityref() but it never gets called.  I know there is
controversy over Unicode support, but I don't know enough about it to know
what to expect in this case.
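
One possible explanation for handle_entityref() staying silent, offered as a
sketch rather than a diagnosis of PyDom: SGML-style parsers usually dispatch
named references such as "&reg;" and numeric character references such as
"&#8217;" to different callbacks. The example below uses the html.parser module
from the current Python standard library, not PyDom's HtmlBuilder, and it
assumes the lost reference was numeric; RefLogger and log_refs are hypothetical
names introduced only for illustration.

```python
from html.parser import HTMLParser


class RefLogger(HTMLParser):
    """Record which callback each reference in the markup triggers."""

    def __init__(self):
        # convert_charrefs=False keeps references as separate parser events
        # instead of folding them into the surrounding text.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_entityref(self, name):
        # Called for named references, e.g. "&reg;" gives name == "reg".
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Called for numeric references, e.g. "&#8217;" gives name == "8217".
        self.events.append(("charref", name))


def log_refs(markup):
    """Parse markup and return the list of (callback, name) events."""
    parser = RefLogger()
    parser.feed(markup)
    parser.close()
    return parser.events


if __name__ == "__main__":
    # Assumed sample input: one named and one numeric reference.
    print(log_refs("<P>Registered &reg; and a quote &#8217;</P>"))
```

If PyDom's builder follows the same split, debug statements in a
handle_charref()-style method (if it has one) would be the place to look for
the numeric reference.
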

A new script is included:

import sys, os
from StringIO import StringIO

from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter

html = """
# This works with Andrew's patch but the unicode single quote still vanishes
# without a trace.
#<P>Registered &reg;</P>

fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()