[XML-SIG] Unicode entity refs

Fri, 30 Apr 1999 10:13:16 -0400

Sorry to be a pest, but I never got a response to the following email and was
hoping someone had an answer as to why Unicode entity refs disappear in PyDom.

After I write this I'll start looking at the SAX code. Maybe I have to install
error handlers?  Any suggestions?

Thanks,
Jeff

---------------------- Forwarded by Jeff Johnson/Service/ICN on 04/30/99 10:07
AM ---------------------------

Jeff Johnson
04/28/99 01:21 PM

To:   akuchlin@cnri.reston.va.us
cc:   xml-sig@python.org
Subject:  Re: [XML-SIG] DOM normalize() broken? entity refs lost?  (Document
      link not converted)

Thanks for the entity reference fix, Andrew.  It now saves "&reg;" but it still
loses things like "&#8217;".  I think this is a Unicode character reference
generated by the RTF-to-HTML filter I'm using.  I can change the filter's
character translation table to convert RTF "quoteright" to "'" instead of
"&#8217;", but I'm curious where the reference is going.  I put some debug
statements in HtmlBuilder.handle_entityref() but it never gets called.  I know
there is controversy over Unicode support, but I don't know enough about it to
know what to expect in this case.
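
One possible reason the debug statements never fire: "&#8217;" is a numeric
character reference, and parsers in the sgmllib family hand those to
handle_charref() rather than handle_entityref().  The sketch below shows that
split using the modern standard-library html.parser, not PyDom's HtmlBuilder;
whether HtmlBuilder dispatches the same way is an assumption, and RefLogger and
log_refs are hypothetical names made up for this illustration.

```python
from html.parser import HTMLParser


class RefLogger(HTMLParser):
    """Hypothetical helper: records which handler each reference reaches."""

    def __init__(self):
        # convert_charrefs=False keeps references out of handle_data()
        # so the two reference handlers below are actually called.
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_entityref(self, name):
        # Named references such as &reg; arrive here.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Numeric references such as &#8217; arrive here instead.
        self.events.append(("charref", name))


def log_refs(html):
    """Feed the markup to a RefLogger and return the recorded events."""
    parser = RefLogger()
    parser.feed(html)
    parser.close()
    return parser.events


events = log_refs("<P>Don&#8217;t</P><P>Registered &reg;</P>")
for kind, name in events:
    print(kind, name)

# The numeric reference names a code point; 8217 is U+2019.
quote = chr(int("8217"))
```

If HtmlBuilder behaves like this, a debug statement in its handle_charref()
(or whatever it calls for unknown character references) may be the place to
look for the missing quote.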

A new script is included:

import sys, os
from StringIO import StringIO

from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter

html = """
<P>Don&#8217;t</P>
"""
# The commented-out line below works with Andrew's patch, but the Unicode
# single quote above still vanishes without a trace.
#<P>Registered &reg;</P>

fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)
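
For comparison, here is a minimal sketch of the result I would expect from a
DOM round trip: the character survives parsing and writing.  It uses the
standard-library xml.dom.minidom as a stand-in, not the FileReader and
XmlWriter classes above, and round_trip is a hypothetical helper name.

```python
from xml.dom import minidom


def round_trip(markup):
    """Parse the markup, then return the text node data and the rewritten XML."""
    doc = minidom.parseString(markup)
    root = doc.documentElement
    # The parser resolves &#8217; into the single character U+2019.
    text = root.firstChild.data
    return text, root.toxml()


text, xml_out = round_trip("<P>Don&#8217;t</P>")
print(repr(text))
print(xml_out)
```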