[XML-SIG] unicode entity refs
Jeff.Johnson@icn.siemens.com
Fri, 30 Apr 1999 10:13:16 -0400
Sorry to be a pest, but I never got a response to the following email and was
hoping someone had an answer as to why Unicode entity refs disappear in PyDom.
After I write this I'll start looking at the SAX code; maybe I have to install
error handlers? Any suggestions?
Thanks,
Jeff
---------------------- Forwarded by Jeff Johnson/Service/ICN on 04/30/99 10:07
AM ---------------------------
Jeff Johnson
04/28/99 01:21 PM
To: akuchlin@cnri.reston.va.us
cc: xml-sig@python.org
Subject: Re: [XML-SIG] DOM normalize() broken? entity refs lost? (Document
link not converted)
Thanks for the entity reference fix, Andrew. It now saves "®" but it still
loses things like "’". I think this is Unicode generated by the RTF to
HTML filter I'm using, and while I can change the RTF to HTML character
translation table to convert RTF "quoteright" to "'" instead of "’", I'm
curious where the entity ref is going. I put some debug statements in
HtmlBuilder.handle_entityref() but it never gets called. I know there is
controversy over Unicode support, but I don't know enough about it to know what
to expect in this case.
A new script is included:
import sys, os
from StringIO import StringIO
from xml.dom import utils
from xml.dom.writer import HtmlWriter, XmlWriter
html = """
<P>Don’t</P>
"""
# The line below works with Andrew's patch, but the Unicode single quote
# above still vanishes without a trace.
#<P>Registered ®</P>
fr = utils.FileReader()
dom = fr.readStream(StringIO(html),'HTML')
w = XmlWriter()
w.write(dom)
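
One possible reason handle_entityref() is never reached, offered as an assumption rather than a confirmed diagnosis: if the filter emits the quote as a numeric character reference (assumed here to be &#8217;), SGML-style parsers typically dispatch it to handle_charref() and reserve handle_entityref() for named references such as &reg;. The sketch below shows that split using the modern Python 3 standard-library html.parser.HTMLParser, not the PyDom HtmlBuilder. RecordingParser and record_events are hypothetical names introduced only for this illustration.

```python
from html.parser import HTMLParser


class RecordingParser(HTMLParser):
    """Hypothetical helper: records which reference callback fires."""

    def __init__(self):
        # convert_charrefs=False keeps references as separate callbacks
        # instead of folding them into handle_data().
        super().__init__(convert_charrefs=False)
        self.events = []

    def handle_entityref(self, name):
        # Named references such as &reg; arrive here.
        self.events.append(("entityref", name))

    def handle_charref(self, name):
        # Numeric references such as &#8217; arrive here instead.
        self.events.append(("charref", name))


def record_events(markup):
    """Parse markup and return the list of reference callbacks seen."""
    parser = RecordingParser()
    parser.feed(markup)
    parser.close()
    return parser.events


# Assumed input: the quote as a numeric reference, the trademark sign as a
# named one.
events = record_events("<P>Don&#8217;t</P><P>Registered &reg;</P>")
print(events)
```

If the filter's output matches that assumption, a debug statement in a handle_charref() override would be the place to look, provided HtmlBuilder inherits that callback from its parser base class (also an assumption).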