[XML-SIG] losing entities when parsing then texting
Andrew Clover
and-xml at doxdesk.com
Wed Jul 6 15:19:56 CEST 2005
Greg Wilson <gvwilson at cs.utoronto.ca> wrote:
> I realize I should include the Unicode characters directly in my files,
> but that's not possible in this case---I have to accommodate people who
> are using editors that only handle 7-bit ASCII.
Theoretically, .toxml('us-ascii') should generate usable output.
Unfortunately minidom doesn't replace unencodable characters with
character references, so you'll get a UnicodeError instead.
As a workaround you could take the serialised output as a Unicode string
(decode the UTF-8 encoded version if that's what you have) and call
.encode('us-ascii', 'xmlcharrefreplace') on it. That is technically the
wrong thing if nodeNames, CDATASections or the like contain non-ASCII
characters, since character references aren't recognised in those places,
but that probably doesn't matter to you.
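
A minimal sketch of that workaround, written for Python 3, where toxml()
with no encoding argument returns a text string. The sample document and
the variable names are made up for illustration; they are not from the
original question.

```python
from xml.dom import minidom

# Hypothetical sample document; &#233; is a character reference for "é".
source = '<doc><name>Caf&#233;</name></doc>'

dom = minidom.parseString(source)

# With no encoding argument, toxml() returns text rather than bytes,
# so the parsed "é" appears here as a literal non-ASCII character.
text = dom.toxml()

# Encode to 7-bit ASCII, replacing every character that doesn't fit
# with a numeric character reference such as &#233;.
ascii_bytes = text.encode('us-ascii', 'xmlcharrefreplace')

print(ascii_bytes.decode('us-ascii'))
```

The same caveat applies: the replacement is done on the serialised text
as a whole, so it is only safe when the non-ASCII characters sit in
ordinary text content or attribute values.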
ObStandardPlug: pxdom supports both proper charref-escaping (using
DOM3LS DOMOutput.encoding) and keeping EntityReference nodes (using
DOM3Core DOMConfiguration.setParameter('entities', True) or
pxdom.parse(file, {'entities': True})).
--
Andrew Clover
mailto:and at doxdesk.com
http://www.doxdesk.com/