[XML-SIG] Unwanted behavior in PrettyPrint: > doesn't round-trip

rmunn at pobox.com rmunn at pobox.com
Tue Jul 6 15:15:44 CEST 2004


I'm trying to use xml.dom.ext.PrettyPrint to pretty-print some XML data
to a file, and discovering that it doesn't quite do what I want. Here's
an example:

Python 2.3.4 (#1, Jun  5 2004, 10:44:08) 
[GCC 3.3.3 20040412 (Gentoo Linux 3.3.3-r5, ssp-3.3-7, pie-8.7.5.3)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> from xml.dom import minidom
>>> from xml.dom.ext import PrettyPrint
>>> doc = minidom.parseString('<description>This contains a nested &lt;b&gt; tag</description>')
>>> doc
<xml.dom.minidom.Document instance at 0x403b8a8c>
>>> PrettyPrint(doc)
<?xml version='1.0' encoding='UTF-8'?>
<description>This contains a nested &lt;b> tag</description>
>>> 

I'd prefer the output to be:
"""<?xml version='1.0' encoding='UTF-8'?>
<description>This contains a nested &lt;b&gt; tag</description>
"""

This XML data will eventually go into an HTML page and be sent to the
user's browser. Since the > character doesn't close any tags, most
browsers will probably display it. But with the vast number of different
browsers out there, each with slightly different behavior, I'd rather
not rely on "probably". :-( I'd prefer that the &gt; entity make it
through a round trip (parse, then print) untouched.

Is there any way for me to tell PrettyPrint to keep > escaped as &gt; on
output, rather than writing the bare character?
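
To make the round trip concrete, here is a minimal sketch that uses only
the standard library (minidom and xml.sax.saxutils), not PrettyPrint
itself. It assumes a current Python 3 interpreter rather than the 2.3.4
session above. It shows that the parser has already resolved the entity
references by the time the DOM exists, so any &gt; in the output has to
be put back by whatever serializes the tree. Whether a particular
serializer escapes > is up to that serializer, so check the output
rather than assume it.

```python
from xml.dom import minidom
from xml.sax.saxutils import escape

source = '<description>This contains a nested &lt;b&gt; tag</description>'
doc = minidom.parseString(source)
root = doc.documentElement

# After parsing, the text holds literal '<' and '>' characters;
# the original &lt; and &gt; references are no longer in the tree.
text = "".join(
    node.data for node in root.childNodes
    if node.nodeType == node.TEXT_NODE
)

# saxutils.escape() replaces '&', '<' and '>' with entity references,
# so it can re-escape text by hand when writing output manually.
escaped = escape(text)

# minidom's own serializer (no pretty-printing), for comparison.
serialized = root.toxml()

print(repr(text))
print(escaped)
print(serialized)
```

If the serializer in use leaves > bare, one workaround is to write the
text content yourself with escape() instead of relying on the printer.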

-- 
Robin Munn
rmunn at pobox.com