[XML-SIG] losing entities when parsing then texting

Dieter Maurer dieter at handshake.de
Fri Jul 1 20:10:11 CEST 2005


Greg Wilson wrote at 2005-6-30 12:19 -0400:
>This one must have come up several times before, but neither Google nor 
>the Cookbook have given me an answer.  I'm doing this:
>
>data = sys.stdin.read()
>doc = xml.dom.minidom.parseString(data)
>root = doc.documentElement
>...add and modify some nodes...
>sys.stdout.write(root.toxml('utf-8'))
>
>A typical input looks like this:
>
><?xml version="1.0" encoding="UTF-8"?>
><!DOCTYPE lec SYSTEM "swc.dtd">
><lec title="Introduction">
>   <topic title="Motivation" summary="motivation for course">
>     <slide>
>       <b1>blah
>         <b2>blah &amp; blah</b2>
>         <b2>blah&emdash;blah</b2>
>       </b1>
>     </slide>
>   </topic>
></lec>

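Minidom's support for entities is weak: references to entities that are declared only in an external DTD (like your "&emdash;") are not expanded, and they can silently disappear on the parse/serialize round trip.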
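To see the loss in isolation, here is a minimal sketch for a current Python 3 (the thread predates it), using a cut-down copy of your input. Nothing reads swc.dtd, so "emdash" is never declared; the parser skips the undeclared reference and minidom keeps no trace of it, so the second <b2> should come back as "blahblah".

```python
import xml.dom.minidom

# Cut-down copy of the quoted input. minidom does not load the external
# DTD "swc.dtd", so the entity "emdash" stays undeclared.
DATA = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE lec SYSTEM "swc.dtd">
<lec title="Introduction">
  <b2>blah &amp; blah</b2>
  <b2>blah&emdash;blah</b2>
</lec>"""

doc = xml.dom.minidom.parseString(DATA)
# toxml() with an encoding argument returns bytes.
out = doc.documentElement.toxml("utf-8")
print(out.decode("utf-8"))
```

The predefined "&amp;" survives the round trip; only the DTD-declared entity is lost.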

Try to avoid them (apart from the five predefined XML entities &amp; &lt; &gt; &quot; &apos;) by
using the corresponding Unicode characters instead, e.g. U+2014 EM DASH in place of "&emdash;".
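A sketch of that workaround for Python 3, assuming a hypothetical ENTITY_CHARS table that you would fill in from swc.dtd yourself (only "emdash" -> U+2014 is shown, and that meaning is an assumption about your DTD). The references are rewritten as characters before minidom ever sees them. The helper names inline_entities and round_trip are made up for illustration, and the substitution is plain text replacement, so it does not skip comments or CDATA sections.

```python
import xml.dom.minidom

# Hypothetical table: entity names assumed to be declared in swc.dtd,
# mapped to the Unicode characters they are assumed to stand for.
ENTITY_CHARS = {
    "emdash": "\u2014",  # EM DASH
}


def inline_entities(text, table=ENTITY_CHARS):
    """Replace every &name; whose name is in table with its character.

    Naive textual substitution: comments and CDATA sections are not
    treated specially.
    """
    for name, char in table.items():
        text = text.replace("&" + name + ";", char)
    return text


def round_trip(data):
    """Parse the document, (modify it,) and serialize the root as UTF-8."""
    doc = xml.dom.minidom.parseString(inline_entities(data))
    root = doc.documentElement
    # ...add and modify some nodes here...
    return root.toxml("utf-8")  # bytes on Python 3
```

On Python 3, toxml('utf-8') returns bytes, so the last line of your script would become sys.stdout.buffer.write(round_trip(sys.stdin.read())). Writing the numeric character reference &#8212; in the source document works as well, because character references are resolved by the parser and need no DTD.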

-- 
Dieter

