[XML-SIG] losing entities when parsing then texting
Dieter Maurer
dieter at handshake.de
Fri Jul 1 20:10:11 CEST 2005
Greg Wilson wrote at 2005-6-30 12:19 -0400:
>This one must have come up several times before, but neither Google nor
>the Cookbook have given me an answer. I'm doing this:
>
>data = sys.stdin.read()
>doc = xml.dom.minidom.parseString(data)
>root = doc.documentElement
>...add and modify some nodes...
>sys.stdout.write(root.toxml('utf-8'))
>
>A typical input looks like this:
>
><?xml version="1.0" encoding="UTF-8"?>
><!DOCTYPE lec SYSTEM "swc.dtd">
><lec title="Introduction">
>  <topic title="Motivation" summary="motivation for course">
>    <slide>
>      <b1>blah
>        <b2>blah &amp; blah</b2>
>        <b2>blah&emdash;blah</b2>
>      </b1>
>    </slide>
>  </topic>
></lec>
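Minidom's support for entities is weak: references to entities that are
declared only in an external DTD (such as &emdash; in your example) can
simply be lost. Try to avoid entities other than the five predefined XML
ones (&amp;, &lt;, &gt;, &quot;, &apos;) and use the corresponding
Unicode characters instead.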
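A minimal sketch of that advice, assuming Python 3 and assuming that
&emdash; in swc.dtd stands for U+2014 EM DASH (the DTD itself is not
shown): replace the custom entity references with their characters
before minidom ever sees the data. ENTITY_MAP, replace_custom_entities
and process are hypothetical names chosen for illustration.

```python
import re
import xml.dom.minidom

# Hypothetical mapping from custom entity names to Unicode characters.
# ASSUMPTION: "&emdash;" means U+2014 EM DASH; the real definitions live
# in swc.dtd, which is not shown in the original message.
ENTITY_MAP = {
    "emdash": "\u2014",
}

# Matches an entity reference of the form &name;
_ENTITY_RE = re.compile(r"&([A-Za-z_][\w.-]*);")


def replace_custom_entities(text, mapping=ENTITY_MAP):
    """Replace &name; references listed in mapping with their characters.

    The predefined XML entities (&amp; &lt; &gt; &quot; &apos;) and any
    name not in mapping are left untouched for the parser to handle.
    """
    def substitute(match):
        return mapping.get(match.group(1), match.group(0))
    return _ENTITY_RE.sub(substitute, text)


def process(data):
    """Parse data with minidom after entity replacement; return UTF-8 bytes."""
    doc = xml.dom.minidom.parseString(replace_custom_entities(data))
    root = doc.documentElement
    # ... add and modify some nodes under root here ...
    # Serialize the whole document rather than root, so the XML
    # declaration is written too; with an encoding this returns bytes.
    return doc.toxml("utf-8")
```

From a script, sys.stdout.buffer.write(process(sys.stdin.read())) would
take the place of the original read/write lines. The regular expression
is deliberately naive: it would also rewrite &emdash; inside comments or
CDATA sections, so putting the Unicode characters into the source files
themselves remains the cleaner fix.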
--
Dieter