[XML-SIG] How to leave character entities alone
Kevin Russell
krussll@cc.UManitoba.CA
Thu, 5 Apr 2001 17:26:12 -0500 (CDT)
I'm using the DOM from PyXML 0.6.5 to manipulate documents with the Text
Encoding Initiative DTD (teixlite.dtd). But all of the character
entities, like &ndash; and ;, vanish into thin air when read into a
DOM object. I've tried with minidom and 4DOM; I've tried with validation
under 4DOM (where it gags on the DTD itself, which will probably be the
subject of my next frantic query) and without; I've tried all the standard
readers (please tell me I don't have to build my own). I'm obviously
thrashing around clueless.
So, using any combination of stuff in PyXML, how can I achieve *any* of
the following (in decreasing order of niceness for me):
- leave all character entity references unexpanded, sitting in the tree as
well-behaved little EntityReference objects.
- expand them into raw text if necessary, but trick the Printer into
turning them back into &ndash;, ;, etc., when it's time to output
the mangled DOM back into XML.
- expand them into raw text and leave them that way. Indeed, anything
short of having them vanish into thin air would be tolerable.
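
For concreteness, here is a rough sketch of what the second option might
look like. It is only an illustration under several assumptions: it uses
the standard library's xml.dom.minidom rather than the PyXML 0.6.5 readers
and Printer, it declares the entities in an internal DTD subset so the
parser can expand them, and the CHAR_TO_ENTITY table, the restore_entities
helper, the eacute entity and the sample document are all made up for the
example.

```python
from xml.dom import minidom

# Hypothetical table: expanded character -> entity name to write back out.
# A real one would be built from the entity sets the DTD actually declares.
CHAR_TO_ENTITY = {
    "\u2013": "ndash",
    "\u00e9": "eacute",
}

# Made-up sample. The entities are declared in the internal subset so that
# the parser knows their replacement text and can expand them.
SAMPLE = (
    '<!DOCTYPE p [\n'
    '  <!ENTITY ndash "&#8211;">\n'
    '  <!ENTITY eacute "&#233;">\n'
    ']>\n'
    '<p>pages 10&ndash;12, caf&eacute;</p>'
)


def restore_entities(xml_text, mapping=CHAR_TO_ENTITY):
    """Replace each mapped character in serialized XML with &name;."""
    for char, name in mapping.items():
        xml_text = xml_text.replace(char, "&%s;" % name)
    return xml_text


doc = minidom.parseString(SAMPLE)
root = doc.documentElement

# After parsing, the references are gone and the tree holds raw characters.
expanded = "".join(node.data for node in root.childNodes)

# Serialize the element, then post-process the string to put the
# references back.
serialized = restore_entities(root.toxml())
print(serialized)
```

Because the substitution runs over the whole serialized string, it would
also rewrite matching characters inside attribute values and comments, and
the output is only well-formed if the result is written back out with a
DOCTYPE that declares those entities.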
Sorry for such a braindead question, but several wee hours of squinting at
documentation and source code have left me unable to see the answer that's
undoubtedly sitting out there in plain sight.
-- Kevin Russell