[XML-SIG] How to leave character entities alone

Kevin Russell krussll@cc.UManitoba.CA
Thu, 5 Apr 2001 17:26:12 -0500 (CDT)


I'm using the DOM from PyXML 0.6.5 to manipulate documents with the Text
Encoding Initiative DTD (teixlite.dtd).  But all of the character
entities, like &ndash; and &semi;, vanish into thin air when read into a
DOM object. I've tried with minidom and 4DOM; I've tried with validation
under 4DOM (where it gags on the DTD itself, which will probably be the
subject of my next frantic query) and without; I've tried all the standard
readers (please tell me I don't have to build my own).  I'm obviously
thrashing around clueless.
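For what it's worth, here is a minimal sketch of where such references can go, written against the modern Python standard library's xml.parsers.expat rather than PyXML 0.6.5 (it has not been tried there). When a document has an external DTD subset that the parser never reads, expat cannot see the entity declarations, so it "skips" each undeclared reference instead of reporting an error. A builder that ignores those skip notifications silently drops the entity. The function name text_keeping_skipped_entities is hypothetical; the sketch just collects character data and re-emits each skipped reference as literal text. The DTD file is never opened, so it need not exist.

```python
import xml.parsers.expat


def text_keeping_skipped_entities(document):
    """Collect a document's character data, keeping skipped entity
    references as literal '&name;' strings instead of dropping them.

    Expat skips a general entity when it has no declaration for it but
    may not treat that as an error, e.g. because the declaration could
    live in an external DTD subset it never read.
    """
    parts = []
    parser = xml.parsers.expat.ParserCreate()

    # Ordinary text goes straight into the output list.
    parser.CharacterDataHandler = parts.append

    def on_skipped(name, is_parameter_entity):
        # Re-emit general entity references; ignore parameter entities.
        if not is_parameter_entity:
            parts.append("&%s;" % name)

    parser.SkippedEntityHandler = on_skipped
    parser.Parse(document, True)
    return "".join(parts)


doc = '<!DOCTYPE p SYSTEM "teixlite.dtd"><p>1990&ndash;1995&semi;</p>'
print(text_keeping_skipped_entities(doc))
```

Without the SkippedEntityHandler line, the same input yields only "19901995", which matches the "vanishing" behaviour described above.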

So, using any combination of stuff in PyXML, how can I achieve *any* of
the following (in decreasing order of niceness for me):

- leave all character entity references unexpanded, sitting in the tree as
  well-behaved little EntityReference objects.

- expand them into raw text if necessary, but trick the Printer into
  turning them back into &ndash;, &semi;, etc., when it's time to output
  the mangled DOM back into XML.

- expand them into raw text and leave them that way.  Indeed, anything
  short of having them vanish into thin air would be tolerable.
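The second option can be approximated by post-processing the serialized output. This is a minimal sketch, not a PyXML feature. It assumes that the parser has already expanded the entities into real characters, that the HTML 4 entity names in Python's html.entities module match the ISO names your DTD declares (teixlite.dtd would need to declare, for example, ndash), and that element and attribute names are ASCII. The helper name reescape is hypothetical. ASCII characters such as ";" are left literal, so a reference like &semi; cannot be recovered this way.

```python
from html.entities import codepoint2name
from xml.dom import minidom


def reescape(text):
    """Replace every non-ASCII character with an entity reference.

    Uses the HTML 4 entity name when one exists (assumed to match the
    names the DTD declares); otherwise falls back to a numeric reference.
    """
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)  # plain ASCII passes through unchanged
        elif cp in codepoint2name:
            out.append("&%s;" % codepoint2name[cp])
        else:
            out.append("&#%d;" % cp)
    return "".join(out)


# Usage: parse text containing a real en dash, then re-escape on output.
doc = minidom.parseString("<p>1990\u20131995</p>")
print(reescape(doc.documentElement.toxml()))
```

Because the replacement runs on the final string, it also touches comments and processing instructions, where entity references are not recognized. Escaping inside a custom writer would avoid that, at the cost of more code.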

Sorry for such a braindead question, but several wee hours of squinting at
documentation and source-code have left me unable to see the answer that's
undoubtedly sitting out there in plain sight.

-- Kevin Russell