[XML-SIG] Removing insignificant whitespace
brian at sweetapp.com
Thu Sep 2 13:33:02 CEST 2004
Bob Kline wrote:
> If you don't have a DTD (or the functional equivalent), then you're out
> of luck, because in that case the machine doesn't have any way of
> knowing what you mean by "insignificant whitespace." You don't want the
> software to assume that every text node which contains only whitespace
> is insignificant, even if you have "normalized" the document to collapse
> adjacent text nodes into one.
If you reread my original post, you'll see that I am not arguing for
different default behavior. I was asking if there was some way of
removing all whitespace-only text nodes. StripXml() claims to do what I
want, but it doesn't work with the DOM created by DOMBuilder (this seems
like a bug or a design flaw).
> If you *know* that the documents will never contain such inline markup
> (because, for example, you've had a peek at the elusive DTD, and have
> been assured that it won't change), then you can write software to take
> advantage of this special knowledge.
How could I correctly manipulate the DOM without making assumptions
about its semantics? That is, if whitespace-only nodes were to suddenly
become significant, how could I correctly process them without some
knowledge of their meaning?
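
To make the inline-markup case concrete, here is a small sketch using
the standard library's xml.dom.minidom (an assumption on my part; it is
not the DOMBuilder DOM discussed in this thread). The helper text_of()
is a hypothetical name introduced only for this illustration. It shows
a whitespace-only text node that carries meaning because it separates
two inline elements:

```python
from xml.dom import minidom


def text_of(node, skip_blank=False):
    """Concatenate all text under node.

    When skip_blank is true, whitespace-only text nodes are ignored,
    which mimics what a blanket "strip whitespace" pass would do.
    """
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if skip_blank and not child.data.strip():
                continue
            parts.append(child.data)
        else:
            parts.append(text_of(child, skip_blank))
    return "".join(parts)


doc = minidom.parseString("<p><b>Hello</b> <i>world</i></p>")
para = doc.documentElement

print(text_of(para))                   # Hello world
print(text_of(para, skip_blank=True))  # Helloworld
```

The single space between </b> and <i> is a whitespace-only text node,
and dropping it changes the text. Whether that matters depends entirely
on what the document's vocabulary allows.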
> Probably the most straightforward
> approach would be an XSLT script with a template that strips whitespace
> text nodes and another template which passes everything else through
I just wrote a trivial little function to do this. The cookbook recipe
is here: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303061
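
For readers who cannot reach the recipe, the general idea can be
sketched as follows. This is not the recipe's code; it is a minimal
illustration written against the standard library's xml.dom.minidom,
and the name strip_whitespace_nodes() is made up for this example. It
assumes that every whitespace-only text node really is insignificant,
which is only safe if you know the document has no mixed content:

```python
from xml.dom import minidom


def strip_whitespace_nodes(node):
    """Remove every whitespace-only text node below node, in place."""
    # Iterate over a copy, because removeChild() mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == child.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        else:
            strip_whitespace_nodes(child)
    return node


source = "<root>\n  <item>one</item>\n  <item> </item>\n</root>"
doc = minidom.parseString(source)
strip_whitespace_nodes(doc)

print(doc.documentElement.toxml())  # <root><item>one</item><item/></root>
```

Note that this sketch ignores xml:space="preserve" and leaves CDATA
sections alone, so treat it as a starting point rather than a general
solution.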