[XML-SIG] Removing insignificant whitespace

Brian Quinlan brian at sweetapp.com
Thu Sep 2 13:33:02 CEST 2004


Bob Kline wrote:
> If you don't have a DTD (or the functional equivalent), then you're out 
> of luck, because in that case the machine doesn't have any way of 
> knowing what you mean by "insignificant whitespace."  You don't want the 
> software to assume that every text node which contains only whitespace 
> is insignificant, even if you have "normalized" the document to collapse 
> adjacent text nodes into one.  

If you reread my original post, you'll see that I am not arguing for 
different default behavior. I was asking if there was some way of 
removing all whitespace-only text nodes. StripXml() claims to do what I 
want but it doesn't work with the DOM created by DOMBuilder (this seems 
like a bug or misdesign).

> If you *know* that the documents will never contain such inline markup
> (because, for example, you've had a peek at the elusive DTD, and have
> been assured that it won't change), then you can write software to take
> advantage of this special knowledge.

How could I correctly manipulate the DOM without making assumptions 
about its semantics? That is, if whitespace-only nodes were to suddenly 
become significant, how could I correctly process them without some 
knowledge of their meaning?

> Probably the most straightforward
> approach would be an XSLT script with a template that strips whitespace
> text nodes and another template which passes everything else through
> unscathed.

I just wrote a trivial little function to do this. The cookbook recipe 
is here: http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/303061
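
For readers without access to the recipe, here is a minimal sketch of the 
general idea. It is not necessarily the recipe's code: it assumes a 
standard-library xml.dom.minidom tree rather than the DOM created by 
DOMBuilder, and the function name strip_whitespace_nodes is made up for 
illustration.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively remove text nodes that contain only whitespace.

    Illustrative helper (hypothetical name); assumes a minidom tree.
    """
    # Iterate over a copy, because removeChild() mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
            child.unlink()
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_whitespace_nodes(child)
    return node


doc = minidom.parseString("<a>\n  <b>text</b>\n  <c> </c>\n</a>")
strip_whitespace_nodes(doc)
result = doc.documentElement.toxml()
```

Note that the single space inside <c> is removed as well, which is exactly 
the assumption Bob warns about: this is only safe when you know that 
whitespace-only text carries no meaning in your documents.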

Cheers,
Brian


