[XML-SIG] Removing insignificant whitespace

Brian Quinlan brian at sweetapp.com
Wed Sep 1 18:10:11 CEST 2004


Fred L. Drake, Jr. wrote:
> On Wednesday 01 September 2004 05:30 am, Brian Quinlan wrote:
>  > Yes, but whitespace-only nodes are very common in XML formatted for
>  > human consumption e.g.
> ...
>  > I count 3 whitespace-only nodes (even after normalize). Those nodes are
>  > not useful to the application so I'm wondering about the canonical
>  > way of removing them (without writing the [admittedly simple] code
> 
> Here are some approaches that can be applied generally; your application may 
> be able to use something more specific.
> 
> - Don't remove them, just ignore them.  How easy this is depends on how your 
> application processes the DOM.  getElementsByTagName() (and the 
> namespace-aware variant) may help here.

I am doing this now but the DOM that I am working with makes this very 
annoying. There are a lot of nodes where the next sibling element is 
relevant. I have a lot of calls to _skip_ws_nodes().
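
For illustration, a helper of this kind might look like the sketch below. It 
assumes the standard-library xml.dom.minidom rather than the DOM actually in 
use here, and the name skip_ws_nodes() is hypothetical; it is not the 
_skip_ws_nodes() mentioned above.

```python
from xml.dom import Node, minidom


def skip_ws_nodes(node):
    """Return node itself, or the first following sibling that is not a
    whitespace-only Text node. Returns None if no such sibling exists."""
    while (node is not None
           and node.nodeType == Node.TEXT_NODE
           and not node.data.strip()):
        node = node.nextSibling
    return node


# Sample document formatted for human consumption (assumed input).
doc = minidom.parseString("<root>\n  <a/>\n  <b/>\n</root>")

first = skip_ws_nodes(doc.documentElement.firstChild)
second = skip_ws_nodes(first.nextSibling)
after_last = skip_ws_nodes(second.nextSibling)
```

Every sibling lookup has to go through the helper, which is what makes the 
"just ignore them" approach tedious when sibling order matters.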

> - Use a DTD so the parser can determine which whitespace occurs in element 
> content and avoid adding it to the tree, as your initial example 
> shows you tried.  This *requires* a DTD.

It's not my XML and I don't have a DTD for it.

> - Use a node filter that discards Text nodes in element content.  This 
> requires that your filter knows enough about the document type you're 
> expecting that it can identify whitespace in element content.

I'll look into that.
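
Short of a parser-level node filter, a similar result can be approximated 
with a pass over the tree after parsing. The sketch below is that simpler 
post-parse walk, not a node filter, and it assumes the standard-library 
xml.dom.minidom; strip_ws_nodes() is a hypothetical name. It also assumes 
that no whitespace-only text in the document is significant, which is 
exactly the document-type knowledge a filter would need.

```python
from xml.dom import Node, minidom


def strip_ws_nodes(node):
    """Recursively remove whitespace-only Text children of node, in place."""
    # Iterate over a copy, because removeChild() mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_ws_nodes(child)


# Sample document (assumed input) with indentation between elements.
doc = minidom.parseString(
    "<root>\n  <a>text</a>\n  <b>\n    <c/>\n  </b>\n</root>"
)
strip_ws_nodes(doc.documentElement)
result = doc.documentElement.toxml()
```

Text nodes that contain anything other than whitespace are left untouched, 
so the content of <a> survives. In mixed content this would also drop a 
space between two inline elements, so it is only safe for data-style 
documents.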

> There are probably other approaches as well.

So StripXml is not expected to work in this context? The problem seems 
to be that StripXml expects documents to have a createNodeIterator 
method and DOMBuilder is not creating a DOM that offers that method. 
Not sure why this is...

Cheers,
Brian

