[XML-SIG] Removing insignificant whitespace
Brian Quinlan
brian at sweetapp.com
Wed Sep 1 18:10:11 CEST 2004
Fred L. Drake, Jr. wrote:
> On Wednesday 01 September 2004 05:30 am, Brian Quinlan wrote:
> > Yes, but whitespace-only nodes are very common in XML formatted for
> > human consumption e.g.
> ...
> > I count 3 whitespace-only nodes (even after normalize). Those nodes are
> > not useful to the application so I'm wondering about the canonical
> > way of removing them (without writing the [admittedly simple] code
>
> Here are some approaches that can be applied generally; your application may
> be able to use something more specific.
>
> - Don't remove them, just ignore them. How easy this is depends on how your
> application processes the DOM. getElementsByTagName() (and the
> namespace-aware variant) may help here.
I am doing this now but the DOM that I am working with makes this very
annoying. There are a lot of nodes where the next sibling element is
relevant. I have a lot of calls to _skip_ws_nodes().
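For illustration, a helper of this kind might look roughly like the sketch
below. The real _skip_ws_nodes() is not shown in this thread, so the name
skip_ws_nodes and its behaviour are assumptions, and the sketch uses the
standard library's xml.dom.minidom rather than the PyXML DOM discussed here.

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def skip_ws_nodes(node):
    """Return node, or the first following sibling that is not a
    whitespace-only Text node. Returns None if the siblings run out.

    Hypothetical helper: a guess at what _skip_ws_nodes() does.
    """
    while (
        node is not None
        and node.nodeType == Node.TEXT_NODE
        and not node.data.strip()
    ):
        node = node.nextSibling
    return node


doc = parseString("<a>\n  <b/>\n  <c/>\n</a>")
root = doc.documentElement
first = skip_ws_nodes(root.firstChild)          # the <b/> element
second = skip_ws_nodes(first.nextSibling)       # the <c/> element
```

Every sibling traversal has to go through the helper, which is why this
approach gets tedious when neighbouring elements matter.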
> - Use a DTD so the parser can determine which whitespace exists in element
> content so it can avoid adding them to the tree, and your initial example
> shows you tried. This *requires* a DTD.
It's not my XML and I don't have a DTD for it.
> - Use a node filter that discards Text nodes in element content. This
> requires that your filter knows enough about the document type you're
> expecting that it can identify whitespace in element content.
I'll look into that.
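Short of a real node filter, the "admittedly simple" code could be a one-off
pass over the finished tree. The following is only a sketch under stated
assumptions: it uses the standard library's xml.dom.minidom, the function name
strip_ws_nodes is made up for this example, and it assumes that no
whitespace-only Text node in the document is significant (which is not true
for mixed content in general).

```python
from xml.dom import Node
from xml.dom.minidom import parseString


def strip_ws_nodes(node):
    """Remove whitespace-only Text children of node, recursively, in place.

    Hypothetical helper; assumes such nodes are never significant.
    """
    # Iterate over a copy because removeChild() mutates node.childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE and not child.data.strip():
            node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            strip_ws_nodes(child)


doc = parseString("<a>\n  <b>text</b>\n  <c> </c>\n</a>")
strip_ws_nodes(doc.documentElement)
# <a> now has only the two element children, and <c> is empty.
```

Calling normalize() on the tree first merges adjacent Text nodes, so that
whitespace split across several nodes is judged as a single node.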
> There are probably other approaches as well.
So StripXml is not expected to work in this context? The problem seems
to be that StripXml expects documents to have a createNodeIterator
method and DOMBuilder is not creating a DOM that offers that method.
Not sure why this is...
Cheers,
Brian