[XML-SIG] Removing insignificant whitespace
Fred L. Drake, Jr.
fdrake at acm.org
Wed Sep 1 15:26:53 CEST 2004
On Wednesday 01 September 2004 05:30 am, Brian Quinlan wrote:
> Yes, but whitespace-only nodes are very common in XML formatted for
> human consumption e.g.
> I count 3 whitespace-only nodes (even after normalize). Those nodes are
> not useful to the application so I'm wondering about the canonical
> way of removing them (without writing the [admittedly simple] code
Here are some approaches that can be applied generally; your application may
be able to use something more specific.
- Don't remove them, just ignore them. How easy this is depends on how your
application processes the DOM. getElementsByTagName() (and the
namespace-aware variant, getElementsByTagNameNS()) may help here.
- Use a DTD so the parser can tell which whitespace occurs in element
content and avoid adding it to the tree; your initial example shows you
tried this. This *requires* a DTD.
- Use a node filter that discards Text nodes in element content. This
requires that your filter know enough about the document type you're
expecting to identify whitespace in element content.
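A rough sketch of the first approach ("just ignore them") using xml.dom.minidom might look like the following. The helper name element_children() is hypothetical and not part of any library. It simply skips every child that is not an Element, and getElementsByTagName() already returns only Element nodes, so whitespace Text nodes never show up in either result.

```python
from xml.dom import minidom, Node

def element_children(node):
    """Hypothetical helper: yield only the Element children of node,
    silently skipping Text nodes (including whitespace-only ones)."""
    for child in node.childNodes:
        if child.nodeType == Node.ELEMENT_NODE:
            yield child

doc = minidom.parseString("<root>\n  <item>a</item>\n  <item>b</item>\n</root>")
root = doc.documentElement

# The whitespace Text nodes are still in the tree...
child_count = len(root.childNodes)

# ...but code that walks only element children never sees them.
names = [c.tagName for c in element_children(root)]

# getElementsByTagName() returns Element nodes only, so it ignores them too.
texts = [item.firstChild.data for item in root.getElementsByTagName("item")]
```

Here child_count is 5 (three whitespace Text nodes plus two elements), while names and texts contain only the two item elements and their contents.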
There are probably other approaches as well.
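For comparison, here is a minimal post-parse variant of the node-filter idea listed above. It is not a parse-time DOM filter; it prunes the tree after parsing. The function name strip_whitespace_nodes() and its preserve parameter are hypothetical. The sketch assumes that any whitespace-only Text node is insignificant except inside elements named in preserve, which is only a crude stand-in for real knowledge of the document type: in mixed content such as "<p><b>x</b> <i>y</i></p>" the single space is significant and this code would wrongly remove it.

```python
from xml.dom import minidom, Node

def strip_whitespace_nodes(node, preserve=frozenset()):
    """Hypothetical helper: remove whitespace-only Text nodes below node,
    in place. Assumes such text is insignificant everywhere except inside
    elements whose tag names appear in `preserve`."""
    # Iterate over a copy, since we mutate node.childNodes as we go.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not child.data.strip():
                node.removeChild(child)
        elif child.nodeType == Node.ELEMENT_NODE:
            if child.tagName not in preserve:
                strip_whitespace_nodes(child, preserve)

doc = minidom.parseString("<root>\n  <item>a</item>\n  <pre>  </pre>\n</root>")
root = doc.documentElement
strip_whitespace_nodes(root, preserve={"pre"})
```

After the call, root has only its two element children, and the whitespace inside the pre element is left alone because "pre" is in the preserve set.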
Fred L. Drake, Jr. <fdrake at acm.org>