[XML-SIG] Eliminating whitespace

Andrew M. Kuchling akuchlin@cnri.reston.va.us
Fri, 4 Dec 1998 16:55:57 -0500 (EST)


A common task when processing a document using the DOM is to strip out
unnecessary whitespace.  I'd definitely like to have a function or set
of functions to do this, and would like to discuss what the interface
should look like.

The problem: given a DOM tree, you want to remove whitespace from it.
There are several dimensions to the problem:

	* Delete whitespace, or collapse it down to a single space?

	* Just act on Text nodes that are all whitespace?  Or act on
Text nodes with leading, trailing, or internal whitespace?  (If acting
on internal whitespace, you'll probably be collapsing down to a single
space, not deleting everything.  Though who knows?)
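
To make those choices concrete at the level of a single string, here is
a rough Python sketch.  The helper names (delete_ws, collapse_ws,
strip_edges, is_all_ws) are made up for illustration and aren't part of
any existing DOM package; "whitespace" is assumed to mean the four XML
whitespace characters.

```python
import re

# Assumption: "whitespace" means the four XML whitespace characters.
XML_WS = " \t\r\n"
_WS_RUN = re.compile("[" + XML_WS + "]+")


def delete_ws(text):
    """Delete every run of whitespace outright."""
    return _WS_RUN.sub("", text)


def collapse_ws(text):
    """Collapse every run of whitespace down to a single space."""
    return _WS_RUN.sub(" ", text)


def strip_edges(text, left=True, right=True):
    """Remove leading and/or trailing whitespace, leaving the inside alone."""
    if left:
        text = text.lstrip(XML_WS)
    if right:
        text = text.rstrip(XML_WS)
    return text


def is_all_ws(text):
    """True if the string contains nothing but whitespace."""
    return text.strip(XML_WS) == ""
```

Deleting internal whitespace glues words together (delete_ws turns
"a b" into "ab"), which is why collapsing is usually the sensible choice
for whitespace inside a node.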

	Anyway, I don't think there's any call for elaborate
whitespace-deleting classes that can be customized in various ways.
So, how about a function (or a method on dom.core.Node)?  Strawman
interface:

normalize_whitespace( DOMtree,
      collapse = [true | false], default false
      inside_node = [true | false], default false
      where = LEFT, RIGHT, INSIDE, or a bitwise OR of these flags;
              default = all of them
)

Examples:

normalize_whitespace( DOMtree )
	Drop all whitespace-only nodes

normalize_whitespace( DOMtree, 1, 1 )
	Collapse all runs of whitespace down to single spaces

normalize_whitespace( DOMtree, 1, 1, LEFT | RIGHT )
	Strip trailing and leading whitespace from all Text nodes
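
For discussion, here is one possible reading of the strawman as runnable
Python.  It is only a sketch under several assumptions: it uses the
standard library's xml.dom.minidom rather than the dom.core classes
mentioned above; the flag values 1, 2, 4 and the helper _fix_text are
invented; and it assumes "collapse" means "replace a run with one space
instead of deleting it", applied uniformly to whichever positions
"where" selects.  Whitespace-only nodes are always handled, and
"where" is consulted only when inside_node is true.

```python
import re
from xml.dom import Node
from xml.dom.minidom import parseString

# Hypothetical flag values; the strawman only names the flags.
LEFT, RIGHT, INSIDE = 1, 2, 4
ALL = LEFT | RIGHT | INSIDE

_WS = r"[ \t\r\n]+"      # a run of XML whitespace
_NON_WS = r"[^ \t\r\n]"  # any single non-whitespace character


def _fix_text(text, collapse, where):
    """Delete or collapse whitespace at the positions selected by `where`."""
    repl = " " if collapse else ""
    if where & INSIDE:
        # Runs that have non-whitespace on both sides.
        pattern = "(?<=" + _NON_WS + ")" + _WS + "(?=" + _NON_WS + ")"
        text = re.sub(pattern, repl, text)
    if where & LEFT:
        text = re.sub(r"\A" + _WS, repl, text)
    if where & RIGHT:
        text = re.sub(_WS + r"\Z", repl, text)
    return text


def normalize_whitespace(node, collapse=False, inside_node=False, where=ALL):
    """Normalize whitespace in all Text nodes below `node`, in place."""
    for child in list(node.childNodes):  # copy: we may remove children
        if child.nodeType != Node.TEXT_NODE:
            normalize_whitespace(child, collapse, inside_node, where)
        elif child.data.strip(" \t\r\n") == "":
            # Whitespace-only (or empty) Text node.
            if collapse and child.data:
                child.data = " "
            else:
                node.removeChild(child)
        elif inside_node:
            child.data = _fix_text(child.data, collapse, where)


doc = parseString("<a>\n  <b>  hello   world </b>\n</a>")
normalize_whitespace(doc)  # drop whitespace-only nodes
print(doc.documentElement.toxml())
```

Under this reading the defaults drop whitespace-only nodes and leave
the text inside <b> untouched.  Note that stripping leading and trailing
whitespace then corresponds to collapse=0 with LEFT | RIGHT; with
collapse=1 each end keeps a single space.  Other readings of the
flags are certainly possible.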

I have a sneaking feeling that there's one argument too many in that
function, and that it could be made more compact somehow, but I can't
think of anything definite.  Anyone got suggestions?  (Where's Tim
Peters when you need him?)

-- 
A.M. Kuchling			http://starship.skyport.net/crew/amk/
    "I'll be curious to see what he thinks Hell is."
    "Garn, I hope he ain't British. Some of that stuff them people dream up...
it's enough to gag a maggot."
    -- Demons awaiting Stanley's arrival in Hell in STANLEY AND HIS MONSTER #4