[XML-SIG] Eliminating whitespace
Andrew M. Kuchling
akuchlin@cnri.reston.va.us
Fri, 4 Dec 1998 16:55:57 -0500 (EST)
A common task when processing a document using the DOM is to strip out
unnecessary whitespace. I'd definitely like to have a function or set
of functions to do this, and would like to discuss what the interface
should look like.
The problem: given a DOM tree, you want to remove whitespace from it.
There are several dimensions to the problem:
  * Delete whitespace, or collapse it down to a single space?
  * Just act on Text nodes that are all whitespace? Or act on
    Text nodes with leading, trailing, or internal whitespace? (If acting
    on internal whitespace, you'll probably be collapsing down to a single
    space, not deleting everything. Though who knows?)
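To make those dimensions concrete, here is a small sketch on plain strings rather than DOM nodes. The helper names are made up for illustration and are not part of any DOM package; "whitespace" is assumed to mean the four XML whitespace characters.

```python
import re

XML_WS = " \t\r\n"                      # assumption: XML's notion of whitespace
_WS_RUN = re.compile(r"[ \t\r\n]+")

def is_all_whitespace(text):
    """True if text is empty or contains nothing but whitespace."""
    return text.strip(XML_WS) == ""

def delete_edges(text):
    """Delete leading and trailing whitespace; leave internal runs alone."""
    return text.strip(XML_WS)

def collapse_runs(text):
    """Collapse every run of whitespace (leading, trailing, internal) to one space."""
    return _WS_RUN.sub(" ", text)

sample = "  Hello,\n   world  "
print(repr(delete_edges(sample)))       # 'Hello,\n   world'
print(repr(collapse_runs(sample)))      # ' Hello, world '
print(is_all_whitespace("\n  \t"))      # True
```

Deleting internal runs outright would turn "Hello, world" into "Hello,world", which is why collapsing is the likelier choice there.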
Anyway, I don't think there's any call for making elaborate
whitespace-deleting classes that can be customized in various ways.
So, how about a function (or a method on dom.core.Node)? Strawman
interface:
    normalize_whitespace( DOMtree,
        collapse    = [true | false]   default false,
        inside_node = [true | false]   default false,
        where       = LEFT, RIGHT, INSIDE, or a bitwise OR of these flags;
                      default = all of them
    )
Examples:
    normalize_whitespace( DOMtree )
        Drop all whitespace-only nodes
    normalize_whitespace( DOMtree, 1, 1 )
        Collapse all runs of whitespace down to single spaces
    normalize_whitespace( DOMtree, 1, 1, LEFT | RIGHT )
        Strip trailing and leading whitespace from all Text nodes
I have a sneaking feeling that there's one argument too many in that
function, and it could be made more compact somehow, but can't think
of anything definite. Anyone got suggestions? (Where's Tim Peters
when you need him?)
--
A.M. Kuchling http://starship.skyport.net/crew/amk/
"I'll be curious to see what he thinks Hell is."
"Garn, I hope he ain't British. Some of that stuff them people dream up...
it's enough to gag a maggot."
-- Demons awaiting Stanley's arrival in Hell in STANLEY AND HIS MONSTER #4