[XML-SIG] DOM: Whitespace, and subclassing
Andrew M. Kuchling
Tue, 20 Oct 1998 19:29:16 -0400 (EDT)
Two questions have come to mind after a discussion with Greg Ward,
a co-worker who's learning the DOM.
* First, for many (but not all) DTDs, you know that Text nodes
containing only whitespace can be ignored. For example, in:
<a> <b/> Text <c/> </a>
The whitespace between <a> and <b/>, and between <c/> and </a>, may be
of no interest. In this case, you'd write a function that walks the
tree and deletes any Text nodes that contain only whitespace. (Greg
points out you should call .normalize() on the root element first, so
that adjacent Text nodes are merged, in case the tree was constructed
with only single-character Text nodes. :) )
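To make the idea concrete, here is a minimal sketch of such a tree walk. It is written against xml.dom.minidom from the Python standard library as a stand-in, not against the package under discussion, and the function name strip_whitespace_nodes is made up for illustration.

```python
from xml.dom import minidom, Node


def strip_whitespace_nodes(node):
    """Recursively delete Text nodes that contain only whitespace.

    Hypothetical helper; assumes a minidom-style tree.
    """
    # Iterate over a copy, because removeChild() mutates childNodes.
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            if not child.data.strip():
                node.removeChild(child)
        else:
            strip_whitespace_nodes(child)


doc = minidom.parseString("<a> <b/> Text <c/> </a>")
root = doc.documentElement
root.normalize()              # merge adjacent Text nodes first
strip_whitespace_nodes(root)
print(root.toxml())           # <a><b/> Text <c/></a>
```

Note that " Text " keeps its surrounding spaces: only nodes that are whitespace from end to end are dropped.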
Questions:
1) Does such a lose-the-whitespace method seem
generally useful enough to be added to the package?
2) If so, should it be added to the core, or put in a
3) Other whitespace normalizations are possible, such as
dropping leading and/or trailing whitespace on Text nodes, or
shortening runs of whitespace characters down to a single character.
Should these be made available? Anyone care to suggest an interface?
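As one possible starting point for the interface question, the sketch below puts both normalizations behind keyword flags on a single function. The name normalize_text and its strip/collapse parameters are assumptions for the sake of discussion, and the code again uses xml.dom.minidom as a stand-in for the package.

```python
import re
from xml.dom import minidom, Node

# XML whitespace characters: space, tab, carriage return, newline.
_WS_RUN = re.compile(r"[ \t\r\n]+")


def normalize_text(node, strip=False, collapse=False):
    """Normalize whitespace inside every Text node below `node`.

    collapse: shorten each run of whitespace to a single space.
    strip:    drop leading and trailing whitespace.
    (Hypothetical interface, not part of any existing package.)
    """
    for child in list(node.childNodes):
        if child.nodeType == Node.TEXT_NODE:
            text = child.data
            if collapse:
                text = _WS_RUN.sub(" ", text)
            if strip:
                text = text.strip(" \t\r\n")
            child.data = text
        else:
            normalize_text(child, strip=strip, collapse=collapse)


para_doc = minidom.parseString("<p>  some \n\t text  <b/> more   words </p>")
para = para_doc.documentElement
para.normalize()

normalize_text(para, collapse=True)
collapsed = para.toxml()

normalize_text(para, strip=True)
stripped = para.toxml()
```

Keeping the two options independent lets a caller collapse runs without also gluing text to neighbouring elements, which stripping does.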
* Second notion: subclassing core DOM classes.
Right now, there's no point in subclassing a DOM class such as
Element, because the only official way to get an Element node is to
call the .createElement() factory method on a Document object, and
.createElement() always creates an instance of the base Element
class. Your subclass would never be instantiated!
Question: Does subclassing basic DOM classes seem useful for
some purpose? If so, how could it be made possible?
Perhaps .create*() could take an optional klass = <class
object> argument, and verify that klass is a subclass of the original
class. However, inside core.py, nodes are often created directly,
without calling the Document object's .create*() method, in order to
save a method call; that code would have to be changed to go through
the Document object's factory methods.
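The klass idea might look roughly like the following. This is only a sketch: it overrides createElement() on a subclass of xml.dom.minidom's Document rather than touching core.py, and the names FactoryDocument and Paragraph are invented for the example.

```python
from xml.dom import minidom, Node


class FactoryDocument(minidom.Document):
    """Document whose createElement() accepts an optional klass argument."""

    def createElement(self, tagName, klass=minidom.Element):
        # Verify that klass really is a subclass of the original class.
        if not issubclass(klass, minidom.Element):
            raise TypeError("klass must be a subclass of Element")
        element = klass(tagName)
        element.ownerDocument = self
        return element


class Paragraph(minidom.Element):
    """Example subclass adding one convenience method."""

    def word_count(self):
        # Count words in the Text nodes directly below this element.
        return sum(len(child.data.split())
                   for child in self.childNodes
                   if child.nodeType == Node.TEXT_NODE)


fdoc = FactoryDocument()
p = fdoc.createElement("p", klass=Paragraph)
p.appendChild(fdoc.createTextNode("nothing is truly lost"))
fdoc.appendChild(p)

plain = fdoc.createElement("span")   # default: an ordinary Element
```

This only covers nodes built by hand through the factory. Nodes created by a parser, or created directly inside the library, would still be plain Element instances, which is exactly the problem with the direct instantiation described above.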
A.M. Kuchling http://starship.skyport.net/crew/amk/
Only the phoenix arises and does not descend. And everything changes. And
nothing is truly lost.
-- The true end of the series, in SANDMAN #74, "The Exile"