[XML-SIG] A Few Bugs in dom/transformer.py

Jeff Rush <jrush@summit-research.com>
Sat, 19 Jun 99 04:45:03 -0500


I've checked the XML-SIG mailing list archives and the latest
CVS for updates to dom/transformer.py but didn't see any.
Hence...

Bug #1:
    Throughout dom/transformer.py, the code refers to 'NodeType',
    but the correct attribute name is 'nodeType'.

Bug #2:
    While trying to create a subclass of Transformer to strip out
    HTML formatting/graphics tags, I hit a problem: v0.5.1 of
    Transformer doesn't modify the DOM tree it walks.

    ----- old code -----
	new_children = []
	for child in node.getChildren():
		new_children = new_children + self._transform_node(child)
	node._children = new_children
    ----- old code -----

    Nodes don't have a '_children' attribute, and besides, assigning
    to it doesn't update the node's parentdict, so any changes are
    not seen by the higher levels of the DOM tree.

    ----- new code -----
	new_children = []
	for child in node.childNodes:
		new_children = new_children + self._transform_node(child)

	for child in node.childNodes[:] : # Remove Old Children
		node.removeChild(child)

	for child in new_children:    # And Replace with (0 or more) New
		node.appendChild(child)
    ----- new code -----
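
For anyone who wants to try the remove-then-append pattern outside
the transformer, here is a minimal sketch. It assumes the standard
library's xml.dom.minidom as a stand-in for the DOM implementation
discussed here, and replace_children is a hypothetical helper name,
not part of any library.

```python
from xml.dom import minidom


def replace_children(node, new_children):
    """Swap a node's children using only the public DOM methods.

    Hypothetical helper for illustration; going through removeChild and
    appendChild lets the DOM keep its parent links consistent.
    """
    # Iterate over a copy, because removeChild mutates the live list.
    for child in list(node.childNodes):
        node.removeChild(child)
    for child in new_children:
        node.appendChild(child)


doc = minidom.parseString("<p>one<b>two</b>three</p>")
p = doc.documentElement

# Keep only the text nodes, which drops the <b> element.
kept = [c for c in p.childNodes if c.nodeType == c.TEXT_NODE]
replace_children(p, kept)

result = p.toxml()
print(result)
```

Every surviving child still reports the element as its parent, which
is the property that direct assignment to a private list would not
guarantee.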

Suggestion #1:
     Define a __call__ method in the Transformer class that
     calls the existing transform method, so the following works:

	class FormatStripper(Transformer):
		....
	strip_formatting = FormatStripper()

	strip_formatting(doc)
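
A rough sketch of what that could look like follows. The Transformer
class below is a simplified, hypothetical stand-in written against
xml.dom.minidom, not the real xml.dom.transformer.Transformer; only
the __call__ method at the end is the actual suggestion. It also
assumes transform returns the document, which is what allows calls
to be nested.

```python
from xml.dom import minidom


class Transformer:
    """Hypothetical stand-in for the real Transformer class."""

    def transform(self, doc):
        self._transform_children(doc.documentElement)
        return doc  # assumption: returning the document allows chaining

    def _transform_children(self, node):
        new_children = []
        for child in list(node.childNodes):
            new_children.extend(self._transform_node(child))
        for child in list(node.childNodes):  # remove old children
            node.removeChild(child)
        for child in new_children:           # append 0 or more new ones
            node.appendChild(child)

    def _transform_node(self, node):
        if node.nodeType != node.ELEMENT_NODE:
            return [node]
        self._transform_children(node)
        # Dispatch to do_<TAG> if the subclass defines it.
        handler = getattr(self, "do_" + node.tagName, None)
        return list(handler(node)) if handler else [node]

    def __call__(self, doc):
        # The suggested addition: calling the instance runs transform().
        return self.transform(doc)


class BoldStripper(Transformer):
    def do_B(self, node):
        return node.childNodes  # replace <B> with its children


strip_bold = BoldStripper()
doc = strip_bold(minidom.parseString("<P>one<B>two</B>three</P>"))
result = doc.documentElement.toxml()
print(result)
```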

I can now write my stripping transformers as:

---------- cut here ----------
import xml.dom.transformer

class FormatStripper(xml.dom.transformer.Transformer):
	def do_FONT(self, node):	return node.childNodes
	def do_B(self, node):		return node.childNodes
	def do_I(self, node):		return node.childNodes

strip_formatting = FormatStripper()

class GraphicsStripper(xml.dom.transformer.Transformer):
	def do_HR(self, node):		return [] # Remove Horizontal Rules
	def do_IMG(self, node):	return [] # Remove Images
	def do_MAP(self, node):	return [] # Remove Image Maps

	def do_BODY(self, node):
		node.removeAttribute("BACKGROUND")
		node.removeAttribute("BGCOLOR")
		return [node]

strip_graphics = GraphicsStripper()

....

doc = strip_formatting( strip_graphics( doc ) )

---------- cut here ----------

If acceptable, I'd like to see some form of these added to the
dom.utils module; they seem to fit in with the strip_whitespace
function.

-Jeff Rush