[XML-SIG] A Few Bugs in dom/transformer.py
Jeff Rush
Jeff Rush <jrush@summit-research.com>
Sat, 19 Jun 99 04:45:03 -0500
I've checked the XML-SIG mailing list archives and the latest
CVS for updates to dom/transformer.py but didn't see any.
Hence...
Bug #1:
Throughout dom/transformer.py, the code refers to 'NodeType',
but the correct attribute name is 'nodeType'.
Bug #2:
While trying to create a subclass of Transformer, in order to
strip out HTML formatting/graphics tags, I hit a problem where
v0.5.1 of Transformer won't modify the DOM tree it walks.
----- old code -----
new_children = []
for child in node.getChildren():
    new_children = new_children + self._transform_node(child)
node._children = new_children
----- old code -----
Nodes don't have a '_children' attribute and besides, this doesn't
update the node's parentdict, hence any changes are not seen
by the higher DOM tree levels.
----- new code -----
new_children = []
for child in node.childNodes:
    new_children = new_children + self._transform_node(child)
for child in node.childNodes[:]:    # Remove Old Children
    node.removeChild(child)
for child in new_children:          # And Replace with (0 or more) New
    node.appendChild(child)
----- new code -----
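For anyone who wants to try the remove-then-append pattern outside the Transformer class, here is a minimal sketch. It assumes the standard library's xml.dom.minidom rather than the DOM package discussed above, and the names replace_children and drop_b are hypothetical helpers made up for the illustration.

```python
from xml.dom import minidom


def replace_children(node, transform):
    """Replace node's children with whatever transform(child) returns."""
    new_children = []
    for child in list(node.childNodes):      # iterate over a copy
        new_children.extend(transform(child))
    for child in list(node.childNodes):      # detach old children via the DOM API
        node.removeChild(child)
    for child in new_children:               # attach 0 or more replacements
        node.appendChild(child)
    return node


def drop_b(child):
    # Hypothetical transform: unwrap <b> elements, keep everything else.
    if child.nodeType == child.ELEMENT_NODE and child.tagName == "b":
        return list(child.childNodes)
    return [child]


doc = minidom.parseString("<p>one <b>two</b> three</p>")
root = doc.documentElement
replace_children(root, drop_b)
result = root.toxml()
```

Because the children are moved with removeChild and appendChild, each surviving node's parentNode ends up pointing at the node it was appended to, which is the bookkeeping that a direct assignment to '_children' skips.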
Suggestion #1:
Define a __call__ method in the Transformer class that
calls the existing transform method, so the following works:
class FormatStripper(Transformer):
    ....
strip_formatting = FormatStripper()
strip_formatting(doc)
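The change itself could be as small as the sketch below. The Transformer class here is a hypothetical stand-in, not the real one in xml.dom.transformer, and Reverser is a toy subclass that exists only to show that __call__ picks up an overridden transform method.

```python
class Transformer:
    """Hypothetical stand-in; the real class lives in xml.dom.transformer."""

    def transform(self, doc):
        # Placeholder: the real method walks and rewrites the DOM tree.
        return doc

    def __call__(self, doc):
        # Delegate so that an instance can be used like a function.
        return self.transform(doc)


class Reverser(Transformer):
    """Toy subclass that overrides transform() to show the delegation."""

    def transform(self, doc):
        return doc[::-1]


reverse = Reverser()
result = reverse("abc")
```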
I can now write my stripping transformers as:
---------- cut here ----------
class FormatStripper(xml.dom.transformer.Transformer):
    def do_FONT(self, node): return node.childNodes
    def do_B(self, node): return node.childNodes
    def do_I(self, node): return node.childNodes

strip_formatting = FormatStripper()

class GraphicsStripper(xml.dom.transformer.Transformer):
    def do_HR(self, node): return []     # Remove Horizontal Rules
    def do_IMG(self, node): return []    # Remove Images
    def do_MAP(self, node): return []    # Remove Image Maps
    def do_BODY(self, node):
        node.removeAttribute("BACKGROUND")
        node.removeAttribute("BGCOLOR")
        return [node]

strip_graphics = GraphicsStripper()
....
doc = strip_formatting(strip_graphics(doc))
---------- cut here ----------
If acceptable, I'd like to see some form of these added to the
dom.utils module; they seem to fit in with the strip_whitespace
function.
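To make the idea easy to try, here is a self-contained sketch of the same do_TAG dispatch. It is built on the standard library's xml.dom.minidom under my own assumptions and is not the actual dom/transformer.py code: MiniTransformer and its _walk helper are hypothetical, handler names are matched against the upper-cased tag name, and the root element itself is never passed to a handler.

```python
from xml.dom import minidom


class MiniTransformer:
    """Hypothetical stand-in for Transformer, built on xml.dom.minidom."""

    def transform(self, doc):
        self._walk(doc.documentElement)
        return doc

    def __call__(self, doc):
        return self.transform(doc)

    def _transform_node(self, node):
        if node.nodeType != node.ELEMENT_NODE:
            return [node]
        self._walk(node)                      # transform descendants first
        handler = getattr(self, "do_" + node.tagName.upper(), None)
        return handler(node) if handler else [node]

    def _walk(self, node):
        new_children = []
        for child in list(node.childNodes):
            new_children.extend(self._transform_node(child))
        for child in list(node.childNodes):   # remove old children
            node.removeChild(child)
        for child in new_children:            # append 0 or more new ones
            node.appendChild(child)


class FormatStripper(MiniTransformer):
    def do_B(self, node):
        return list(node.childNodes)          # unwrap, keep the contents


class GraphicsStripper(MiniTransformer):
    def do_IMG(self, node):
        return []                             # drop images

    def do_HR(self, node):
        return []                             # drop horizontal rules

    def do_BODY(self, node):
        if node.hasAttribute("bgcolor"):      # minidom raises if it is missing
            node.removeAttribute("bgcolor")
        return [node]


strip_formatting = FormatStripper()
strip_graphics = GraphicsStripper()

source = ('<html><body bgcolor="white"><p>a <b>bold</b> '
          '<img src="x.png"/>z</p><hr/></body></html>')
doc = strip_formatting(strip_graphics(minidom.parseString(source)))
result = doc.documentElement.toxml()
```

Returning the document from transform is what lets the two strippers be chained in a single expression.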
-Jeff Rush