help with recursive whitespace filter in
MRAB
google at mrabarnett.plus.com
Sun May 10 13:35:59 EDT 2009
rustom wrote:
> On May 10, 9:49 pm, Steve Howell <showel... at yahoo.com> wrote:
>> On May 10, 9:10 am, Rustom Mody <rustompm... at gmail.com> wrote:
>>
>>> I am trying to write a recursive filter to remove whitespace-only
>>> nodes for minidom.
>>> The code is below.
>>> Strangely it deletes some whitespace nodes and leaves some.
>>> If I keep calling it -- like so: fws(fws(fws(doc))) then at some
>>> stage all the ws nodes disappear
>>> Does anybody have a clue?
>>> from xml.dom.minidom import parse
>>> # The input to fws is the output of parse("something.xml")
>>> def fws(ele):
>>>     """ filter white space (recursive)"""
>>>     for c in ele.childNodes:
>>>         if isWsNode(c):
>>>             ele.removeChild(c)
>>>             #c.unlink() Makes no diff whether this is there or not
>>>         elif c.nodeType == ele.ELEMENT_NODE:
>>>             fws(c)
>>> def isWsNode(ele):
>>>     return (ele.nodeType == ele.TEXT_NODE and not ele.data.strip())
>> I would avoid doing things like delete/remove in a loop. Instead
>> build a list of things to delete.
>
> Yeah, I know. I would write the whole damn thing functionally if I knew
> how, but I can't figure out the API.
> I actually started out to write a (Haskell-style) copy of the whole
> tree minus the unwanted nodes, but could not figure it out.
>
def fws(ele):
    """ filter white space (recursive)"""
    empty_nodes = []
    for c in ele.childNodes:
        if isWsNode(c):
            empty_nodes.append(c)
        elif c.nodeType == ele.ELEMENT_NODE:
            fws(c)
    for c in empty_nodes:
        ele.removeChild(c)
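
As far as I can tell, the original version misbehaves because removeChild() shrinks ele.childNodes while the for loop is still walking it, so the sibling right after each removed node gets skipped and some elements are never recursed into. A variation on the same idea, offered only as a sketch, is to loop over a copy of the child list instead of collecting the nodes first. The name is_ws_node and the sample XML string are made up for illustration; they are not from the original post.

```python
from xml.dom.minidom import parseString


def is_ws_node(node):
    """Return True for a text node that holds only whitespace."""
    return node.nodeType == node.TEXT_NODE and not node.data.strip()


def fws(ele):
    """Filter whitespace-only text nodes (recursive)."""
    # list(...) takes a snapshot, so removeChild() cannot shift the
    # live childNodes list underneath the loop.
    for c in list(ele.childNodes):
        if is_ws_node(c):
            ele.removeChild(c)
        elif c.nodeType == c.ELEMENT_NODE:
            fws(c)
    return ele


# Hypothetical sample document, for illustration only.
doc = parseString("<a>\n  <b> x </b>\n  \n  <c>\n  </c>\n</a>")
fws(doc)
print(doc.documentElement.toxml())
```

With this, a single call to fws(doc) should be enough, and text such as " x " that contains real content is left untouched.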