Walking deeply nested lists
Arnaud Delobelle
arnodel at googlemail.com
Sat Aug 28 03:21:31 EDT 2010
donn <donn.ingle at gmail.com> writes:
> This is all about walking trees, recursion and generators. None of
> which fit my brain at all!
>
> From an XML tree (an SVG file) I build a bunch of Tag objects.
>
> [I use lxml, but I am combining multiple svg files into a 'forest' of
> trees, so I can't use the lxml walking methods because they all stop
> at the 'end' of a branch where there is actually a 'link' over to
> another tree.]
>
> Each Tag has a flatwalk() method. The return from that is a list which
> can be deeply nested. As an example, something like this:
> L=[1, [2, 3, [4, [5, 6], 7], 8], [9, 10] ]
>
> (The numbers in the example would actually be Tag Instances.)
>
> Aim 1
> ---
> I'm trying to write a generator around such lists so that I can 'walk' them:
>
> for parent, child in mystery_generator(L):
> print level, item
>
> Right now, I am running a 'flatten' function on the entire list (which
> I don't savvy, I found it online) and then I iterate over that
> flattened list. This doesn't feel right to me. (Code at end.)
>
> Aim 2
> ---
> The Objects that represent the svg tags use that flatwalk() method to
> build the nested list, I'd far prefer flatwalk() to be directly
> useable in something like this way:
>
> for container, child in some_tag.flatwalk():
> print container, child
>
> The flatwalk() function is (and this is another puzzle see *) kind of
> recursive. "For each of my children, tell that child to go
> flatwalk()".
> (Code at end.)
> I am not sure how to turn such a thing into a generator.
>
> I keep wondering how os.walk() does its thing. My hierarchy of Tag
> objects can be thought of as directories (tags:g, symbol, svg), files
> (path, circle, etc.) and soft-links (use tags).
>
> * If an Instance calls a method on *another* Instance of the *same*
> class, is this still recursion? And how does this 'stack up'? I mean,
> literally, on the stack. Does each instance get its own stack, or does
> all the push, call, pop stuff happen in one main stack?
> (I worry about recursion depth limits because svg trees can get quite deep.)
>
>
> The walking code so far:
>
> ## Found code.
> def flatten(input):
> output = []
> stack = []
> stack.extend(reversed(input))
> while stack:
> top = stack.pop()
> if isinstance(top, list):
> stack.extend(reversed(top))
> else:
> output.append(top)
> return output
not a bad idea. I would rather write it as:
def flatten(input):
output = []
stack = list(input)
while stack:
top = stack.pop()
if isinstance(top, list):
stack.extend(top)
else:
output.append(top)
output.reverse()
return output
If you want to make it a generator function though, the initial version
is better. All you need to do is:
* change the line "output.append(top)" to "yield top"
* delete the line "return output"
Or you can go for the simple recursive approach:
def flatten(lst):
for el in lst:
if isinstance(el, list):
for x in flatten(el):
yield x
else:
yield el
> ## My walker
> def test_walk(e): #lxml Element comes in
> obj = ebag_get(e)['obj'] #get a tag object
> l=obj.flatwalk()
> ll= flatten(l)
> #No current solution to get 'parent' in this iterator
> #ie. for parent, child in ...
> for tag in ll:
> print tag
>
> Here's one of my Tag objects:
>
> class Brancher(object):
> def __init__(self, elem):
> self.elem = elem
> self.children = []
>
> def flatwalk(self):
> l=[self]
> for child in self.children:
> l.append( child.flatwalk() ) #recur(ish)ion here
> return l
This flattens the list in the flatwalk method (which IMHO it should do
given its name!):
def flatwalk(self):
flatlist = [self]
for child in self.children:
for el is child.flatwalk():
flatlist.append(el)
return flatlist
This turns it into a generator method:
def flatwalk(self):
yield self
for child in self.children:
for el is child.flatwalk():
yield el
This turns it into a generator method which yields parents as well:
def flatwalk(self, parent=None):
yield self, parent
for child in self.children:
for el is child.flatwalk(self):
yield el
Then you can write:
for tag, parent in mytag.flatwalk():
...
Of course, for the above to work, "leaf" objects need a modified
flatwalk method, e.g.:
Class LeafTag:
def flatwalk(self, parent=None):
yield self, parent
HTH (warning: all code untested).
--
Arnaud
More information about the Python-list
mailing list