parsing tree from excel sheet

alb al.basili at gmail.com
Sat Jan 31 23:45:15 CET 2015

```
Hi Peter,

Peter Otten <__peter__ at web.de> wrote:
[]
>
>> Peter Otten <__peter__ at web.de> wrote:
>
>>>    def show2(self):
>>>        yield str(self)
>>>        for child in self.children:
>>>            yield from child.show2()
[]
>
> Given a tree
>
> A --> A1
>      A2 --> A21
>             A22
>      A3
>
> assume a slightly modified show2():
>
> def append_nodes(node, nodes):
>    nodes.append(node)
>    for child in node.children:
>        append_nodes(child, nodes)

I'm assuming you are referring to the method in the Node class.

>
> When you invoke this with the root node in the above sample tree and
> an empty list
>
> nodes = []
> append_nodes(A, nodes)
>
> the first thing it will do is append the root node to the nodes list
>
> [A]
>
> Then it iterates over A's children:
>
> append_nodes(A1, nodes) will append A1 and return immediately because
> A1 itself has no children.
>
> [A, A1]
>
> append_nodes(A2, nodes) will append A2 and then iterate over A2's
> children. As A21 and A22 don't have any children, append_nodes(A21,
> nodes) and append_nodes(A22, nodes) will just append the respective
> node with no further nested ("recursive") invocation, and thus the
> list is now
>
> [A, A1, A2, A21, A22]
>
> Finally append_nodes(A3, nodes) will append A3 and then return
> because it has no children, and we end up with
>
> nodes = [A, A1, A2, A21, A22, A3]

So the recursive function will append children as long as there are any,
traversing the whole tree structure (yep, I noticed that A2 was missing
from your original lists, as you already mentioned).

> Now why the generator? For such a small problem it doesn't matter; for
> large datasets it is convenient that you can process the first item
> immediately, while the following ones may not yet be available.

I've read something about generators; they are a powerful concept
(especially for a C-minded guy like me!).

[]
> Ok, how to get from the recursive list building to yielding nodes as
> they are encountered? The basic process is always the same:
>
> def f(items):
>   items.append(3)
>   items.append(6)
>   for i in range(10):
>       items.append(i)
>
> items = []
> f(items)
> for item in items:
>   print(item)
>
> becomes
>
> def g():
>    yield 3
>    yield 6
>    for i in range(10):
>        yield i
>
> for item in g():
>    print(items)
---------------^ should be item and not items.

>
> Python 3.3 added some syntactic sugar, "yield from", so that you can
> write
>
> def g():
>    yield 3
>    yield 6
>    yield from range(10)
>
>
> Thus
>
> def append_nodes(node, nodes):
>    nodes.append(node)
>    for child in node.children:
>        append_nodes(child, nodes)
>
>
> becomes
>
> def generate_nodes(node):
>    yield node
>    for child in node.children:
>        yield from generate_nodes(child)

I'm with you now! I guess it would have been nearly impossible for me to
see the real picture behind it on my own.

> This looks a lot like show2() except that it's not a method, so the
> node is not called self, and that the node itself is yielded rather
> than str(node). The latter makes the function a bit more flexible and
> is what I should have done in the first place.

Indeed, yielding the node itself might be more useful than just yielding
its string.

>
> The show() method is basically the same, but there are varying
> prefixes before the node name. Here's a simpler variant that just adds
> some indentation. We start with generate_nodes() without the syntactic
> sugar, because we need a name for the nodes yielded from the nested
> generator call so that we can modify them:
>
> def indented_nodes(node):
>    yield node
>    for child in node.children:
>        for desc in indented_nodes(child):
>            yield desc
>
> Now let's modify the yielded nodes:
>
> def indented_nodes(node):
>    yield [node]

Why has this line changed from 'yield node'?

>    for child in node.children:
>        for desc in indented_nodes(child):
>            yield ["***"] + desc

OK, the need to modify the yielded values rules out the "yield from"
syntactic sugar used above.

>
> How does it fare on the example tree?
>
> A --> A1
>      A2 --> A21
>             A22
>      A3
>
> The lists will have an "***" entry for every nesting level, so we get
>
> [A]
> ["***", A1]
> ["***", A2]
> ["***", "***", A21]
> ["***", "***", A22]
> ["***", A3]
>
> With "".join() we can print it nicely:
>
> for item in indented_nodes(tree):
>    print("".join(item))
>
> But wait, "".join() only accepts strings, so let's change
>
>    yield [node]
>
> to
>    yield [node.name] # str(node) would also work

Again my question: why not simply yield node.name?

> A
> ***A1
> ***A2
> ******A21
> ******A22
> ***A3
>
>>> def show2(root):
>>>    for line in root.show2():
>>>        print(line)
>
>> Here we implement the functions to print a node, but I'm not sure I
>> understand why I have to iterate here if main() iterates over the
>> nodes again.
>
>
> A
> A1
>   A11
>   A12
> A2
>
> and I was unsure if there could be data files that have multiple root
> nodes, e. g.
>
> A
> A1
>   A11
>   A12
> A2
> B
> B1
> B2
>
> To simplify the handling of these I introduced an artificial root R
>
> R
> A
>  A1
>    A11
>    A12
>  A2
> B
>  B1
>  B2
>
> which makes all toplevel nodes in the data file children of R. In the
> main() function I iterate over R's children to hide R from the user.
>
> You can replace
>
>        for node in tree.children:
>            show_tree(node)
>            print("")
>
> in my original code with
>
>        show_tree(tree)
>
> to see the hidden node.

That makes it clearer indeed. This is in fact what is going to happen
in reality, since I'll have several subsystems at the very same level,
all of them children of a #ROOT node.

>
> I may address the rest of your post later unless someone else does. In
> the meantime, can you please provide the data file that triggers the
> IndexError to help me with the debugging?

There was indeed a mistake in my file; now that I have fixed it,
everything works!

Al

```
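If you want to run Peter's walk-through of append_nodes() on the sample tree, here is a minimal sketch that compares it with generate_nodes(). The Node class is a hypothetical stand-in (just a name and a list of children), because the actual Node class from the original script is not quoted in this thread.

```python
class Node:
    """Hypothetical minimal node: a name plus a list of child nodes."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def __str__(self):
        return self.name


def append_nodes(node, nodes):
    """Depth-first, pre-order: record the node, then recurse into each child."""
    nodes.append(node)
    for child in node.children:
        append_nodes(child, nodes)


def generate_nodes(node):
    """Same traversal, but each node is yielded as soon as it is reached."""
    yield node
    for child in node.children:
        yield from generate_nodes(child)


# The sample tree from the thread:
#   A --> A1
#         A2 --> A21
#                A22
#         A3
tree = Node("A", [
    Node("A1"),
    Node("A2", [Node("A21"), Node("A22")]),
    Node("A3"),
])

nodes = []
append_nodes(tree, nodes)
print([str(n) for n in nodes])                 # ['A', 'A1', 'A2', 'A21', 'A22', 'A3']
print([str(n) for n in generate_nodes(tree)])  # same order, produced lazily
```

Both functions visit the nodes in the same pre-order sequence. A2 appears right after A1 and before its own children A21 and A22.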
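Peter's f()/g() illustration of turning list building into a generator can be checked directly. This sketch runs the three versions from the thread (filling a list, a plain yield, and the Python 3.3+ "yield from" shorthand) and confirms that they produce the same values. The final loop uses print(item) rather than print(items).

```python
def f(items):
    """Build the whole result up front in a caller-supplied list."""
    items.append(3)
    items.append(6)
    for i in range(10):
        items.append(i)


def g():
    """Produce the same values one at a time, on demand."""
    yield 3
    yield 6
    for i in range(10):
        yield i


def g_sugar():
    """Python 3.3+ shorthand: 'yield from' delegates to another iterable."""
    yield 3
    yield 6
    yield from range(10)


items = []
f(items)
assert items == list(g()) == list(g_sugar())

for item in g():  # item, not items
    print(item)
```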
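On the question of yield [node.name] versus yield node.name: a plain string also works, as long as the prefix is attached with string concatenation instead of list concatenation. The sketch below runs both variants on the sample tree and checks that they print the same lines. As before, the Node class is a hypothetical minimal stand-in. One plausible reason to prefer the list form (my reading, not something Peter states in the quoted text) is that each nesting level stays a separate element, so a caller could change individual prefixes before joining them.

```python
class Node:
    """Hypothetical minimal node: a name plus a list of child nodes."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def indented_nodes(node):
    """Yield one list per node: a "***" entry per nesting level, then the name."""
    yield [node.name]
    for child in node.children:
        for desc in indented_nodes(child):
            yield ["***"] + desc


def indented_names(node):
    """Variant that yields plain strings; the prefix is glued on with '+'."""
    yield node.name
    for child in node.children:
        for desc in indented_names(child):
            yield "***" + desc


tree = Node("A", [
    Node("A1"),
    Node("A2", [Node("A21"), Node("A22")]),
    Node("A3"),
])

# Prints A, ***A1, ***A2, ******A21, ******A22, ***A3 (one per line)
for item in indented_nodes(tree):
    print("".join(item))

list_lines = ["".join(item) for item in indented_nodes(tree)]
string_lines = list(indented_names(tree))
assert list_lines == string_lines
```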
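Peter says a generator lets you process the first item immediately. You can see this by recording which nodes the traversal has actually reached. In this sketch the visited list and the Node class are illustrative additions, not part of the original code. After the first next() call only the root has been touched, whereas append_nodes() would already have walked the whole tree before returning.

```python
class Node:
    """Hypothetical minimal node: a name plus a list of child nodes."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


visited = []  # illustrative: records each node the generator has reached


def generate_nodes(node):
    visited.append(node.name)  # runs only when the consumer asks for this node
    yield node
    for child in node.children:
        yield from generate_nodes(child)


tree = Node("A", [Node("A1"), Node("A2", [Node("A21"), Node("A22")]), Node("A3")])

gen = generate_nodes(tree)
first = next(gen)
after_first = list(visited)    # snapshot: only the root so far
second = next(gen)
after_second = list(visited)   # snapshot: root and its first child
rest = [n.name for n in gen]   # draining the generator visits the remaining nodes

print(first.name, after_first)    # A ['A']
print(second.name, after_second)  # A1 ['A', 'A1']
print(rest)                       # ['A2', 'A21', 'A22', 'A3']
```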
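The artificial root R from the thread can be sketched as follows. The real data comes from an Excel sheet whose layout is not shown here. The parse() function therefore assumes, purely for illustration, a text format in which leading spaces encode the nesting depth. parse(), generate_lines() and the Node class are hypothetical names. Every top-level entry becomes a child of R, and the output loop iterates over R's children so that R itself is never printed.

```python
class Node:
    """Hypothetical minimal node: a name plus a list of child nodes."""
    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []


def parse(lines):
    """Build a tree under an artificial root "R" from indented lines.

    Assumption: leading spaces encode depth. This is NOT the thread's
    real Excel-based format, which is not shown.
    """
    root = Node("R")
    stack = [(-1, root)]  # (indent, node); R sits above every real level
    for line in lines:
        if not line.strip():
            continue
        indent = len(line) - len(line.lstrip(" "))
        node = Node(line.strip())
        # Pop until the top of the stack is shallower: that node is the parent.
        while stack[-1][0] >= indent:
            stack.pop()
        stack[-1][1].children.append(node)
        stack.append((indent, node))
    return root


def generate_lines(node, depth=0):
    """Yield each node's name, indented two spaces per nesting level."""
    yield "  " * depth + node.name
    for child in node.children:
        yield from generate_lines(child, depth + 1)


data = """\
A
  A1
    A11
    A12
  A2
B
  B1
  B2
"""

tree = parse(data.splitlines())
# Iterate over R's children so the artificial root never appears in the output.
for top in tree.children:
    for line in generate_lines(top):
        print(line)
    print("")
```

Calling generate_lines(tree) on R directly, instead of looping over tree.children, would print R as the first line. This mirrors Peter's suggestion to replace the loop with show_tree(tree) to reveal the hidden node.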