[Tutor] mxTextTools help: find the deepest in nest lists

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Wed Dec 10 14:28:55 EST 2003



On Tue, 9 Dec 2003 pan at uchicago.edu wrote:

> I am learning how to use mxTextTools. I need help with
> the following ...
>
> I wanna match the deepest lists in a nested list tree.
> For example:
>
> The [2, 2a] and [3, 3b] in :
>
>    [[[1,[2, 2a]],[[3,3b],4]],6]
>
> The [a.1,b_2] and [d.3,e] in :
>
>    [[a.1,b_2],[[c,[d.3,e]],f]]
>
> The [a1,a2], [d.1, d.2] and [o3,o4] in :
>
>    [[[[[a1,a2],out],[d.1, d.2]],o2],[[o3,o4],o5]]
>
> How can I setup a tagtable for matching them ??


Hi Pan,

I'm not familiar enough with mxTextTools's tag table stuff.  Just to make
sure though: are you sure that a text-handling approach is best for this
problem?  Is your input a originally a data structure, or a string?


One way to approach this with regular expressions might be something like
this:

###
>>> import re
>>> def find_innermost_nodes(tree_string):
...     regex = re.compile(r"""\[        ## literal open brace
...                            [^\[]+    ## one or more non-brace
...                                      ## characters
...                            \]        ## literal close brace
...                       """, re.VERBOSE)
...     return regex.findall(tree_string)
...
>>> find_innermost_nodes("[[[1,[2, 2a]],[[3,3b],4]],6]")
['[2, 2a]]', '[3,3b],4]],6]']
###



Ooops!  Forgot to make the search nongreedy:

###
>>> def find_innermost_nodes(tree_string):
...     regex = re.compile(r"""\[        ## literal open brace
...                            [^\[]+?   ## one or more non-brace
...                                      ## characters, nongreedily
...                            \]        ## literal close brace
...                       """, re.VERBOSE)
...     return regex.findall(tree_string)
...
>>> find_innermost_nodes("[[[1,[2, 2a]],[[3,3b],4]],6]")
['[2, 2a]', '[3,3b]']
###


That's better.  *grin* Aren't the tag tables similar to regular
expressions?




That being said, I think this approach is approaching the problem in a way
that doesn't generalize well.  If we had some tree structure, I'd approach
this with something like:

###
def is_innermost_node(node):
    if is_leaf(node): return false
    return is_leaf(left(node)) and is_leaf(right(node))

def find_innermost_nodes(node):
    if is_leaf(node):
        return []
    if is_innermost_node(node):
        return [node]
    return (find_innermost_nodes(left(node)) +
            find_innermost_nodes(right(node)))
###

assuming that we have some functions for seeing if a node is a leaf
(is_leaf()), and well as ways of getting the left() and right() children
of a tree node.


Good luck to you!




More information about the Tutor mailing list