Parsing nested constructs

Paul McGuire ptmcg at austin.rr.com
Sat Sep 8 20:10:38 EDT 2007


On Sep 8, 3:42 pm, tool69 <k... at free.fr> wrote:
> Hi,
>
> I need to parse some source with nested parenthesis, like this :
>
>  >cut-------------
>
> {
>      {item1}
>      {
>       {item2}
>       {item3}
>      }
>
> }
>
>  >cut-------------
>
> In fact I'd like to get all start indexes of items and their end (or
> length).
>
> I know regexps are rather limited for this type of problems.
> I don't need an external module.
>
> What would you suggest?
>
> Thanks.

Well, it is an external module, but pyparsing makes this pretty
straightforward:

from pyparsing import Forward, Group, Suppress, Word, ZeroOrMore, alphanums

data = """
{
     {item1}
     {
      {item2}
      {item3}
     }

}
"""

# define delimiters, but suppress them from the output
LBRACE,RBRACE = map(Suppress,"{}")

# forward define recursive items list
items = Forward()

# items is zero or more words of alphas and numbers, or an embedded
# group enclosed in braces
items << ZeroOrMore( Word(alphanums) |
                     Group( LBRACE + items + RBRACE ) )

# parse the input string, and print out the results
print(items.parseString(data))

"""
prints:
[[['item1'], [['item2'], ['item3']]]]

or:
[
    [
        ['item1'],
        [
            ['item2'],
            ['item3']
        ]
    ]
]
"""
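
If you really do want to avoid an external module and only need the
start index and end (or length) of each brace group, a plain stack scan
is one option. What follows is a minimal sketch, not part of pyparsing:
brace_spans is a hypothetical helper name, and it assumes braces are
never escaped and never appear inside quoted strings.

```python
def brace_spans(text):
    """Return a sorted list of (start, end, depth) for each {...} group.

    start is the index of the opening brace and end is the index just
    past the matching closing brace, so text[start:end] is the whole
    group including its braces. depth is 0 for the outermost group.

    Assumption: every '{' and '}' is a delimiter (no escaping, no quoting).
    """
    stack = []   # indexes of '{' characters still waiting for their '}'
    spans = []
    for pos, ch in enumerate(text):
        if ch == "{":
            stack.append(pos)
        elif ch == "}":
            if not stack:
                raise ValueError("unmatched '}' at index %d" % pos)
            start = stack.pop()
            # after the pop, len(stack) is the nesting depth of this group
            spans.append((start, pos + 1, len(stack)))
    if stack:
        raise ValueError("unmatched '{' at index %d" % stack[-1])
    spans.sort()   # groups close innermost-first; sort to get source order
    return spans


data = "{ {item1} { {item2} {item3} } }"
for start, end, depth in brace_spans(data):
    # print start index, length, nesting depth and the matched text
    print(start, end - start, depth, data[start:end])
```

To report only the innermost items, keep the spans whose text contains
no further "{" after the first character.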

-- Paul



