Parsing nested constructs
Paul McGuire
ptmcg at austin.rr.com
Sat Sep 8 20:10:38 EDT 2007
On Sep 8, 3:42 pm, tool69 <k... at free.fr> wrote:
> Hi,
>
> I need to parse some source with nested parentheses, like this:
>
> >cut-------------
>
> {
> {item1}
> {
> {item2}
> {item3}
> }
>
> }
>
> >cut-------------
>
> In fact I'd like to get the start index of every item and its end (or
> length).
>
> I know regexps are rather limited for this type of problem.
> I don't need an external module.
>
> What would you suggest?
>
> Thanks.
Well, it is an external module, but pyparsing makes this pretty
straightforward:
from pyparsing import *
data = """
{
{item1}
{
{item2}
{item3}
}
}
"""
# define delimiters, but suppress them from the output
LBRACE,RBRACE = map(Suppress,"{}")
# forward define recursive items list
items = Forward()
# items is zero or more words of alphas and numbers, or an embedded
# group enclosed in braces
items << ZeroOrMore(
    Word(alphanums) | Group( LBRACE + items + RBRACE )
)
# parse the input string, and print out the results
print(items.parseString(data))
"""
prints:
[[['item1'], [['item2'], ['item3']]]]
or:
[
    [
        ['item1'],
        [
            ['item2'],
            ['item3']
        ]
    ]
]
"""
-- Paul
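
The question also asked for the start index and end (or length) of each item, without an external module. A minimal sketch of one way to get those with only the standard library is a stack-based scan over the text. The function name brace_spans and the (start, end, depth) tuples are illustrative choices, not part of pyparsing or of the code above, and the sketch assumes braces never appear inside quoted strings or comments.

```python
def brace_spans(text):
    """Return (start, end, depth) for every balanced {...} group in text.

    start is the index of the '{', end is the index just past the matching
    '}', so text[start:end] is the whole group and end - start is its length.
    depth is 0 for an outermost group.
    """
    spans = []
    stack = []  # indexes of '{' characters still waiting for their '}'
    for pos, ch in enumerate(text):
        if ch == "{":
            stack.append(pos)
        elif ch == "}":
            if not stack:
                raise ValueError("unmatched '}' at index %d" % pos)
            start = stack.pop()
            # after the pop, len(stack) is the number of enclosing groups
            spans.append((start, pos + 1, len(stack)))
    if stack:
        raise ValueError("unmatched '{' at index %d" % stack[-1])
    spans.sort()  # order by start index, so an outer group precedes its children
    return spans


sample = "{{item1}{{item2}{item3}}}"
for start, end, depth in brace_spans(sample):
    # print start index, length, nesting depth, and the matched text
    print(start, end - start, depth, sample[start:end])
```

Each tuple gives the slice bounds directly, so the text inside a group is text[start + 1:end - 1]. Unlike the pyparsing version, this does not build a nested result; it only reports where each group sits.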