parenthesis

Joshua Marshall jmarshal at mathworks.com
Mon Nov 4 15:49:43 EST 2002


Regular expressions are not powerful enough to match strings with
arbitrarily deep nesting: they have no way to count how many
parentheses are still open.  A parser is the right tool here, and there
are probably parser generators available--links anyone?
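
To make the counting point concrete, here is a minimal sketch of a
depth-counting scan. The name first_balanced_group and its open_ch and
close_ch parameters are made up for illustration; it assumes the
delimiters never appear inside string literals or escapes.

```python
def first_balanced_group(text, open_ch="(", close_ch=")"):
    """Return the first balanced open_ch...close_ch group in text, or None."""
    start = text.find(open_ch)
    if start == -1:
        return None  # no opening delimiter anywhere in the text
    depth = 0
    for i in range(start, len(text)):
        ch = text[i]
        if ch == open_ch:
            depth += 1
        elif ch == close_ch:
            depth -= 1
            if depth == 0:
                # Back at nesting level zero: the first group ends here.
                return text[start:i + 1]
    raise ValueError("unbalanced delimiters in %r" % text)


exp = "(a*(b+c*(2-x))+d)+f(s1)"
print(first_balanced_group(exp))  # (a*(b+c*(2-x))+d)
```

Unlike a loop that starts counting at the first character, this skips
any text before the first "(" and complains if the group never closes.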

For your particular application, also take a look at the "parser"
Python module.  It's a little ugly, since it gives you complete
(rather than abstract) syntax trees, but it may help you.
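
As a rough sketch of the parse-tree route: the "parser" module was
removed in Python 3.10, so the example below swaps in the standard
"ast" module, which gives abstract rather than complete syntax trees.
It only works because this particular string happens to be a valid
Python expression, which is an assumption about your input.

```python
import ast

exp = "(a*(b+c*(2-x))+d)+f(s1)"

# mode="eval" parses a single expression instead of a whole module.
tree = ast.parse(exp, mode="eval")

# The outermost operation is <parenthesized group> + f(s1).
top = tree.body
print(type(top).__name__, type(top.op).__name__)

# The two operands: the group's contents and the call f(s1).
print(ast.dump(top.left))
print(ast.dump(top.right))
```

The parentheses themselves do not appear as nodes in the tree; the
grouping shows up only in how the operations are nested.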


Michele Simionato <mis6 at pitt.edu> wrote:
> Suppose I want to parse the following expression:

>>>> exp='(a*(b+c*(2-x))+d)+f(s1)'

> I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.

> Now if I use a greedy regular expression

>>>> import re; greedy=re.compile('\(.*\)')

> I obtain too much, the full expression:

>>>> match=greedy.search(exp); match.group()

> '(a*(b+c*(2-x))+d)+f(s1)'

> On the other hand, if I use a nongreedy regular expression

>>>> nongreedy=re.compile('\(.*?\)')

> I obtain too little:

>>>> match=nongreedy.search(exp); match.group()

> '(a*(b+c*(2-x)'

> Is there a way to specify a clever regular expression able to match
> the first parenthesized group? What I did was to write a routine
> to extract the first parenthesized group:

> def parenthesized_group(exp):
>     nesting_level,out=0,[]
>     for c in exp:
>         out.append(c)
>         if c=='(': nesting_level+=1
>         elif c==')': nesting_level-=1
>         if nesting_level==0: break
>     return ''.join(out)

>>>> print(parenthesized_group(exp))

> (a*(b+c*(2-x))+d)

> Still, this does not seem to me the best way to go, and I would like to
> know if this can be done with a regular expression. Notice that I don't
> need to control all the nesting levels of the parentheses; for me it is
> enough to recognize the end of the first parenthesized group.

> Obviously, I would like a general recipe valid for more complicated
> expressions: in particular I cannot assume that the first group ends
> right before a mathematical operator (like '+' in this case) since
> these expressions are not necessarily mathematical expressions (as the
> example could wrongly suggest). In general I have expressions of the
> form

> ( ... contains nested expressions with parentheses... )...other stuff

> where other stuff may contain nested parentheses. I can assume that
> there are no errors, i.e. that all the internal open parentheses are
> matched by closing parentheses.

> Is this a problem which can be tackled with regular expressions?

> TIA,

> --
> Michele Simionato - Dept. of Physics and Astronomy
> 210 Allen Hall Pittsburgh PA 15260 U.S.A.
> Phone: 001-412-624-9041 Fax: 001-412-624-9163
> Home-page: http://www.phyast.pitt.edu/~micheles/


