parenthesis
Joshua Marshall
jmarshal at mathworks.com
Mon Nov 4 15:49:43 EST 2002
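Regular expressions are not powerful enough to match strings when you
need to be intelligent about nesting: parentheses balanced to arbitrary
depth do not form a regular language, so a pattern can only handle some
fixed, finite depth. There are probably parser generators available;
links, anyone?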
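To illustrate the fixed-depth limitation, here is a minimal sketch that builds an ordinary Python `re` pattern matching balanced parentheses up to a chosen depth. The helper name `nested_parens_pattern` is hypothetical. Anything nested deeper than the chosen depth is simply not recognized as one group.

```python
import re

def nested_parens_pattern(depth):
    """Build a regex matching a '(...)' group nested at most `depth` levels.

    Each level wraps the previous one: a group is '(' followed by any mix of
    non-parenthesis characters and groups of the next lower level, then ')'.
    """
    pattern = r'\([^()]*\)'  # depth 1: no parentheses inside
    for _ in range(depth - 1):
        pattern = r'\((?:[^()]|' + pattern + r')*\)'
    return re.compile(pattern)

exp = '(a*(b+c*(2-x))+d)+f(s1)'
print(nested_parens_pattern(3).search(exp).group())  # (a*(b+c*(2-x))+d)
print(nested_parens_pattern(2).search(exp).group())  # (b+c*(2-x))
```

With depth 2 the search cannot match the outer group, so it skips ahead and returns an inner one. That is exactly what happens to any input nested deeper than the pattern allows.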
For your particular application, also take a look at the "parser"
Python module. It's a little ugly, since it gives you concrete
(rather than abstract) syntax trees, but it may help you.
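If you would rather stay with the standard `re` module, a common workaround is to let a regular expression find only the parenthesis characters and keep the nesting count in ordinary Python code. This is a sketch, not a general parser: it assumes balanced input, as the question does, and the name `first_balanced_group` is hypothetical.

```python
import re

# Matches a single parenthesis character; depth bookkeeping is done in Python.
_PAREN = re.compile(r'[()]')

def first_balanced_group(text):
    """Return the first balanced '(...)' substring of text, or None.

    Text before the first '(' is skipped, and so is a stray ')' that
    appears before any '('.
    """
    depth = 0
    start = None
    for m in _PAREN.finditer(text):
        if m.group() == '(':
            if depth == 0:
                start = m.start()  # remember where the outermost group opens
            depth += 1
        elif depth > 0:
            depth -= 1
            if depth == 0:
                return text[start:m.end()]  # outermost group just closed
    return None  # no complete group found

exp = '(a*(b+c*(2-x))+d)+f(s1)'
print(first_balanced_group(exp))  # (a*(b+c*(2-x))+d)
```

Unlike a character-by-character loop that starts counting at index 0, this version also works when the first '(' is preceded by other text.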
Michele Simionato <mis6 at pitt.edu> wrote:
> Suppose I want to parse the following expression:
>>>> exp='(a*(b+c*(2-x))+d)+f(s1)'
> I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.
> Now if I use a greedy regular expression
>>>> import re; greedy=re.compile(r'\(.*\)')
> I obtain too much, the full expression:
>>>> match=greedy.search(exp); match.group()
> '(a*(b+c*(2-x))+d)+f(s1)'
> On the other hand, if I use a nongreedy regular expression
>>>> nongreedy=re.compile(r'\(.*?\)')
> I obtain too little:
>>>> match=nongreedy.search(exp); match.group()
> '(a*(b+c*(2-x)'
> Is there a way to specify a clever regular expression able to match
> the first parenthesized group? What I did was to write a routine
> to extract the first parenthesized group:
> def parenthesized_group(exp):
>     nesting_level, out = 0, []
>     for c in exp:
>         out.append(c)
>         if c == '(': nesting_level += 1
>         elif c == ')': nesting_level -= 1
>         # stop as soon as the first group is closed
>         if nesting_level == 0: break
>     return ''.join(out)
>>>> print(parenthesized_group(exp))
> (a*(b+c*(2-x))+d)
> Still, this seems to me not the best way to go and I would like to know
> if this can be done with a regular expression. Notice that I don't need
> to control all the nesting levels of the parentheses; for me it is enough
> to recognize the end of the first parenthesized group.
> Obviously, I would like a general recipe valid for more complicated
> expressions: in particular I cannot assume that the first group ends
> right before a mathematical operator (like '+' in this case) since
> these expressions are not necessarily mathematical expressions (as the
> example could wrongly suggest). In general I have expressions of the
> form
> ( ... contains nested expressions with parentheses... )...other stuff
> where other stuff may contain nested parentheses. I can assume that
> there are no errors, i.e. that all the internal opening parentheses are
> matched by closing parentheses.
> Is this a problem which can be tackled with regular expressions ?
> TIA,
> --
> Michele Simionato - Dept. of Physics and Astronomy
> 210 Allen Hall Pittsburgh PA 15260 U.S.A.
> Phone: 001-412-624-9041 Fax: 001-412-624-9163
> Home-page: http://www.phyast.pitt.edu/~micheles/
More information about the Python-list
mailing list