Bengt Richter
bokr at
Mon Nov 4 21:25:57 EST 2002
On 4 Nov 2002 22:05:11 GMT, bokr at (Bengt Richter) wrote:
>On 4 Nov 2002 12:24:31 -0800, mis6 at (Michele Simionato) wrote:
>>Suppose I want to parse the following expression:
>>>>> exp='(a*(b+c*(2-x))+d)+f(s1)'
>>I want to extract the first part, i.e. '(a*(b+c*(2-x))+d)'.
[... previous version ...]
Wondering why I didn't just write:
>>> import re
>>> rx = re.compile(r'([()]|[^()]+)')
>>> class Addelim:
...     def __init__(self, delim):
...         self.parens=0; self.delim=delim
...     def __call__(self, m):
...         s = m.group(1)
...         if s=='(': self.parens+=1
...         if self.parens==1 and s==')':
...             self.parens=0
...             return s+self.delim
...         if s==')': self.parens -=1
...         return s
>>> exp = '(a*(b+c*(2-x))+d)+f(s1)'
It was natural to be able to specify the delimiter. And the + is probably
better than the * on the non-paren "[^()]+" part of the pattern.
Then using \n as delimiter to break into lines one can just print it.
>>> print rx.sub(Addelim('\n'),exp)
(a*(b+c*(2-x))+d)
+f(s1)

Which you could also use like:
>>> print rx.sub(Addelim('\n'),exp).splitlines()
['(a*(b+c*(2-x))+d)', '+f(s1)']
Or to get back to your original requirement,
>>> print rx.sub(Addelim('\n'),exp).splitlines()[0]
(a*(b+c*(2-x))+d)
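For anyone trying this on Python 3 (where print is a function), here is a minimal sketch of the same callable-replacement idea. The names AddDelim, TOKEN_RX and top_level_pieces are made up for illustration, and the sketch assumes balanced parentheses and no newlines in the input.

```python
import re

# One token is either a single parenthesis or a run of non-parenthesis characters.
TOKEN_RX = re.compile(r'([()]|[^()]+)')

class AddDelim:
    """Replacement callable for re.sub: appends delim after each ')'
    that closes a top-level group. Assumes balanced parentheses."""
    def __init__(self, delim):
        self.depth = 0
        self.delim = delim

    def __call__(self, m):
        s = m.group(1)
        if s == '(':
            self.depth += 1
        elif s == ')':
            self.depth -= 1
            if self.depth == 0:
                return s + self.delim
        return s

def top_level_pieces(text):
    # Hypothetical helper: mark the end of each top-level group with '\n',
    # then split on those marks.
    return TOKEN_RX.sub(AddDelim('\n'), text).splitlines()

exp = '(a*(b+c*(2-x))+d)+f(s1)'
print(top_level_pieces(exp))     # ['(a*(b+c*(2-x))+d)', '+f(s1)']
print(top_level_pieces(exp)[0])  # (a*(b+c*(2-x))+d)
```

A fresh AddDelim instance is created per call, so the depth counter never leaks from one string to the next.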
But I suspect it would run faster to let a regex split the string and then use
a loop like yours on the pieces, which would be '(' or ')' or some other string
that you don't need to look at character by character. E.g.,
>>> rx = re.compile(r'([()])')
>>> ss = rx.split(exp)
>>> ss
['', '(', 'a*', '(', 'b+c*', '(', '2-x', ')', '', ')', '+d', ')', '+f', '(', 's1', ')', '']
Notice that the splitter matches wind up at the odd indices. That holds whenever the splitting
pattern is a single capturing group: re.split then returns the matches as part of the list,
alternating with the (possibly empty) strings between them. You could make use of that, something like:
>>> parens = 0
>>> endix = []
>>> for i in range(1,len(ss),2):
...     if parens==1 and ss[i]==')':
...         parens=0; endix.append(i+1)
...     elif ss[i]=='(': parens += 1
...     else: parens -= 1
...
>>> endix
[12, 16]
You could break the loop like you did if you just want the first expression,
or you could grab it by
>>> print ''.join(ss[:endix[0]])
(a*(b+c*(2-x))+d)
or list the bunch,
>>> lo=0
>>> for hi in endix:
...     print ''.join(ss[lo:hi])
...     lo = hi
...
(a*(b+c*(2-x))+d)
+f(s1)
or whatever. Which is not as slick, but probably faster if you had to do a bi-ig bunch of them.
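The split-and-scan steps can be bundled into one function. This is only a sketch for Python 3: split_top_level and PAREN_RX are hypothetical names, and it assumes balanced parentheses. Any text after the last closing paren is dropped, just as in the loop over endix.

```python
import re

# Capturing group, so re.split keeps the parentheses in the result list.
PAREN_RX = re.compile(r'([()])')

def split_top_level(text):
    """Return the pieces of text, each ending at a ')' that closes a
    top-level group. Assumes balanced parentheses."""
    ss = PAREN_RX.split(text)
    # With a single capturing group the list alternates:
    # plain text at even indices, matched parentheses at odd indices.
    assert all(s in ('(', ')') for s in ss[1::2])
    pieces, depth, lo = [], 0, 0
    for i in range(1, len(ss), 2):
        if ss[i] == '(':
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                # Join everything from the previous cut through this ')'.
                pieces.append(''.join(ss[lo:i + 1]))
                lo = i + 1
    return pieces

exp = '(a*(b+c*(2-x))+d)+f(s1)'
print(split_top_level(exp))  # ['(a*(b+c*(2-x))+d)', '+f(s1)']
```

The loop only ever looks at the odd-index entries, so the text between parentheses is never scanned character by character in Python code.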
I think when the fenceposts are simple, but you are mainly interested in the data between, splitting
on a fencepost regex and processing the resulting list can be simpler and faster than trying to
do it all with a complex regex.
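The speed claim is a guess, so it is worth measuring. Below is a rough Python 3 sketch using timeit to compare the two approaches. The function names, the repeated test string and the run count are all arbitrary choices for illustration, and the numbers will vary by machine and Python version.

```python
import re
import timeit

TOKEN_RX = re.compile(r'([()]|[^()]+)')
PAREN_RX = re.compile(r'([()])')

def via_sub(text):
    # re.sub with a callable that appends '\n' after each top-level ')'.
    depth = 0
    def add_newline(m):
        nonlocal depth
        s = m.group(1)
        if s == '(':
            depth += 1
        elif s == ')':
            depth -= 1
            if depth == 0:
                return s + '\n'
        return s
    return TOKEN_RX.sub(add_newline, text).splitlines()

def via_split(text):
    # re.split on the parentheses, then scan only the odd-index entries.
    ss = PAREN_RX.split(text)
    pieces, depth, lo = [], 0, 0
    for i in range(1, len(ss), 2):
        if ss[i] == '(':
            depth += 1
        else:
            depth -= 1
            if depth == 0:
                pieces.append(''.join(ss[lo:i + 1]))
                lo = i + 1
    return pieces

exp = '(a*(b+c*(2-x))+d)+f(s1)'
big = exp * 500  # arbitrary larger input, chosen only for the timing run
assert via_sub(big) == via_split(big)  # both must agree before timing them

for fn in (via_sub, via_split):
    seconds = timeit.timeit(lambda: fn(big), number=200)
    print(f'{fn.__name__}: {seconds:.3f} s for 200 runs')
```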
Bengt Richter