regex for balanced parentheses?

Paul McGuire ptmcg at austin.rr.com
Thu Jun 12 15:38:16 CEST 2008


On Jun 12, 6:06 am, David C. Ullrich <dullr... at sprynet.com> wrote:
> There's no regex that detects balanced parentheses,
> or is there?
>
> That is, search('and so ((x+y)+z) = (x+(y+z))')
> should return '((x+y)+z)'.
>
> Not just a theoretical question, I'm cleaning up
> a large body of TeX code and a regex that did that
> would be very convenient. (Yes, I know it's not
> hard to detect balanced parentheses by hand...)
>
> I don't know that stuff, but I seen to recall reading
> that there's a theoretical notion of "regular expression"
> in CS, and a regular expression in that sense cannot
> do this. But in the same place I read that the actual
> regexes in programming languages include things which
> are not regular expressions in that theoretical sense.
>
> David C. Ullrich

Pyparsing includes several helper methods for building common
expression patterns, such as delimitedList, oneOf, operatorPrecedence,
countedArray - and a fairly recent addition, nestedExpr.  nestedExpr
creates an expression for matching nested text within opening and
closing delimiters, such as ()'s, []'s, {}'s, etc.  The default
delimiters are ()'s.  You can also specify a content expression, so
that pyparsing will look for and construct meaningful results.  The
default is to return any text nested within the delimiters, broken up
by whitespace.

Here is your sample string parsed using the default nestedExpr:
>>> from pyparsing import nestedExpr
>>> for e in nestedExpr().searchString('and so ((x+y)+z) = (x+(y+z))'):
...     print e[0]
...
[['x+y'], '+z']
['x+', ['y+z']]

Pyparsing found 2 matches in your input string.  Note that the parens
are gone from the results - nestedExpr returns a nested list
structure, with nesting corresponding to the ()'s found in the
original string.

Pyparsing supports parse-time callbacks, called 'parse actions', and
it comes with several commonly used methods, such as removeQuotes,
upcaseTokens, and keepOriginalText.  The purpose of keepOriginalText
is to revert any structuring or parsing an expression or other parse
actions might do, and just return the originally matched text.

Here is how keepOriginalText gives you back just the nested
parenthetical expressions, without any additional processing or
grouping:
>>> from pyparsing import keepOriginalText
>>> matchedParens = nestedExpr().setParseAction(keepOriginalText)
>>> for e in matchedParens.searchString('and so ((x+y)+z) = (x+(y+z))'):
...     print e[0]
...
((x+y)+z)
(x+(y+z))

-- Paul



More information about the Python-list mailing list