regex for balanced parentheses?

David C. Ullrich dullrich at sprynet.com
Thu Jun 12 10:30:58 EDT 2008


On Thu, 12 Jun 2008 06:38:16 -0700 (PDT), Paul McGuire
<ptmcg at austin.rr.com> wrote:

>On Jun 12, 6:06 am, David C. Ullrich <dullr... at sprynet.com> wrote:
>> There's no regex that detects balanced parentheses,
>> or is there?
>>
>> [...]
>
>Pyparsing includes several helper methods for building common
>expression patterns, such as delimitedList, oneOf, operatorPrecedence,
>countedArray - and a fairly recent addition, nestedExpr.  nestedExpr
>creates an expression for matching nested text within opening and
>closing delimiters, such as ()'s, []'s, {}'s, etc. 

Keen. Howdya know I wanted that? Thanks.

TeX is one of the amazing things about free software. Knuth
is great in many ways. He totally blew it in one detail,
unfortunately one that comes up a lot: '$' is an opening
delimiter, for which the corresponding closing delimiter
is also '$'. Then better yet, '$$' is another double-duty
delimiter... what I've done with that is first split
on '$$', taking the odd-numbered bits to be the parts
enclosed in $$..$$, and then taking the remaining
parts and splitting on $. Not hard but it gives me a
creepy feeling.

Hence the question: Can pyparsing tell the difference
between '$' and '$'? (heh-heh).

> The default
>delimiters are ()'s.  You can also specify a content expression, so
>that pyparsing will look for and construct meaningful results.  The
>default is to return any text nested within the delimiters, broken up
>by whitespace.
>
>Here is your sample string parsed using the default nestedExpr:
>>>> from pyparsing import nestedExpr
>>>> for e in nestedExpr().searchString('and so ((x+y)+z) = (x+(y+z))'):
>...     print e[0]
>...
>[['x+y'], '+z']
>['x+', ['y+z']]
>
>Pyparsing found 2 matches in your input string.  Note that the parens
>are gone from the results - nestedExpr returns a nested list
>structure, with nesting corresponding to the ()'s found in the
>original string.
>
>Pyparsing supports parse-time callbacks, called 'parse actions', and
>it comes with several commonly used methods, such as removeQuotes,
>upcaseTokens, and keepOriginalText.  The purpose of keepOriginalText
>is to revert any structuring or parsing an expression or other parse
>actions might do, and just return the originally matched text.
>
>Here is how keepOriginalText gives you back just the nested
>parenthetical expressions, without any additional processing or
>grouping:
>>>> from pyparsing import keepOriginalText
>>>> matchedParens = nestedExpr().setParseAction(keepOriginalText)
>>>> for e in matchedParens.searchString('and so ((x+y)+z) = (x+(y+z))'):
>...     print e[0]
>...
>((x+y)+z)
>(x+(y+z))
>
>-- Paul

David C. Ullrich


