brackets content regular expression

Paul McGuire ptmcg at austin.rr.com
Fri Oct 31 13:51:41 EDT 2008


On Oct 31, 12:25 pm, netimen <neti... at gmail.com> wrote:
> I have a text containing brackets (or what is the correct term for
> '>'?). I'd like to match text in the uppermost level of brackets.
>
> So, I have sth like: 'aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 >
> bbbbb'. How to match text between the uppermost brackets ( 1 aaa < t
> bbb < a <tt  > ff > > 2 )?
>
> P.S. sorry for my english.

To match nested opening and closing delimiters (I call '<' and '>'
"angle brackets" in this context; otherwise they are just "less than"
and "greater than"), you will need some kind of stack-based parser,
because a plain regular expression cannot keep track of arbitrary
nesting depth.  You can write your own without much trouble, or you
can use pyparsing, which has built-ins that do most of the work.
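
For the write-your-own route, here is a minimal sketch that needs no
library.  It assumes single-character delimiters and uses a depth
counter in place of an explicit stack; the name top_level_spans is
made up for illustration and is not part of pyparsing.

```python
def top_level_spans(text, opener="<", closer=">"):
    """Return the text found inside each outermost opener/closer pair."""
    spans = []
    depth = 0      # current nesting depth; stands in for an explicit stack
    start = None   # index just past the outermost opener
    for i, ch in enumerate(text):
        if ch == opener:
            if depth == 0:
                start = i + 1
            depth += 1
        elif ch == closer and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append(text[start:i])
    return spans


sample = 'aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 > bbbbb'
print([span.strip() for span in top_level_spans(sample)])
```

Unbalanced closers are ignored and an unclosed opener yields nothing,
which may or may not be what you want for malformed input.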

Here is pyparsing's nestedExpr helper in action:
>>> from pyparsing import nestedExpr
>>> print(nestedExpr('<','>').searchString('aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 > bbbbb'))
[[['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]]

Note that the results show not the original nested text, but the
parsed words in a fully nested structure.
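
To illustrate what that nested structure gives you, here is a small
sketch that walks it recursively and rebuilds a bracketed string.  It
assumes the match has been converted to plain Python lists (the
literal below is copied from the output above), and render is a
hypothetical helper name.  Note that the original spacing is not
recoverable this way.

```python
def render(nested, opener="<", closer=">"):
    """Rebuild a bracketed string from a nested list of words."""
    parts = []
    for item in nested:
        if isinstance(item, list):
            # A sublist is one nesting level: wrap its rendering in brackets.
            parts.append(opener + " " + render(item, opener, closer) + " " + closer)
        else:
            parts.append(item)
    return " ".join(parts)


# Plain-list copy of the single match shown above.
parsed = ['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']
print(render(parsed))
```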

If all you want is the highest-level text, then you can wrap your
nestedExpr parser inside a call to originalTextFor:

>>> from pyparsing import nestedExpr, originalTextFor
>>> print(originalTextFor(nestedExpr('<','>')).searchString('aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 > bbbbb'))
[['< 1 aaa < t bbb < a <tt  > ff > > 2 >']]
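
The matched text still includes the outermost brackets.  If you only
want what lies between them, as in the original question, a small
post-processing step is enough.  This is a sketch in plain Python;
strip_outer is a hypothetical helper, not a pyparsing function.

```python
def strip_outer(matched, opener="<", closer=">"):
    """Drop one enclosing opener/closer pair and surrounding whitespace."""
    if matched.startswith(opener) and matched.endswith(closer):
        return matched[len(opener):-len(closer)].strip()
    return matched  # not wrapped in the delimiters; leave it unchanged


# The string below is the match text shown in the output above.
print(strip_outer('< 1 aaa < t bbb < a <tt  > ff > > 2 >'))
```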

More on pyparsing at http://pyparsing.wikispaces.com.

-- Paul


