brackets content regular expression
ptmcg at austin.rr.com
Fri Oct 31 18:51:41 CET 2008
On Oct 31, 12:25 pm, netimen <neti... at gmail.com> wrote:
> I have some text containing brackets (or what is the correct term for
> '>'?). I'd like to match the text in the uppermost level of brackets.
> So, I have something like: 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >
> bbbbb'. How can I match the text between the uppermost brackets ( 1 aaa < t
> bbb < a <tt > ff > > 2 )?
> P.S. Sorry for my English.
To match opening and closing parens, delimiters, whatever (I refer to
these '<>' as "angle brackets" when talking about them in this
context; otherwise they are just "less than" and "greater than"), you
will need some kind of stack-based parser, because a plain regular
expression has no way to keep track of how deeply the brackets are
nested. You can write your own without much trouble, but pyparsing
has built-ins that do most of the work.
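If you'd rather write your own, here is a minimal sketch (plain Python,
no pyparsing; the name top_level_groups is just made up for
illustration). With only one kind of bracket, the "stack" can shrink to
a depth counter: remember where the depth goes from 0 to 1, and slice
out the text when it drops back to 0. It assumes single-character
delimiters and raises ValueError on unbalanced input:

```python
def top_level_groups(text, open_ch='<', close_ch='>'):
    """Return the text inside each outermost open_ch ... close_ch pair.

    Hypothetical helper: with a single bracket type, a depth counter is
    all the stack we need. Assumes single-character delimiters.
    """
    groups = []
    depth = 0
    start = None
    for i, ch in enumerate(text):
        if ch == open_ch:
            if depth == 0:
                start = i + 1  # content begins just after the outer bracket
            depth += 1
        elif ch == close_ch:
            if depth == 0:
                raise ValueError('unmatched %r at position %d' % (close_ch, i))
            depth -= 1
            if depth == 0:
                # content ends just before the matching outer bracket
                groups.append(text[start:i])
    if depth != 0:
        raise ValueError('unclosed %r' % open_ch)
    return groups


s = 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'
print(top_level_groups(s))
# prints [' 1 aaa < t bbb < a <tt > ff > > 2 ']
```

This returns the raw text between the outermost brackets with the
brackets themselves dropped, which is what the question asked for.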
Here is the nestedExpr helper:
>>> from pyparsing import nestedExpr
>>> print(nestedExpr('<','>').searchString('aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'))
[[['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]]
Note that the results show not the original nested text but the
parsed words, in a fully nested structure: each bracketed level
becomes its own sub-list.
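To see how a stack yields that kind of nested structure, here is a
rough stdlib-only equivalent (nested_lists is a hypothetical name, and
I'm not claiming this is how nestedExpr works inside). Each '<' pushes
a new list, each word goes into the list on top of the stack, and each
'>' pops; words outside any brackets are skipped. It splits words on
whitespace and does nothing special with quoted strings:

```python
import re


def nested_lists(text, open_ch='<', close_ch='>'):
    """Parse each outermost open_ch ... close_ch group into nested lists of words.

    Hypothetical helper. Assumes single-character delimiters; a word is any
    run of characters that is neither whitespace nor a delimiter.
    """
    o, c = re.escape(open_ch), re.escape(close_ch)
    token_re = re.compile(r'[%s%s]|[^%s%s\s]+' % (o, c, o, c))

    results = []  # one entry per finished outermost group
    stack = []    # the lists for the groups that are currently open
    for tok in token_re.findall(text):
        if tok == open_ch:
            group = []
            if stack:
                stack[-1].append(group)  # a nested group lives inside its parent
            stack.append(group)
        elif tok == close_ch:
            if not stack:
                raise ValueError('unmatched %r' % close_ch)
            finished = stack.pop()
            if not stack:
                results.append(finished)  # just closed an outermost group
        elif stack:
            stack[-1].append(tok)  # words outside any brackets are skipped
    if stack:
        raise ValueError('unclosed %r' % open_ch)
    return results


s = 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'
print(nested_lists(s))
# prints [['1', 'aaa', ['t', 'bbb', ['a', ['tt'], 'ff']], '2']]
```

It returns one list per top-level group, so the result has one fewer
level of wrapping than the searchString output above.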
If all you want is the original text of each top-level group (outer
brackets included), you can wrap your nestedExpr parser inside a call
to originalTextFor:
>>> from pyparsing import originalTextFor
>>> print(originalTextFor(nestedExpr('<','>')).searchString('aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 > bbbbb'))
[['< 1 aaa < t bbb < a <tt > ff > > 2 >']]
More on pyparsing at http://pyparsing.wikispaces.com.