brackets content regular expression

Matimus mccredie at gmail.com
Fri Oct 31 17:36:57 EDT 2008


On Oct 31, 11:57 am, netimen <neti... at gmail.com> wrote:
> Thanks, but what if I have several top-level groups and want to match
> them one by one:
>
> text = "a < b < c > d > here starts a new group:  < e < f  > g >"
>
> I want to match first " b < c > d " and then " e < f  > g ", but not
> " b < c > d > here starts a new group:  < e < f  > g "
> On 31 Oct, 20:53, Matimus <mccre... at gmail.com> wrote:
>
> > On Oct 31, 10:25 am, netimen <neti... at gmail.com> wrote:
>
> > > I have a text containing brackets (or what is the correct term for
> > > '>'?). I'd like to match text in the uppermost level of brackets.
>
> > > So, I have something like: 'aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 >
> > > bbbbb'. How do I match the text between the uppermost brackets ( 1 aaa < t
> > > bbb < a <tt  > ff > > 2 )?
>
> > > P.S. sorry for my english.
>
> > I think most people call them "angle brackets". Anyway, it should be
> > easy to just match the outermost brackets:
>
> > >>> import re
> > >>> text = "aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 >"
> > >>> r = re.compile("<(.+)>")
> > >>> m = r.search(text)
> > >>> m.group(1)
>
> > ' 1 aaa < t bbb < a <tt  > ff > > 2 '
>
> > In this case the regular expression is automatically greedy, matching
> > the largest area possible. Note however that it won't work if you have
> > something like this: "<first> <second>".
>
> > Matt
>
>

As far as I know, you can't do that with a regular expression alone.
Regular expressions in the formal sense can't match arbitrarily nested
brackets, and Python's re module has no recursive patterns. You can use
a regular expression to aid you, but there is no magic expression that
will give it to you for free.
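
For what it's worth, here is a rough sketch of what "using a regular
expression to aid you" could look like: the pattern only locates the
bracket characters, and a plain counter keeps track of the nesting. The
name top_level_groups is made up for this illustration, and silently
ignoring a stray '>' is just an assumption about what you would want.
It is written for Python 3.

```python
import re

def top_level_groups(text):
    """Yield the contents of each outermost <...> group in text."""
    depth = 0      # how many '<' are currently open
    start = None   # index just past the '<' that opened the current group
    for m in re.finditer(r"[<>]", text):
        if m.group() == "<":
            if depth == 0:
                start = m.end()
            depth += 1
        elif depth > 0:  # assumption: a '>' with nothing open is ignored
            depth -= 1
            if depth == 0:
                yield text[start:m.start()]

text = "a < b < O > d > here starts a new group:  < e < f  > g >"
groups = list(top_level_groups(text))
```

The regular expression does none of the nesting work here; it only
saves looking at every character. The counter is what makes the
"<first> <second>" case come out as two separate groups.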

In this case it is actually pretty easy to do it without regular
expressions at all:

>>> text = "a < b < O > d > here starts a new group:  < e < f  > g >"
>>> def get_nested_strings(text, depth=0):
...     stack = []
...     for i, c in enumerate(text):
...         if c == '<':
...             stack.append(i)
...         elif c == '>':
...             start = stack.pop() + 1
...             if len(stack) == depth:
...                 yield text[start:i]
...
>>> for seg in get_nested_strings(text):
...     print(seg)
...
 b < O > d
 e < f  > g
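
Two notes on the generator above. The depth argument picks the nesting
level: 0 gives the top-level groups, 1 gives the groups one level
further in. Also, as posted, a '>' with no matching '<' makes
stack.pop() raise IndexError. The variant below is a sketch that skips
such a bracket instead; that guard is my assumption about the desired
behaviour, not the only reasonable choice. It runs on Python 3.

```python
def get_nested_strings(text, depth=0):
    """Yield the contents of <...> groups found at the given nesting depth."""
    stack = []  # indexes of '<' characters that are still open
    for i, c in enumerate(text):
        if c == '<':
            stack.append(i)
        elif c == '>':
            if not stack:
                continue  # assumption: skip a '>' that has no matching '<'
            start = stack.pop() + 1
            if len(stack) == depth:
                yield text[start:i]

text = "a < b < O > d > here starts a new group:  < e < f  > g >"
top = list(get_nested_strings(text))             # outermost groups
inner = list(get_nested_strings(text, depth=1))  # one level further in
stray = list(get_nested_strings("x > <y>"))      # unmatched '>' is skipped
```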


Matt


