brackets content regular expression

Fri Oct 31 17:19:31 EDT 2008

On 31 oct, 20:38, netimen <neti... at gmail.com> wrote:
> there may be different levels of nesting:
>
> "a < b < c > d > here starts a new group: < 1 < e < f  > g > 2 >
> another group: < 3 >"
>
> On 31 Oct, 21:57, netimen <neti... at gmail.com> wrote:
>
> > Thanks, but what if I have several top-level groups and want to
> > match them one by one:
>
> > text = "a < b < c > d > here starts a new group:  < e < f  > g >"
>
> > I want to match first " b < c > d " and then " e < f  > g " but not "
> > b < c > d > here starts a new group:  < e < f  > g "
> > On 31 Oct, 20:53, Matimus <mccre... at gmail.com> wrote:
>
> > > On Oct 31, 10:25 am, netimen <neti... at gmail.com> wrote:
>
> > > > I have a text containing brackets (or what is the correct term for
> > > > '>'?). I'd like to match text in the uppermost level of brackets.
>
> > > > So, I have sth like: 'aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 >
> > > > bbbbb'. How to match text between the uppermost brackets ( 1 aaa < t
> > > > bbb < a <tt  > ff > > 2 )?
>
> > > > P.S. sorry for my english.
>
> > > I think most people call them "angle brackets". Anyway, it should be
> > > easy to just match the outermost brackets:
>
> > > >>> import re
> > > >>> text = "aaaa 123 < 1 aaa < t bbb < a <tt  > ff > > 2 >"
> > > >>> r = re.compile("<(.+)>")
> > > >>> m = r.search(text)
> > > >>> m.group(1)
>
> > > ' 1 aaa < t bbb < a <tt  > ff > > 2 '
>
> > > In this case the regular expression is automatically greedy, matching
> > > the largest area possible. Note however that it won't work if you have
> > > something like this: "<first> <second>".
>
> > > Matt
>
>

Hi,

Regular expressions or pyparsing might be overkill for this problem;
you can use a simple algorithm that reads each character, increments a
counter when it finds a < and decrements it when it finds a >. When the
counter goes back to its initial value, you have reached the end of a
top-level group.

Something like:

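def top_level(txt):
    """Return the text inside each top-level <...> group of txt."""
    level = 0      # current nesting depth
    start = None   # index of the "<" that opened the current group
    groups = []
    for i, car in enumerate(txt):
        if car == "<":
            level += 1
            # compare with None: index 0 is a valid start but is falsy
            if start is None:
                start = i
        elif car == ">":
            level -= 1
            if start is not None and level == 0:
                # back at the outer level: the group is complete
                groups.append(txt[start+1:i])
                start = None
    return groups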

print(top_level("a < b < 0 > d > < 1 < e < f  > g > 2 > < 3 >"))

>> [' b < 0 > d ', ' 1 < e < f  > g > 2 ', ' 3 ']
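
The same counting idea can also be written as a generator, which gives
the groups one by one, as asked earlier in the thread. This is only a
sketch: the name iter_top_level and the open_ch / close_ch parameters
are made up for the example, and skipping a stray ">" (one with no
matching "<") is an assumption about what is wanted, not something the
thread specifies.

```python
def iter_top_level(txt, open_ch="<", close_ch=">"):
    """Yield the text inside each top-level bracket pair of txt.

    Hypothetical helper: the name and parameters are illustrative only.
    """
    level = 0
    start = None
    for i, ch in enumerate(txt):
        if ch == open_ch:
            if level == 0:
                start = i  # the outermost group opens here
            level += 1
        elif ch == close_ch:
            if level == 0:
                continue  # assumption: a stray closing bracket is ignored
            level -= 1
            if level == 0:
                # back at the outer level: emit the group's content
                yield txt[start + 1:i]


text = ("a < b < c > d > here starts a new group: "
        "< 1 < e < f  > g > 2 > another group: < 3 >")
for group in iter_top_level(text):
    print(repr(group))
```

A group that is opened but never closed is silently dropped here; if
that should be an error instead, a check on level after the loop would
be the place for it.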

Best,
Pierre