brackets content regular expression
Pierre Quentel
quentel.pierre at wanadoo.fr
Fri Oct 31 17:19:31 EDT 2008
On 31 oct, 20:38, netimen <neti... at gmail.com> wrote:
> there may be different levels of nesting:
>
> "a < b < c > d > here starts a new group: < 1 < e < f > g > 2 >
> another group: < 3 >"
>
> On 31 Oct, 21:57, netimen <neti... at gmail.com> wrote:
>
> > Thanks, but what if I have several top-level groups and want to match
> > them one by one:
>
> > text = "a < b < c > d > here starts a new group: < e < f > g >"
>
> > I want to match first " b < c > d " and then " e < f > g " but not "
> > b < c > d > here starts a new group: < e < f > g "
> > On 31 Oct, 20:53, Matimus <mccre... at gmail.com> wrote:
>
> > > On Oct 31, 10:25 am, netimen <neti... at gmail.com> wrote:
>
> > > > I have a text containing brackets (or what is the correct term for
> > > > '>'?). I'd like to match text in the uppermost level of brackets.
>
> > > > So, I have sth like: 'aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >
> > > > bbbbb'. How to match text between the uppermost brackets ( 1 aaa < t
> > > > bbb < a <tt > ff > > 2 )?
>
> > > > P.S. sorry for my english.
>
> > > I think most people call them "angle brackets". Anyway it should be
> > > easy to just match the outer most brackets:
>
> > > >>> import re
> > > >>> text = "aaaa 123 < 1 aaa < t bbb < a <tt > ff > > 2 >"
> > > >>> r = re.compile("<(.+)>")
> > > >>> m = r.search(text)
> > > >>> m.group(1)
> > > ' 1 aaa < t bbb < a <tt > ff > > 2 '
>
> > > In this case the regular expression is automatically greedy, matching
> > > the largest area possible. Note however that it won't work if you have
> > > something like this: "<first> <second>".
>
> > > Matt
>
>
Hi,
Regular expressions or pyparsing might be overkill for this problem;
you can use a simple algorithm that reads each character, increments a
counter when it finds a < and decrements it when it finds a >. When the
counter goes back to its initial value, you have reached the end of a
top-level group.
Something like:
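def top_level(txt):
    """Return the contents of each top-level <...> group in txt."""
    level = 0
    start = None  # index of the "<" that opened the current top-level group
    groups = []
    for i, car in enumerate(txt):
        if car == "<":
            level += 1
            if start is None:  # "not start" would misfire when start == 0
                start = i
        elif car == ">":
            level -= 1
            if start is not None and level == 0:
                groups.append(txt[start+1:i])
                start = None
    return groups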
print(top_level("a < b < 0 > d > < 1 < e < f > g > 2 > < 3 >"))
>> [' b < 0 > d ', ' 1 < e < f > g > 2 ', ' 3 ']
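The same counter idea can also be written as a generator that complains about unbalanced brackets instead of silently skipping them. This is only a sketch: the name iter_top_level, the opener/closer parameters and the choice to raise ValueError are my assumptions for illustration, not something the thread asked for. It is run here on the nested example quoted above.

```python
def iter_top_level(txt, opener="<", closer=">"):
    """Yield the text inside each top-level opener...closer pair.

    Hypothetical helper: same depth counter as above, but it raises
    ValueError on unbalanced input (an assumption, not a requirement).
    """
    level = 0
    start = None  # index of the opener of the current top-level group
    for i, car in enumerate(txt):
        if car == opener:
            if level == 0:
                start = i  # a new top-level group begins here
            level += 1
        elif car == closer:
            if level == 0:
                raise ValueError("unmatched %r at index %d" % (closer, i))
            level -= 1
            if level == 0:
                # back at depth 0: the top-level group is complete
                yield txt[start + 1:i]
                start = None
    if level != 0:
        raise ValueError("unclosed %r at index %d" % (opener, start))


text = ("a < b < c > d > here starts a new group: "
        "< 1 < e < f > g > 2 > another group: < 3 >")
groups = list(iter_top_level(text))
print(groups)
# [' b < c > d ', ' 1 < e < f > g > 2 ', ' 3 ']
```

Testing the depth with level == 0 rather than the truthiness of start also keeps it working when the text begins with a bracket, as in "<first> <second>".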
Best,
Pierre