Regular expression help

yaipa h. yaipa at yahoo.com
Thu Jul 17 18:13:34 CEST 2003


Fredrik,

Not sure about the original poster, but I can use that. Thanks!

  --Alan

"Fredrik Lundh" <fredrik at pythonware.com> wrote in message news:<mailman.1058424506.12031.python-list at python.org>...
> David Lees wrote:
> 
> > I forget how to find multiple instances of stuff between tags using
> > regular expressions.  Specifically I want to find all the text between a
> > series of begin/end pairs in a multiline file.
> >
> > I tried:
> >  >>> p = 'begin(.*)end'
> >  >>> m = re.search(p,s,re.DOTALL)
> >
> > and got everything between the first begin and last end.  I guess
> > because of a greedy match.  What I want to do is a list where each
> > element is the text between another begin/end pair.
> 
> people will tell you to use non-greedy matches, but that's often a
> bad idea in cases like this: the RE engine has to store lots of
> backtracking information, and your program will consume a lot more
> memory than it has to (and may run out of stack and/or memory).
> 
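
For reference, the non-greedy variant mentioned here would look roughly
like the sketch below. The sample string is made up for illustration and
is not from the original post. On small inputs it does produce the list
the original poster asked for; the memory concern raised above applies to
large inputs.

```python
import re

# Hypothetical sample input, not taken from the thread.
s = "begin one end\nnoise\nbegin two\nlines end"

# Greedy: a single match running from the first "begin" to the last "end".
greedy = re.search(r"begin(.*)end", s, re.DOTALL).group(1)

# Non-greedy: each match stops at the nearest following "end".
chunks = re.findall(r"begin(.*?)end", s, re.DOTALL)

print(repr(greedy))  # ' one end\nnoise\nbegin two\nlines '
print(chunks)        # [' one ', ' two\nlines ']
```
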
> a better approach is to do two searches: first search for a "begin",
> and once you've found that, look for an "end":
> 
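>     import re
> 
>     # "text" is the input string, "process" is whatever you do with each chunk
>     pos = 0
> 
>     START = re.compile("begin")
>     END = re.compile("end")
> 
>     while True:
>         m = START.search(text, pos)
>         if not m:
>             break
>         start = m.end() # chunk starts right after "begin"
>         m = END.search(text, start)
>         if not m:
>             break # no matching "end"
>         end = m.start() # chunk stops right before "end"
>         process(text[start:end])
>         pos = m.end() # move forward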
> 
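
Since the original question asked for a list, the two-search loop can be
wrapped in a function that collects the chunks instead of calling
process(). This is only a sketch: the name between() is made up here,
and silently ignoring an unterminated "begin" is an assumption about what
is wanted.

```python
import re

START = re.compile("begin")
END = re.compile("end")

def between(text):
    """Return the text between each begin/end pair, in order (hypothetical helper)."""
    found = []
    pos = 0
    while True:
        m = START.search(text, pos)
        if not m:
            break
        start = m.end()          # chunk starts right after "begin"
        m = END.search(text, start)
        if not m:
            break                # unterminated "begin" is ignored (assumption)
        found.append(text[start:m.start()])
        pos = m.end()            # continue after this "end"
    return found

print(between("begin a end x begin b end"))  # [' a ', ' b ']
```
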
> at this point, it's also obvious that you don't really have to use
> regular expressions:
> 
>     pos = 0
> 
>     while True:
>         start = text.find("begin", pos)
>         if start < 0:
>             break
>         start += len("begin") # skip past the marker
>         end = text.find("end", start)
>         if end < 0:
>             break
>         process(text[start:end])
>         pos = end + len("end") # move forward
> 
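
The same string-method loop also works as a generator with the markers
passed in, which avoids hard-coding the marker length. Again just a
sketch: iter_between() and its parameter names are made up for
illustration.

```python
def iter_between(text, begin="begin", end="end"):
    """Yield the text between each begin/end marker pair (hypothetical helper)."""
    pos = 0
    while True:
        start = text.find(begin, pos)
        if start < 0:
            return
        start += len(begin)      # skip past the opening marker
        stop = text.find(end, start)
        if stop < 0:
            return               # unterminated opening marker is ignored
        yield text[start:stop]
        pos = stop + len(end)    # continue after the closing marker

print(list(iter_between("<b>x</b> and <b>y</b>", "<b>", "</b>")))  # ['x', 'y']
```

Being a generator, it produces one chunk at a time, so a large file does
not need a second full-size list in memory.
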
> </F>
> 
> <!-- (the eff-bot guide to) the python standard library (redux):
> http://effbot.org/zone/librarybook-index.htm
> -->



