Regular expression help

Fredrik Lundh fredrik at
Thu Jul 17 08:44:50 CEST 2003

David Lees wrote:

> I forget how to find multiple instances of stuff between tags using
> regular expressions.  Specifically I want to find all the text between a
> series of begin/end pairs in a multiline file.
> I tried:
>  >>> p = 'begin(.*)end'
>  >>> m = re.findall(p,s,re.DOTALL)
> and got everything between the first begin and last end.  I guess
> because of a greedy match.  What I want to do is a list where each
> element is the text between another begin/end pair.

people will tell you to use non-greedy matches (.*?), but that's often a
bad idea in cases like this: the RE engine has to store lots of
backtracking information, and your program will consume a lot more
memory than it has to (and may run out of stack and/or memory).
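
For comparison, this is how the greedy pattern from the question and the usual non-greedy suggestion behave. The sample string is made up for illustration:

```python
import re

# Made-up sample with two begin/end pairs spread over several lines.
s = "begin\nfirst\nend\nnoise\nbegin\nsecond\nend\n"

# Greedy: .* runs on to the LAST "end", so there is only one match.
greedy = re.findall(r"begin(.*)end", s, re.DOTALL)

# Non-greedy: .*? stops at the FIRST "end" after each "begin".
lazy = re.findall(r"begin(.*?)end", s, re.DOTALL)

print(greedy)  # ['\nfirst\nend\nnoise\nbegin\nsecond\n']
print(lazy)    # ['\nfirst\n', '\nsecond\n']
```

The non-greedy version does produce the list that was asked for. The objection above concerns what it costs the engine, not the result.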

a better approach is to do two searches: first search for a "begin",
and once you've found that, look for an "end":

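    import re

    text = open("input.txt").read() # hypothetical file name

    pos = 0
    found = [] # text between each begin/end pair

    START = re.compile("begin")
    END = re.compile("end")

    while 1:
        m = START.search(text, pos)
        if not m:
            break # no more "begin"
        start = m.end()
        m = END.search(text, start)
        if not m:
            break # "begin" without a matching "end"
        end = m.start()
        found.append(text[start:end])
        pos = m.end() # move forward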
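
The same two-search loop can be packaged as a generator, so callers get each section as it is found. This is a minimal sketch; the function name `sections` and the sample text are made up for illustration. It relies on the compiled pattern's `search(string, pos)` method, which accepts a start position (the module-level `re.search` function does not):

```python
import re

START = re.compile("begin")
END = re.compile("end")

def sections(text):
    """Yield the text between each begin/end pair, in order."""
    pos = 0
    while True:
        m = START.search(text, pos)   # next "begin" at or after pos
        if not m:
            return
        start = m.end()
        m = END.search(text, start)   # first "end" after that "begin"
        if not m:
            return                    # unmatched "begin": stop here
        yield text[start:m.start()]
        pos = m.end()                 # continue after this "end"

text = "begin\nfirst\nend\nnoise\nbegin\nsecond\nend\nbegin dangling"
print(list(sections(text)))  # ['\nfirst\n', '\nsecond\n']
```

Like the loop it is based on, this matches "end" anywhere, including inside words such as "legend"; a pattern such as r"\bend\b" is one way to tighten that if it matters.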

at this point, it's also obvious that you don't really have to use
regular expressions:

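    pos = 0
    found = [] # text between each begin/end pair

    while 1:
        start = text.find("begin", pos)
        if start < 0:
            break # no more "begin"
        start += 5 # skip past "begin"
        end = text.find("end", start)
        if end < 0:
            break # "begin" without a matching "end"
        found.append(text[start:end])
        pos = end # move forward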
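
A generator version using only string methods, with `len()` in place of the hard-coded 5. The name `sections_find` and its `begin`/`end` parameters are hypothetical, added so that other marker strings can be used:

```python
def sections_find(text, begin="begin", end="end"):
    """Yield the text between each begin/end pair using str.find only."""
    pos = 0
    while True:
        start = text.find(begin, pos)
        if start < 0:
            return
        start += len(begin)            # skip past the begin marker
        stop = text.find(end, start)
        if stop < 0:
            return                     # unmatched begin marker: stop here
        yield text[start:stop]
        pos = stop + len(end)          # continue after the end marker

text = "begin\nfirst\nend\nnoise\nbegin\nsecond\nend\n"
print(list(sections_find(text)))                             # ['\nfirst\n', '\nsecond\n']
print(list(sections_find("<b>x</b><b>y</b>", "<b>", "</b>")))  # ['x', 'y']
```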


