[Tutor] Python re without string consumption

Thu Jan 25 12:13:48 CET 2007

Jacob Abraham wrote:
> Hi Danny Yoo,
> 
>    I would like to thank you for the solution and
> the helper function that I have written is as follows. But I do hope
> that future versions of Python include a regular expression syntax to
> handle such cases simply because this method seems very process and
> memory intensive. I also notice that fall_back_len is a very crude
> solution.
> 
> def searchall(expr, text, fall_back_len=0):
>     while True:
>         match =  re.search(expr, text)
>         if not match:
>             break
>         yield match
>         end = match.end()
>         text = text[end-fall_back_len:]
> 
> for match in searchall("abca", "abcabcabca", 1):
>    print match.group()

The string slicing is not needed. The search() method of a compiled re
has an optional pos parameter that tells it where to start the search.
If you start each new search one position after the *start* of the
previous match, overlapping matches are still found, so fall_back_len
is not needed. How about this:

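import re

def searchall(expr, text):
    # Compile once so that search() accepts a start position (pos).
    searchRe = re.compile(expr)
    match = searchRe.search(text)
    while match:
        yield match
        # Resume one character past the start of the previous match.
        match = searchRe.search(text, match.start() + 1)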
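
To check that restarting from match.start() + 1 really picks up the
overlapping matches, here is a small self-contained sketch that runs the
generator on the string from the original question. The lookahead
cross-check at the end is my own addition, not part of the suggestion
above; it relies on the fact that a lookahead group such as (?=(abca))
matches without consuming any text.

```python
import re

def searchall(expr, text):
    """Yield every match of expr in text, including overlapping ones."""
    search_re = re.compile(expr)
    match = search_re.search(text)
    while match:
        yield match
        # Resume one character past the start of the previous match.
        match = search_re.search(text, match.start() + 1)

text = "abcabcabca"
starts = [m.start() for m in searchall("abca", text)]
print(starts)  # [0, 3, 6]

# Cross-check (an addition for illustration): a lookahead is zero-width,
# so finditer() can report a match at every position where "abca" begins.
lookahead_starts = [m.start() for m in re.finditer("(?=(abca))", text)]
print(lookahead_starts)  # [0, 3, 6]
```

With the lookahead form the matched text is in group(1) rather than
group(), because the overall match is empty.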

Also, if you are just finding plain text, you don't need regular
expressions at all; you can use str.find():

def searchall(expr, text):
    # expr is a plain substring here; this yields start positions,
    # not match objects.
    pos = text.find(expr)
    while pos != -1:
        yield pos
        pos = text.find(expr, pos + 1)

(inspired by this recipe: 
http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/499314)
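
A quick usage sketch of the str.find() version on the same input, to show
what it yields. Since it returns integer positions rather than match
objects, the matched text has to be sliced out by hand; the slicing line
is just one way to do that and is not part of the recipe.

```python
def searchall(expr, text):
    """Yield the start index of every occurrence of the substring expr."""
    pos = text.find(expr)
    while pos != -1:
        yield pos
        # Look again starting one character after the last hit,
        # so overlapping occurrences are not skipped.
        pos = text.find(expr, pos + 1)

text = "abcabcabca"
positions = list(searchall("abca", text))
print(positions)  # [0, 3, 6]

# Recover the matched text from each position by slicing.
pieces = [text[p:p + len("abca")] for p in positions]
print(pieces)  # ['abca', 'abca', 'abca']
```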

Kent