[Tutor] Python re without string consumption

Thu Jan 25 08:22:18 CET 2007

Hi Danny Yoo,

Thank you for the solution. The helper function I have written is below. I do hope that future versions of Python include a regular expression syntax to handle such cases, because this method seems very processor- and memory-intensive: every pass slices off a new copy of the remaining text. I also notice that fall_back_len is a very crude solution, since the caller has to know in advance how far matches can overlap.

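import re

def searchall(expr, text, fall_back_len=0):
    # Yield each match, then search again in the rest of the text, backing
    # up fall_back_len characters so that overlapping matches are found.
    # Match positions are relative to the current slice, not the original.
    while True:
        match = re.search(expr, text)
        if not match:
            break
        yield match
        end = match.end()
        text = text[end - fall_back_len:]

for match in searchall("abca", "abcabcabca", 1):
    print(match.group())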
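
On the wish for a regular expression syntax: the re module's zero-width lookahead, (?=...), may already cover this case. The following is only a sketch for the example pattern above, not a general recommendation. Because a lookahead matches without consuming characters, findall() can report a hit at every starting position, and the capturing group inside it supplies the matched text.

```python
import re

text = "abcabcabca"

# (?=...) asserts that "abca" follows the current position but consumes
# nothing, so the scan moves on one character at a time and overlapping
# occurrences are all reported. The inner group captures the text.
overlapping = re.findall("(?=(abca))", text)
print(overlapping)
```

With this form no fall_back_len is needed and the text is never sliced.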

Thanks Again.

Jacob Abraham

----- Original Message ----
From: Danny Yoo <dyoo at hkn.eecs.berkeley.edu>
To: Jacob Abraham <jakieabraham at yahoo.com>
Cc: python <tutor at python.org>
Sent: Thursday, January 25, 2007 12:14:31 PM
Subject: Re: [Tutor] Python re without string consumption

On Wed, 24 Jan 2007, Jacob Abraham wrote:

> >>> import re
> >>> re.findall("abca", "abcabcabca")
> ["abca", "abca"]
>
> While I am expecting.
>
> ["abca", "abca", "abca"]

Hi Jacob,

Just to make sure: do you understand, though, why findall() won't give you 
the results you want?  The documentation on findall() says:

""" Return a list of all non-overlapping matches of pattern in string. If 
one or more groups are present in the pattern, return a list of groups; 
this will be a list of tuples if the pattern has more than one group. 
Empty matches are included in the result unless they touch the beginning 
of another match. New in version 1.5.2. Changed in version 2.4: Added the 
optional flags argument. """

It's designed not to return overlapping items.
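
To illustrate the non-overlapping behaviour, here is a small sketch using re.finditer(), which scans the same way as findall() but exposes the position of each match. Each scan resumes where the previous match ended, so the occurrence starting at index 3 is skipped.

```python
import re

text = "abcabcabca"

# Each span is (start, end). The second search begins at index 4, the end
# of the first match, so the occurrence at index 3 is never considered.
spans = [m.span() for m in re.finditer("abca", text)]
print(spans)
```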

> How do I modify my regular expression to do the same.

We can just write our own helper function to restart the match.  That is, 
we can explicitly call search() ourselves, and pass in a new string 
that's a fragment of the old one.

Concretely,

#############
>>> import re
>>> text = "abcabcabca"
>>> re.search("abca", text)
<_sre.SRE_Match object at 0x50d40>
>>> re.search("abca", text).start()
0
#############

Ok, so we know the first match starts at 0.  So let's just restart the 
search, skipping that position.

##################################
>>> re.search("abca", text[1:])
<_sre.SRE_Match object at 0x785d0>
>>> re.search("abca", text[1:]).start()
2
##################################

There's our second match.  Let's continue.  We have to be careful, though, 
to make sure we're skipping the right number of characters:

#######################################
>>> re.search("abca", text[4:]).start()
2
>>> re.search("abca", text[7:])
>>> 
#######################################

And there are no matches after this point.

You can try writing this helper function yourself.  If you need help doing 
so, please feel free to ask the list for suggestions.
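
One possible shape for such a helper, given only as a sketch: the manual steps above restart the search one character past the start of the previous match, and the same thing can be done in a loop. The name search_overlapping is hypothetical. This version assumes a compiled pattern and uses the optional pos argument of its search() method, so the text is not sliced and match positions refer to the original string.

```python
import re

def search_overlapping(expr, text):
    """Yield every match of expr in text, including overlapping ones."""
    pattern = re.compile(expr)
    pos = 0
    while pos <= len(text):
        # search(text, pos) starts scanning at index pos without copying text.
        match = pattern.search(text, pos)
        if match is None:
            break
        yield match
        # Restart one character past the start of this match.
        pos = match.start() + 1

starts = [m.start() for m in search_overlapping("abca", "abcabcabca")]
print(starts)
```

One difference from slicing: with pos, a pattern beginning with ^ still anchors to the real start of the string, not to the position where the search resumes.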
