[Tutor] Python re without string consumption

Danny Yoo dyoo at hkn.eecs.berkeley.edu
Thu Jan 25 07:44:31 CET 2007



On Wed, 24 Jan 2007, Jacob Abraham wrote:

>>>> import re
>>>> re.findall("abca", "abcabcabca")
> ["abca", "abca"]
>
> While I am expecting.
>
> ["abca", "abca", "abca"]


Hi Jacob,

Just to make sure: do you understand, though, why findall() won't give you 
the results you want?  The documentation on findall() says:

""" Return a list of all non-overlapping matches of pattern in string. If 
one or more groups are present in the pattern, return a list of groups; 
this will be a list of tuples if the pattern has more than one group. 
Empty matches are included in the result unless they touch the beginning 
of another match. New in version 1.5.2. Changed in version 2.4: Added the 
optional flags argument. """

It's designed not to return overlapping items.
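
To see what "non-overlapping" means here, it may help to look at where the matches sit. This is only a small sketch; it uses finditer(), which walks the string the same way findall() does but hands back match objects, so we can ask each one for its span:

```python
import re

text = "abcabcabca"

# finditer() yields the same non-overlapping matches as findall(),
# but as match objects, so we can see where each one sits.
spans = [m.span() for m in re.finditer("abca", text)]
print(spans)  # [(0, 4), (6, 10)]
```

The first match covers indices 0 through 3, so the scan resumes at index 4. The occurrence starting at index 3 overlaps the first match and is never considered.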



> How do I modify my regular expression to do the same.

We can just write our own helper function to restart the match.  That is, 
we can explicitly call search() ourselves, and pass in a new string 
that's a fragment of the old one.


Concretely,

#############
>>> import re
>>> text = "abcabcabca"
>>> re.search("abca", text)
<_sre.SRE_Match object at 0x50d40>
>>> re.search("abca", text).start()
0
#############

OK, so we know the first match starts at 0.  So let's just restart the 
search one character later, by slicing off everything up to and including 
that position.

##################################
>>> re.search("abca", text[1:])
<_sre.SRE_Match object at 0x785d0>
>>> re.search("abca", text[1:]).start()
2
##################################

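There's our second match.  Note that start() is relative to the sliced 
string: the 2 here means index 1 + 2 = 3 in the original text.  Let's 
continue.  We have to be careful, though, to make sure we're skipping the 
right number of characters.  Since the second match starts at index 3, the 
next search has to begin at index 4: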

#######################################
>>> re.search("abca", text[4:]).start()
2
>>> re.search("abca", text[7:])
>>> 
#######################################

And there are no matches after this point.


You can try writing this helper function yourself.  If you need help doing 
so, please feel free to ask the list for suggestions.
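
If you'd like something to compare your attempt against, here is one possible sketch of such a helper. The name find_overlapping() is made up for this example, and returning start positions (rather than the matched strings) is just one choice among several:

```python
import re

def find_overlapping(pattern, text):
    """Return the start index of every match, overlapping ones included."""
    starts = []
    offset = 0
    while offset <= len(text):
        # Search only the part of the string we haven't ruled out yet.
        match = re.search(pattern, text[offset:])
        if match is None:
            break
        # match.start() is relative to the slice, so add the offset back.
        start = offset + match.start()
        starts.append(start)
        # Restart just one character past the beginning of this match.
        offset = start + 1
    return starts

print(find_overlapping("abca", "abcabcabca"))  # [0, 3, 6]
```

One caveat: slicing changes what the pattern sees as the beginning of the string, so patterns that use "^" or lookbehind may behave differently than they would on the full text. Compiling the pattern and calling its search() method with a starting position, instead of slicing, is probably the safer variation in those cases.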

