Must be a bug in the re module [was: Why this result with the re module]

Yingjie Lan lanyjie at yahoo.com
Wed Nov 3 07:16:29 CET 2010


--- On Wed, 11/3/10, John Bond <lists at asd-group.com> wrote:

>    3) then said there must be >=0 occurrences of what's inside it,
>    which of course there is, so that has no effect.    
>
>    ((.a.)*)*

Hi, 

I think there should be a difference: unlike before,
what's inside the outer group can now match an empty
string. Because the quantifier * of the outer group
(that is, the last *) is greedy, it should also take
up the empty string after each non-empty match.
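
The premise that the group can match an empty string can be checked
directly. A minimal sketch, assuming the inner pattern (.a.)* from the
quoted message and a made-up two-character subject 'y ' in which no
.a. triple fits:

```python
import re

# 'y ' contains no character-'a'-character triple, so (.a.)* can only
# succeed here by repeating zero times, i.e. by matching ''.
m = re.match(r'(.a.)*', 'y ')
empty_span = m.span()      # start and end offsets of the whole match
inner_group = m.group(1)   # None when the group took no part in the match
print(empty_span, inner_group)
```

A span whose start and end are equal means the match consumed nothing.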

So, the first match in 'Mary has a lamb' must be:

'' + 'Mar' + '' (the empty string before the 'y')

(note the first '' is before the 'M')
Then, after skipping the 'y' (remember, the empty 
string before 'y' is already taken), comes a second:

'' (the one between 'y' and ' ')

Then after skipping the space ' ', comes a third:

'has' + ' a ' + 'lam' + '' (the empty string before the 'b')

And finally, it matches the empty string after 'b':

''

So there should be a total of four matches, shouldn't there?
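
One way to compare this count with what the re module actually reports
is to list every match together with its span. This is only a sketch:
the helper name list_matches is hypothetical, and it records just the
whole match (group 0), not what the two groups captured.

```python
import re

def list_matches(pattern, text):
    """Return (start, end, matched_text) for every match re.finditer yields."""
    return [(m.start(), m.end(), m.group(0))
            for m in re.finditer(pattern, text)]

matches = list_matches(r'((.a.)*)*', 'Mary has a lamb')
for start, end, piece in matches:
    # repr() makes empty matches visible as ''
    print(start, end, repr(piece))
print(len(matches), 'matches in total')
```

Printing the offsets shows exactly where each empty match is reported,
which is the point in question here.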

Yingjie


      



More information about the Python-list mailing list