Must be a bug in the re module [was: Why this result with the re module]

John Bond lists at asd-group.com
Wed Nov 3 00:09:17 EDT 2010


On 3/11/2010 3:55 AM, John Bond wrote:
>
>> Could you please reconsider how you would
>> work with this new one and see whether my
>> steps are correct? If you agree with my 7-step
>> execution for the new regex, then:
>>
>> We finally found a real bug for re.findall:
>>
>> >>> re.findall('((.a.)*)*', 'Mary has a lamb')
>> [('', 'Mar'), ('', ''), ('', ''), ('', 'lam'), ('', ''), ('', '')]
>>
>>
>> Cheers,
>>
>> Yingjie
>>
>>
>>
>
> Nope, I'm afraid it is a lack of understanding again.
>
> The outer capturing group that you've added matches the entirety
> of what the inner one matches (which is six matches, as you now
> accept).  Because a repeated group only returns the last of its
> matches, it returns one thing - an empty string (that being the last
> thing that the inner group matched).  Findall simply returns that in
> each of the six tuples it has to return because of the inner one.
>
> You just need to accept that findall (like all of re) works fine, and 
> if it doesn't seem to do what you expect, it's because the expectation 
> is wrong.
>
> Cheers, JB

Just to clarify - findall is returning:

[ (only match in outer group, 1st match in inner group)
, (only match in outer group, 2nd match in inner group)
, (only match in outer group, 3rd match in inner group)
, (only match in outer group, 4th match in inner group)
, (only match in outer group, 5th match in inner group)
, (only match in outer group, 6th match in inner group)
]

Where "only match in outer group" = "6th match in inner group" owing to 
the way that capturing groups with repetition only return the last thing 
they matched.
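
A small sketch for checking this at the interpreter. It assumes the
simpler inner pattern '(.a.)*' from earlier in the thread, and
show_matches is a made-up helper name (not part of re). No fixed result
is given for the nested pattern, since the exact group contents may
differ between Python versions:

```python
import re

text = 'Mary has a lamb'

# A repeated group keeps only what its last iteration matched.
m = re.match(r'(.a.)*', 'has a lamb')
whole = m.group(0)   # everything the repetition consumed
last = m.group(1)    # only the final iteration of the group

# findall returns one entry per match of the whole pattern; with a
# single group, each entry is that group's last capture ('' if the
# group did not take part in the match).
inner = re.findall(r'(.a.)*', text)


def show_matches(pattern, string):
    """Hypothetical helper: (span, groups) for every match found."""
    return [(mo.span(), mo.groups()) for mo in re.finditer(pattern, string)]


# With two groups, every match carries one slot per group.
nested = show_matches(r'((.a.)*)*', text)

print(whole, '|', last)
print(inner)
for span, groups in nested:
    print(span, groups)
```

The list from the inner pattern should line up with the second element
of each tuple quoted above, and the first print shows a repeated group
holding on to only its last iteration.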

Cheers, JB




