Must be a bug in the re module [was: Why this result with the re module]

MRAB python at mrabarnett.plus.com
Wed Nov 3 00:02:45 EDT 2010


On 03/11/2010 03:42, Yingjie Lan wrote:
>> Matches an empty string, returns ''
>>
>> The result is therefore ['Mar', '', '', 'lam', '', '']
>
> Thanks, now I see it through with clarity.
> Both you and JB are right about this case.
> However, what if the regex is ((.a.)*)* ?
>
Actually, in hindsight, my explanation is slightly wrong!

With re.search and the others, the match object reports None for an
unmatched group, but re.findall returns '' for an unmatched group, so
instead of saying:

     Matches an empty string, returns ''

I should have said:

     The group doesn't match at all, so .findall returns ''
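
A quick way to see the difference, as a minimal sketch. The pattern
'(a)|b' is only an illustration picked here, not something from this
thread; it is chosen so that group 1 takes no part in the match:

```python
import re

# Illustrative pattern (an assumption, not from the thread):
# on the input 'b' the second alternative matches, so group 1 is unmatched.
pattern = r'(a)|b'

m = re.search(pattern, 'b')
unmatched_via_search = m.group(1)                 # None: group did not participate
unmatched_via_findall = re.findall(pattern, 'b')  # ['']: findall substitutes ''

print(repr(unmatched_via_search))
print(unmatched_via_findall)
```

So a '' in a findall result can mean either "the group matched an empty
string" or "the group didn't match at all"; the match object is what
tells the two apart.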

As for "((.a.)*)*", the inner group and repeat match like before, but
then the outer repeat and group try again.

The inner group can't match again, so it's unchanged (and it still
remembers the last successful capture), and the outer group therefore
matches an empty string.

Therefore the outer (first) group is always an empty string and the
inner (second) group is the same as the previous example (the last
capture or '' if no capture).
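
To check this yourself, here is a small sketch. It assumes the short
input 'Mar' (taken from the earlier result) rather than the full string
from the thread, and the comments on the group values are what the
explanation above predicts, so treat them as something to verify on
your own Python version:

```python
import re

pattern = r'((.a.)*)*'

# Anchor at the start so we look at the first match only.
m = re.match(pattern, 'Mar')
whole = m.group(0)    # the full match
groups = m.groups()   # (outer group, inner group)

print(repr(whole))    # 'Mar'
print(groups)         # predicted by the explanation: ('', 'Mar')

# With two groups, findall returns one tuple per match,
# again as (outer, inner), with '' standing in for an unmatched group.
rows = re.findall(pattern, 'Mar')
print(rows)
```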


