Why this result with the re module

John Bond lists at asd-group.com
Tue Nov 2 08:50:35 CET 2010


On 2/11/2010 7:00 AM, Yingjie Lan wrote:
>>> re.findall('(.a.)*','  ') #two spaces
> ['', '', '']
> I must need more details of the matching algorithm to explain this?
>
> Regards,
>
> Yingjie
Sorry - I hit enter prematurely on my last message.

To take the above as an example (all your examples boil down to the same
issue), you're asking findall to look for all occurrences of something
that can exist ZERO or more times, in a string where it doesn't actually
exist anywhere. So you get three matches of zero occurrences each: one
before the first space, one between the two spaces, and one after the
last space. An empty string is returned for each, because the match
consumed no text. (Strictly, since the pattern contains a group, findall
returns what the group captured, and the group never matched, so that is
empty too.) The spaces themselves don't appear in any match because they
aren't occurrences of '.a.' ('.a.' needs an 'a' in the middle), so at each
position the engine can only match zero repetitions and then moves past them.
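
To see those three positions directly, re.finditer can be used instead of
findall, since each match object carries its span. A small sketch using the
pattern and input from the question (Python 3 assumed):

```python
import re

# The pattern and input from the question: zero or more repetitions of
# ".a." (any character, an "a", any character), captured in group 1.
pattern = re.compile('(.a.)*')
text = '  '  # two spaces

# finditer exposes where each match starts and ends; findall hides this.
matches = list(pattern.finditer(text))
for m in matches:
    # start == end means the match consumed no text; group(1) is None
    # because the group never took part in the match.
    print(m.span(), repr(m.group(0)), m.group(1))
# (0, 0) '' None
# (1, 1) '' None
# (2, 2) '' None

# findall reports group 1 for each match, substituting '' for None.
print(pattern.findall(text))  # ['', '', '']
```

The spans (0, 0), (1, 1) and (2, 2) are the "before", "between" and "after"
positions described above.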

You might wonder why a pattern that can match without consuming any input
doesn't return an infinite number of empty matches at every position.
Those matches would all overlap, and the findall documentation explicitly
says it returns non-overlapping matches.
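
As a rough check of that, the sketch below records where the zero-length
matches land: for a string with no '.a.' anywhere there is exactly one per
position, i.e. len(text) + 1 of them, never several at the same spot. The
helper empty_match_positions is a hypothetical name used only for
illustration, and the '+' variant at the end is just one possible way to
avoid empty matches, not something suggested in the thread.

```python
import re

def empty_match_positions(pattern, text):
    """Hypothetical helper: start offsets of every zero-length match."""
    return [m.start() for m in re.finditer(pattern, text) if m.end() == m.start()]

# A string containing no ".a." at all: one empty match per position.
text = 'xyz'
print(empty_match_positions('(.a.)*', text))  # [0, 1, 2, 3]
print(len(re.findall('(.a.)*', text)))        # 4, i.e. len(text) + 1

# With "+" instead of "*" the pattern cannot match empty text,
# so a string with no ".a." yields no matches at all.
print(re.findall('(.a.)+', '  '))             # []
```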

Cheers, JB








More information about the Python-list mailing list