Why this result with the re module
lists at asd-group.com
Tue Nov 2 08:50:35 CET 2010
On 2/11/2010 7:00 AM, Yingjie Lan wrote:
> >>> re.findall('(.a.)*','  ') #two spaces
> ['', '', '']
> I must need more details of the matching algorithm to explain this?
Sorry - I hit enter prematurely on my last message.
To take the above as an example (all your examples boil down to the same
issue), you're asking findall to look for all occurrences of something
that can exist ZERO or more times, in a string where it doesn't actually
exist anywhere. So you get three matches of zero occurrences each - one
before the first space, one between the two spaces, and one after the
last space. An empty string (indicating that the match consumed no
text) is returned for each. The spaces themselves don't match because
they aren't zero or more occurrences of '.a.', so they are skipped.
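To see where those three matches sit, here is a small sketch using finditer, which exposes the position of each match. The names pattern, text and spans are just for illustration and are not from the original post.

```python
import re

pattern = re.compile('(.a.)*')
text = '  '  # two spaces, as in the original example

# Each match object reports (start, end); start == end means the match
# consumed no text at that position.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)                   # [(0, 0), (1, 1), (2, 2)]

# findall returns the contents of group 1, which is '' for each of them.
print(pattern.findall(text))   # ['', '', '']
```

The three zero-width spans are the positions before the first space, between the two spaces, and after the last space.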
You might wonder why something that can match no input text doesn't
return an infinite number of those matches at every possible position.
Those matches would overlap, and findall is documented to return only
non-overlapping matches, so each position yields at most one empty
match before the scan moves on.
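As a rough check of the "one empty match per position" behaviour, the sketch below counts the matches for a few strings that contain nothing resembling '.a.'. The '+' variant is my own addition for contrast and is not something the original poster asked about; the names star and plus are hypothetical.

```python
import re

star = re.compile('(.a.)*')  # zero or more: can match the empty string
plus = re.compile('(.a.)+')  # one or more: cannot match the empty string

for text in ['', ' ', '  ', 'xyz']:
    empties = star.findall(text)
    # With no real match anywhere, there is one empty match per position,
    # i.e. len(text) + 1 of them, and never more than that.
    print(repr(text), len(empties), plus.findall(text))
```

If the empty matches are not wanted, requiring at least one occurrence (as in the plus pattern here) is one way to avoid them, assuming that suits what you are actually trying to match.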