Must be a bug in the re module [was: Why this result with the re module]
Yingjie Lan
lanyjie at yahoo.com
Tue Nov 2 23:28:09 EDT 2010
> Your regex says "Zero or more consecutive occurrences of
> something, always returning the most possible". That's
> what it does, at every position - only matching emptiness
> where it couldn't match anything (findall then skips a
> character to avoid overlapping/infinite empty
> matches), and at all other times matching the most
> possible (e.g. "has a lam", not "has", " a ", "lam").
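The scan described in the quoted reply can be modeled in a few lines. This is a simplified sketch that assumes a plain left-to-right loop. The helper `findall_whole_matches` is hypothetical. CPython's real loop is written in C, and its handling of empty matches changed in Python 3.7. The sketch returns whole matches only, so the pattern uses a non-capturing group.

```python
import re

def findall_whole_matches(pattern, text):
    """Hypothetical, simplified model of the findall scan (not CPython's code):
    try an anchored match at each position, keep what the greedy pattern
    matches there, and step one character past any empty match."""
    compiled = re.compile(pattern)
    results = []
    pos = 0
    while pos <= len(text):
        m = compiled.match(text, pos)  # anchored attempt at this position
        if m is None:
            pos += 1  # nothing matches here; try the next character
            continue
        results.append(m.group(0))
        # Skip over an empty match so the scan cannot loop forever.
        pos = m.end() if m.end() > pos else pos + 1
    return results

text = "Mary has a lamb"
print(findall_whole_matches(r"(?:.a.)*", text))
# ['Mar', '', '', 'has a lam', '', '']
print(re.findall(r"(?:.a.)*", text))
```

For this input, the model and `re.findall` should agree. The empty matches at positions 3, 4, 14 and 15 are where nothing of the form `.a.` starts.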
You have almost convinced me.
You are correct about the regex '(.a.)*'.
What I actually had in mind was this regex: '((.a.)*)*';
I confused myself when I added the enclosing parentheses.
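For '(.a.)*', part of the confusion comes from what findall reports. With one capturing group, findall returns that group's text rather than the whole match, and a repeated group keeps only its last repetition. A minimal sketch below contrasts this with a non-capturing group `(?:...)`, which makes findall return the whole match.

```python
import re

text = "Mary has a lamb"

# One capturing group: findall returns the group, and a repeated group
# keeps only its LAST repetition ("lam", not "has a lam").
captured = re.findall(r"(.a.)*", text)

# Non-capturing group: findall returns the whole matched text instead.
whole = re.findall(r"(?:.a.)*", text)

print(captured)  # ['Mar', '', '', 'lam', '', '']
print(whole)     # ['Mar', '', '', 'has a lam', '', '']
```

Groups that did not take part in an (empty) match show up as '' in findall's result.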
Could you please reconsider how the engine
would handle this new one and check whether
my steps are correct? If you agree with my
7-step execution for the new regex, then
we have finally found a real bug in re.findall:
>>> re.findall('((.a.)*)*', 'Mary has a lamb')
[('', 'Mar'), ('', ''), ('', ''), ('', 'lam'), ('', ''), ('', '')]
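Because '((.a.)*)*' has two capturing groups, findall returns (group 1, group 2) tuples, not the text each match covered. One way to check the steps is to print each match object from `re.finditer` and compare the whole match (group 0) and its span with what each group retained. This is a sketch for inspection only. What the outer group keeps after a final, possibly empty, repetition depends on the engine, so no output is shown here.

```python
import re

text = "Mary has a lamb"
pattern = r"((.a.)*)*"

# Two capturing groups: findall yields (group 1, group 2) tuples.
print(re.findall(pattern, text))

# finditer yields match objects, so the whole match (group 0) and its span
# can be printed next to the values the two groups kept.
for m in re.finditer(pattern, text):
    print(m.span(), repr(m.group(0)), m.groups())
```

Groups that did not participate appear as None in `m.groups()`, whereas findall reports them as ''.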
Cheers,
Yingjie
More information about the Python-list mailing list