Must be a bug in the re module [was: Why this result with the re module]

Chris Melville pub at melv.org
Tue Nov 2 22:30:36 EDT 2010


> Disagree in this case, where the whole regex
> matches an empty string. Greediness will match
> as much as possible. So it will also match
> the empty strings between consecutive
> characters as much as possible, once
> we have properly defined all the unique
> empty strings. Because of greediness,
> fewer matches should be found. In this
> case, it should find only 4 matches
> (shown in my steps) instead of 6 matches
> (shown in your steps).
>
> Yingjie
>
I'm sorry, but I really don't understand where you're coming from.

Your regex says "zero or more consecutive occurrences of something,
always returning as many as possible". That's what it does at every
position: it matches the empty string only where it can't match anything
else (findall then advances one character to avoid overlapping or
infinite empty matches), and everywhere else it matches as much as
possible (e.g. "has a lam", not "has", " a ", "lam").
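
To make that concrete, here is a minimal sketch. The pattern and the test string are my assumptions (neither is quoted in this message); I picked them only because they reproduce the "has a lam" case. I've used a non-capturing group so that each match comes back as a plain string, and finditer so that the position of each empty match is visible.

```python
import re

# Assumed pattern and input; neither is quoted in this message.
# The pattern means: zero or more consecutive 3-character chunks
# whose middle character is 'a', matched greedily.
pattern = re.compile(r"(?:.a.)*")
text = "Mary has a lamb"

# Record (start, end, matched text) for every match, including empty ones.
spans = [(m.start(), m.end(), m.group(0)) for m in pattern.finditer(text)]
matches = [matched for _, _, matched in spans]

for start, end, matched in spans:
    print(start, end, repr(matched))
```

With these assumed inputs I'd expect six matches: two non-empty ones ("Mar" and "has a lam"), and an empty match at each position where not even one chunk fits. The greedy quantifier decides how long each individual match is; it doesn't reduce the number of positions at which the scan tries to match.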

Perhaps someone else can look at this and comment on whether findall is 
doing what it is supposed to, or not. To me, it is.

Cheers, JB


