the buggy regex in Python
python at mrabarnett.plus.com
Thu Nov 25 18:18:24 CET 2010
On 25/11/2010 16:44, Yingjie Lan wrote:
> --- On Thu, 11/25/10, MRAB <python at mrabarnett.plus.com> wrote:
>> re.findall performs multiple searches, each starting where the previous
>> one finished. The first match started at the start of the string and
>> finished at its end. The second match started at that point (the end of
>> the string) and found another match, ending at the end of the string.
>> It tried to match a third time, but that failed because it would have
>> matched an empty string again (it's not allowed to return 2 empty strings).
>>> Isn't this a bug?
>> No, but it can be confusing at times! :-)
> But the last empty string is matched twice -- so the two matches
> overlap. But findall is not supposed to return overlapping
> matches. So I think this does not live up to the
> documentation -- thus I still consider it a bug.
Look at the spans:
>>> for m in re.finditer('((.d.)*)*', 'adb'):
...     print(m.span())
...
(0, 3)
(3, 3)
There's a non-empty match followed by an empty match.
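
To make the "no overlap" point concrete, here is a minimal sketch that checks the spans programmatically. The helper name overlaps and the variable names are my own, and the definition of overlap (two spans sharing at least one character) is an assumption about what the documentation means, not a quote from it.

```python
import re

pattern = re.compile(r'((.d.)*)*')
text = 'adb'

# Collect the (start, end) span of every match that finditer reports.
spans = [m.span() for m in pattern.finditer(text)]
print(spans)  # [(0, 3), (3, 3)]

def overlaps(a, b):
    """Return True if spans a and b share at least one character.

    Assumed definition: an empty span such as (3, 3) covers no
    characters, so it cannot overlap anything.
    """
    return max(a[0], b[0]) < min(a[1], b[1])

# Compare each match with the one that follows it.
no_overlap = not any(overlaps(a, b) for a, b in zip(spans, spans[1:]))
print(no_overlap)  # True
```

Under that definition the empty match at (3, 3) only touches the end of the first match; it does not share any characters with it.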