[New-bugs-announce] [issue10532] A bug related to matching the empty string

Yingjie report at bugs.python.org
Thu Nov 25 17:39:15 CET 2010


New submission from Yingjie <lanyjie at yahoo.com>:

Here are some puzzling results I got (I am using Python 3; I suppose the results are similar for Python 2).

When I do either of the following, I get an exception:
>>> re.findall('(d*)*', 'adb')
>>> re.findall('((d)*)*', 'adb')

When I do this, there is no exception, but the result is wrong:
>>> re.findall('((.d.)*)*', 'adb')
[('', 'adb'), ('', '')]

Why is it wrong?

The first match of the groups:
('', 'adb')
indicates that the outer group ((.d.)*) captured
the empty string, while the inner group (.d.)
captured 'adb', so the outer group must have
captured the empty string at the end of the
provided string 'adb'.

Once we have matched the final empty string '',
there should be no more matches, but we got
another match ('', '')!!!

So, findall matched the empty string at
the end of the string twice!!!
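To see exactly where each match and each group lands, the matches can be listed by their spans instead of their text. This is a minimal sketch, assuming a Python 3 version that accepts this pattern; describe_matches is a hypothetical helper that simply wraps re.finditer, which scans the same non-overlapping matches that re.findall reports:

```python
import re


def describe_matches(pattern, text):
    """Return (match span, group spans) for every non-overlapping match.

    describe_matches is a hypothetical helper for illustration only; it
    just wraps re.finditer, which yields the matches re.findall reports.
    """
    results = []
    for m in re.finditer(pattern, text):
        # m.span(i) is (-1, -1) for a group that did not participate.
        group_spans = [m.span(i) for i in range(1, m.re.groups + 1)]
        results.append((m.span(), group_spans))
    return results


for span, groups in describe_matches('((.d.)*)*', 'adb'):
    print('match', span, 'groups', groups)
```

A zero-width span such as (3, 3) is an empty string matched at the end of 'adb', so the output shows which match, and which group inside that match, covers that position.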

Isn't this a bug?

Yingjie

----------
components: Regular Expressions
messages: 122380
nosy: lanyjie
priority: normal
severity: normal
status: open
title: A bug related to matching the empty string
versions: Python 3.2

_______________________________________
Python tracker <report at bugs.python.org>
<http://bugs.python.org/issue10532>
_______________________________________

