[Python-bugs-list] [ python-Bugs-576079 ] Inconsistent behaviour in re grouping
noreply@sourceforge.net
Tue, 09 Jul 2002 00:20:52 -0700
Bugs item #576079, was opened at 2002-07-01 20:22
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=576079&group_id=5470
Category: Regular Expressions
Group: None
Status: Open
Resolution: None
Priority: 5
Submitted By: Pedro Rodriguez (pedro_rodriguez)
Assigned to: Fredrik Lundh (effbot)
Summary: Inconsistent behaviour in re grouping
Initial Comment:
The expressions (?P<name>.*) and
(?P<name>(.*)) don't behave the same way.
When the match fails, the group is None with the first
expression but contains an empty string with the second.
The problem occurs with Python 2.1.1 and 2.2 (and the
latest CVS for 2.3).
Python 1.5.2, on the other hand, works fine.
(example file attached)
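The attached example file is not reproduced here, so the sketch below is only a hypothetical reconstruction. It embeds both named-group forms in an optional prefix that has to fail, followed by a separate capturing group, in the spirit of the later comments. It is written for Python 3 and assumes a modern re module in which this bug has been fixed. On such a module, both forms should report the named group as None.

```python
import re

# Hypothetical reconstruction (the original attachment is not available):
# each named-group form sits inside an optional prefix that cannot match
# "y", and a separate capturing group (y) follows it.
plain = re.compile(r"(?:(?P<name>.*)x)?(y)")
nested = re.compile(r"(?:(?P<name>(.*))x)?(y)")

m1 = plain.match("y")
m2 = nested.match("y")

# Assuming a fixed re module, the named group did not participate in
# the match, so both forms report it as None.
print(m1.group("name"), m2.group("name"))
print(m1.groups(), m2.groups())
```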
----------------------------------------------------------------------
>Comment By: Pedro Rodriguez (pedro_rodriguez)
Date: 2002-07-09 09:20
Message:
Logged In: YES
user_id=426450
Greg,
I tried your patch, and it fixes the problems as Tim and
I reported them.
If the 'memset' operation should be performed each time the
'lastmark' field is set to a value lower than the current one,
then your patch makes sense IMO and is consistent
with other places in the code.
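A toy Python model of the bookkeeping described above may help. It is a simplified, hypothetical sketch: the class ToyMarks and the function simulate are invented for illustration and are not the actual _sre.c logic. In the model, marks[2*g] and marks[2*g+1] hold a group's start and end positions, lastmark is the highest mark index considered valid, and restoring a lower lastmark on backtracking can optionally clear the marks above it, which plays the role of the memset.

```python
from typing import List, Optional, Tuple


class ToyMarks:
    """Hypothetical, simplified model of a regex engine's capture marks.

    marks[2*g] and marks[2*g + 1] hold the start/end of group g;
    lastmark is the highest mark index treated as valid.
    """

    def __init__(self, ngroups: int, clear_on_restore: bool) -> None:
        self.marks: List[Optional[int]] = [None] * (2 * ngroups)
        self.lastmark = -1
        self.clear_on_restore = clear_on_restore

    def set_mark(self, i: int, pos: int) -> None:
        # Recording a mark beyond lastmark makes it the new lastmark.
        if i > self.lastmark:
            self.lastmark = i
        self.marks[i] = pos

    def restore(self, saved_lastmark: int) -> None:
        # Backtracking out of a failed attempt lowers lastmark.
        if self.clear_on_restore and self.lastmark > saved_lastmark:
            # Stand-in for the memset: forget every mark above the
            # restored lastmark so stale positions cannot resurface.
            for j in range(saved_lastmark + 1, self.lastmark + 1):
                self.marks[j] = None
        self.lastmark = saved_lastmark

    def groups(self, text: str) -> Tuple[Optional[str], ...]:
        out: List[Optional[str]] = []
        for g in range(len(self.marks) // 2):
            start, end = self.marks[2 * g], self.marks[2 * g + 1]
            if 2 * g + 1 > self.lastmark or start is None or end is None:
                out.append(None)
            else:
                out.append(text[start:end])
        return tuple(out)


def simulate(clear_on_restore: bool) -> Tuple[Optional[str], ...]:
    text = "y"
    m = ToyMarks(ngroups=3, clear_on_restore=clear_on_restore)
    saved = m.lastmark
    # A failed optional attempt first closes group 1 on an empty match...
    m.set_mark(2, 0)
    m.set_mark(3, 0)
    # ...then fails, so the engine backtracks and restores lastmark.
    m.restore(saved)
    # A later group (group 2, standing in for "(y)") then matches "y",
    # which raises lastmark past group 1's stale marks.
    m.set_mark(4, 0)
    m.set_mark(5, 1)
    return m.groups(text)


print(simulate(clear_on_restore=False))
print(simulate(clear_on_restore=True))
```

With clearing disabled, the stale marks left by the failed attempt become visible again once a later group raises lastmark, so group 1 reports ''. With clearing enabled, it reports None. The toy only illustrates the invariant and does not try to reproduce the exact tuples reported in this thread.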
----------------------------------------------------------------------
Comment By: Greg Chapman (glchapman)
Date: 2002-07-07 18:09
Message:
Logged In: YES
user_id=86307
I believe this is another example of the bug at which Patch
527371 was aimed. With that patch applied to the 2.2.1
_sre.c, I get this:
>>> pat2 = re.compile(r"(((.*))x)?(y)")
>>> print pat2.match('y').groups()
(None, None, None, 'y')
I see that the patch is marked as accepted, but it does not
yet appear to have made it into _sre.c even in CVS (at least
it's not in version 2.80). Perhaps this is an oversight?
----------------------------------------------------------------------
Comment By: Tim Peters (tim_one)
Date: 2002-07-01 20:36
Message:
Logged In: YES
user_id=31435
Here's a simpler example:
import re
pat1 = re.compile(r"((.*)x)?(y)")
pat2 = re.compile(r"(((.*))x)?(y)")
print pat1.match('y').groups()
print pat2.match('y').groups()
That prints
(None, None, 'y')
(None, '', None, 'y')
If (y) in the regexps is changed to plain y:
pat1 = re.compile(r"((.*)x)?y")
pat2 = re.compile(r"(((.*))x)?y")
print pat1.match('y').groups()
print pat2.match('y').groups()
the output changes to the expected:
(None, None)
(None, None, None)
So it's not *just* the extra level of parens -- whether there's
a capturing group "to the right" also affects the outcome.
FWIW, I agree it's a buglet.
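On a current interpreter, a Python 3 port of Tim's four patterns can serve as a quick regression check. It assumes a modern CPython release in which this buglet has been fixed; there, each pattern should produce the expected tuple, including (None, None, None, 'y') for pat2.

```python
import re

# Tim's four patterns, each paired with the groups() tuple that a fixed
# re module is expected to return when matching "y".
CASES = [
    (r"((.*)x)?(y)", (None, None, "y")),
    (r"(((.*))x)?(y)", (None, None, None, "y")),
    (r"((.*)x)?y", (None, None)),
    (r"(((.*))x)?y", (None, None, None)),
]

for pattern, expected in CASES:
    got = re.compile(pattern).match("y").groups()
    status = "ok" if got == expected else "MISMATCH"
    print(f"{pattern:16} {got!r:28} {status}")
```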