[Python-bugs-list] [ python-Bugs-725106 ] SRE bug with capturing groups in alternatives in repeats

SourceForge.net noreply@sourceforge.net
Sun, 27 Apr 2003 06:29:05 -0700


Bugs item #725106, was opened at 2003-04-21 17:16
Message generated for change (Settings changed) made by niemeyer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725106&group_id=5470

Category: Regular Expressions
Group: Python 2.3
>Status: Closed
>Resolution: Accepted
Priority: 5
Submitted By: Greg Chapman (glchapman)
>Assigned to: Gustavo Niemeyer (niemeyer)
Summary: SRE bug with capturing groups in alternatives in repeats

Initial Comment:
SRE does not always correctly handle capturing groups inside 
alternatives that are themselves inside a repeat.  For example:

>>> re.match('((a)|b)*', 'abc').groups()
('b', '')

Group 2 can never legitimately be an empty string here, since 
the only thing it can match is the literal 'a'.  As I 
understand it, the rule for groups inside a repeat is that 
each group keeps the last value it matched during the 
iterations of the repeat (or None if it never matched), so 
in the above case group 2 should be 'a'.  To fix this, it 
appears that (when inside a repeat) the BRANCH opcode must 
call mark_save before trying an alternative and then call 
mark_restore if the alternative fails.  The attached patch 
does this.
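
To illustrate the save/restore idea, here is a minimal toy sketch in
Python.  It is not how _sre.c is structured: the function toy_match, its
marks dictionary, and the restore_on_failure flag are all hypothetical
names made up for this example, and it hard-codes a different pattern,
((a)c|[ab])*, because that one makes a stale capture easy to see.  It only
shows why marks set by an alternative that later fails need to be rolled
back.

```python
def toy_match(text, restore_on_failure):
    """Greedy toy matcher for ((a)c|[ab])* that tracks group spans by hand."""
    marks = {}  # group number -> (start, end); hypothetical stand-in for SRE marks
    pos = 0
    while pos < len(text):
        saved = dict(marks)  # plays the role of mark_save
        # Alternative 1: (a)c
        if text.startswith("a", pos):
            marks[2] = (pos, pos + 1)  # group 2 is set before we know 'c' follows
            if text.startswith("c", pos + 1):
                marks[1] = (pos, pos + 2)
                pos += 2
                continue
            if restore_on_failure:
                marks = saved  # plays the role of mark_restore
        # Alternative 2: [ab]
        if text[pos] in "ab":
            marks[1] = (pos, pos + 1)
            pos += 1
            continue
        break  # neither alternative matched; the repeat ends here

    def group(n):
        span = marks.get(n)
        return None if span is None else text[span[0]:span[1]]

    return group(1), group(2)


print(toy_match("abc", restore_on_failure=True))   # ('b', None)
print(toy_match("abc", restore_on_failure=False))  # ('b', 'a')
```

With the restore step, group 2 ends up as None, because the only
alternative containing it never succeeded.  Without it, the 'a' captured
by the failed first alternative leaks into the result.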



----------------------------------------------------------------------

Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-04-27 12:35

Message:
Logged In: YES 
user_id=7887

Good catch, Greg!

Just for reference, here are two tests to confirm that
you're right:

perl -e '"abc" =~ /^((a)|b)*/; print "$1 $2\n";'
echo "abc" | sed -r -e "s/^((a)|b)*/\1 \2|/"

The only change I made was to port your tests to test_re.py.
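
For readers who want to check the corrected behaviour from Python itself,
a small sketch along these lines should work on an interpreter that
includes the fix.  The CASES table and the check_cases helper are
assumptions made for this example; they are not the tests that were
committed to test_re.py.

```python
import re

# (pattern, subject, expected groups); illustrative cases, not the committed tests
CASES = [
    (r'((a)|b)*', 'abc', ('b', 'a')),         # group 2 keeps its last match
    (r'(([ab])|c)*', 'abc', ('c', 'b')),      # inner group last matched 'b'
    (r'((d)|[ab])*', 'abc', ('b', None)),     # inner group never matched
    (r'((a)c|[ab])*', 'abc', ('b', None)),    # failed alternative leaves no capture
]


def check_cases(cases=CASES):
    """Return a list of (pattern, subject, expected, actual) for every mismatch."""
    failures = []
    for pattern, subject, expected in cases:
        actual = re.match(pattern, subject).groups()
        if actual != expected:
            failures.append((pattern, subject, expected, actual))
    return failures


print(check_cases())  # an empty list means every case behaved as expected
```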

Applied as:

Modules/_sre.c: 2.94
Lib/test/test_re.py: 1.40

Thanks!

----------------------------------------------------------------------