[Python-bugs-list] [ python-Bugs-725149 ] SRE bugs with capturing groups in negative assertions

SourceForge.net noreply@sourceforge.net
Sun, 27 Apr 2003 06:28:05 -0700


Bugs item #725149, was opened at 2003-04-21 18:22
Message generated for change (Comment added) made by niemeyer
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725149&group_id=5470

Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Greg Chapman (glchapman)
>Assigned to: Gustavo Niemeyer (niemeyer)
Summary: SRE bugs with capturing groups in negative assertions

Initial Comment:
SRE is broken in some subtle ways when you combine 
capturing groups with assertions.  For example:

>>> re.match('((?!(a)c)[ab])*', 'abc').groups()
('b', '')

In the above '(a)' has matched an empty string.  Or 
worse:

>>> re.match('(a)((?!(b)*))*', 'abb').groups()
('b', None, None)

Here '(a)' matches 'b'.

Although Perl reports matches for groups in negative 
assertions, I think it is better to adopt the PCRE rule 
that these groups are always reported as unmatched 
outside the assertion (inside the assertion, if used with 
backreferences, they should behave as normal).  This 
would make the handling of subpatterns in negative 
assertions consistent with that of subpatterns in 
branches:

>>> re.match('(a)c|ab', 'ab').groups()
(None,)

In the above, although '(a)' matches before the branch 
fails, the failure of the branch means '(a)' is considered 
not to have matched.

Anyway, the attached patch is an effort to fix this 
problem by saving the values of marks before calling the 
assertion, and then restoring them afterwards (thus 
undoing whatever might have been done in the assertion).


----------------------------------------------------------------------

>Comment By: Gustavo Niemeyer (niemeyer)
Date: 2003-04-27 13:28

Message:
Logged In: YES 
user_id=7887

Greg, I think there are two different issues here.

One of them is related to a wrong behavior from
mark_save/restore(), which don't restore the stackbase
before restoring the marks. Asserts were afected because
they're the only recursive ops that can continue the loop,
but the problem would happen to any operation with the same
behavior. So, rather than hardcoding this into asserts, I
have changed mark_save/restore() to always restore the
stackbase before restoring the marks. This should fix these
two cases you presented:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
>>> re.match('(a)((?!(b)*))*', 'abb').groups()

And was applied as:

Modules/_sre.c: 2.95
Lib/test/test_re.py: 1.41

The other issue is related to the asserts which are leaving
half-marked groups. While your solution does work, it
changes the current behavior, which is also compatible to
how perl works. I understand that individual groups matching
when the whole string doesn't match is atypical. OTOH, a
negative assertion *is* atypical, and IMO denying external
group access won't help the user to understand how it works.
In other words, I think it's better to have incomplete
support for the moment, than having none.
This way, we can think further about this, and look for an
elegant solution to fix that support, certainly including
some algorithm to check for half-marked groups.

Thank you very much for spotting these bugs, and submitting
a solution for them.


----------------------------------------------------------------------

Comment By: Greg Chapman (glchapman)
Date: 2003-04-22 18:46

Message:
Logged In: YES 
user_id=86307

In thinking further, I realized that positive assertions are also 
affected by the second problem.  E.g.:

>>> re.match('(a)(?:(?=(b)*)c)*', 'abb').groups()
('b', None)

The problem here is that a successful match in an assertion 
can leave marks at the top of the mark stack which then get 
popped in the wrong place.  Attaching a new patch which 
should catch this problem for both kinds of assertions (and 
which also should "unmark" groups in negative assertions).


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=725149&group_id=5470