[Python-bugs-list] [Bug #115900] difference between pre and sre for buggy expressions

noreply@sourceforge.net noreply@sourceforge.net
Sun, 14 Jan 2001 13:09:57 -0800


Bug #115900, was updated on 2000-Oct-03 03:11
Here is a current snapshot of the bug.

Project: Python
Category: Regular Expressions
Status: Closed
Resolution: Fixed
Bug Group: None
Priority: 5
Submitted by: htrd
Assigned to : effbot
Summary: difference between pre and sre for buggy expressions

Details:

def check_pattern(pattern):
    import pre, sre

    pre_version = pre.compile(pattern)

    print pre_version.match('-1234a')
    print pre_version.match(' 1234a')

    sre_version = sre.compile(pattern)

    print sre_version.match('-1234a')
    print sre_version.match(' 1234a')

# This is a buggy re: It is trying to match a minus sign as part of a
# set of characters, but does not follow the documented rule of
# "precede it with a backslash, or place it as the first character"
# However, pre and sre behave differently, and neither behaviour is
# quite what I was expecting. Is this a hint of a bug?
check_pattern(r'([\s-])(\.?)' + r'([HLEr0123456789\s-])(\.?)'*4
              + r'([abcd]?)')

# Preceding the hyphens with a backslash makes pre and sre behave identically.
check_pattern(r'([\s\-])(\.?)' + r'([HLEr0123456789\s\-])(\.?)'*4
              + r'([abcd]?)')
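
A minimal sketch for readers on Python 3, where pre and sre can no longer be imported as separate modules and the standard `re` module is assumed instead. It compiles the reporter's pattern with the hyphen written three ways (trailing, backslash-escaped, leading) and matches the same two test strings. The names `PATTERNS`, `matched_text` and `results` are illustrative and not part of the original report.

```python
import re

TAIL = r'([abcd]?)'

# The reporter's pattern, with the hyphen in each character class
# written three different ways.
PATTERNS = {
    'trailing hyphen': r'([\s-])(\.?)' + r'([HLEr0123456789\s-])(\.?)' * 4 + TAIL,
    'escaped hyphen': r'([\s\-])(\.?)' + r'([HLEr0123456789\s\-])(\.?)' * 4 + TAIL,
    'leading hyphen': r'([-\s])(\.?)' + r'([-HLEr0123456789\s])(\.?)' * 4 + TAIL,
}


def matched_text(pattern, subject):
    """Return the text matched at the start of subject, or None."""
    m = re.match(pattern, subject)
    return m.group(0) if m else None


# Same two subjects as in check_pattern() above.
results = {
    name: [matched_text(pattern, subject) for subject in ('-1234a', ' 1234a')]
    for name, pattern in PATTERNS.items()
}

for name, texts in results.items():
    print(name, texts)
```

If all three spellings report the same matches, the hyphen is being read as a literal in each case. The escaped and leading forms are the ones the documentation quoted above describes.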


Follow-Ups:

Date: 2001-Jan-14 13:09
By: effbot

Comment:
part 1 (character class followed by hyphen) is the same as bug #116251
(fixed by amk 2000-10-07)

part 2 (bogus regexps) is fixed in current CVS.  Will be in 2.1.
-------------------------------------------------------

Date: 2000-Oct-13 18:43
By: jdnier

Comment:
I've also run into a few bogus regexes that sre (NT4; Python 2.0c1)
swallows but pre doesn't. These patterns are contrived, but typos happen,
and without a compile traceback the problem can be hard to discover. Just
run C:\Python20\Tools\Scripts\redemo.py, give it a string to search like
"aaaaaabbb" and enter, for example,

(?
(?)
a+??            <- shouldn't this raise a 'nothing to repeat' error?
                   It matches a single 'a'; the second '?' is ignored
a+???           <- this matches the null string
a+????          <- this finally raises "nothing to repeat"
a???????????    <- never errors; matches the null string
a++++++++       <- never errors; matches the same as a+

It's the "matches the null string" part that seems evil, even if you allow
sre to be more permissive and not complain about the regex syntax errors.
Also, a* and permutations seem to work as expected, so it looks like
something is up with + and ? in particular.

Other novelties...

a?+?
a????+?
a?????+
a+???
a(?)????????????b


-------------------------------------------------------

For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=115900&group_id=5470