[Python-bugs-list] [Bug #115900] difference between pre and sre for buggy expressions
noreply@sourceforge.net
Sun, 14 Jan 2001 13:09:57 -0800
Bug #115900, was updated on 2000-Oct-03 03:11
Here is a current snapshot of the bug.
Project: Python
Category: Regular Expressions
Status: Closed
Resolution: Fixed
Bug Group: None
Priority: 5
Submitted by: htrd
Assigned to : effbot
Summary: difference between pre and sre for buggy expressions
Details:
def check_pattern(pattern):
    import pre, sre
    pre_version = pre.compile(pattern)
    print pre_version.match('-1234a')
    print pre_version.match(' 1234a')
    sre_version = sre.compile(pattern)
    print sre_version.match('-1234a')
    print sre_version.match(' 1234a')
# This is a buggy re: It is trying to match a minus sign as part of a
# set of characters, but does not follow the documented rule of
# "precede it with a backslash, or place it as the first character"
# However, pre and sre behave differently, and neither behaviour is
# quite what I was expecting. Is this a hint of a bug?
check_pattern('([\s-])(\.?)' + r'([HLEr0123456789\s-])(\.?)'*4
              +r'([abcd]?)')
# Preceding the hyphens with a backslash makes pre and sre behave identically.
check_pattern('([\s\-])(\.?)' + r'([HLEr0123456789\s\-])(\.?)'*4
              +r'([abcd]?)')
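For readers who want to try the comparison today, here is a minimal sketch using the current `re` module (the descendant of sre) under Python 3, since `pre` is no longer available. The names `UNESCAPED`, `ESCAPED` and `match_groups` are made up for this illustration, and raw strings are used throughout. It only shows how the modern module treats a trailing hyphen in a character set; it says nothing about how pre or sre behaved in 2.0.

```python
import re

# The two patterns from the report, written as raw strings.
# In both, the hyphen is the last character of each set.
UNESCAPED = r'([\s-])(\.?)' + r'([HLEr0123456789\s-])(\.?)' * 4 + r'([abcd]?)'
ESCAPED = r'([\s\-])(\.?)' + r'([HLEr0123456789\s\-])(\.?)' * 4 + r'([abcd]?)'


def match_groups(pattern, text):
    """Return the tuple of captured groups, or None if there is no match."""
    m = re.match(pattern, text)
    return m.groups() if m else None


for text in ('-1234a', ' 1234a'):
    same = match_groups(UNESCAPED, text) == match_groups(ESCAPED, text)
    print(repr(text), 'same result for both patterns:', same)
```

In the current module a hyphen placed last in a set is taken literally, so both spellings should agree on these inputs.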
Follow-Ups:
Date: 2001-Jan-14 13:09
By: effbot
Comment:
Part 1 (character class followed by hyphen) is the same as bug #116251
(fixed by amk, 2000-10-07).
Part 2 (bogus regexps) is fixed in current CVS and will be in 2.1.
-------------------------------------------------------
Date: 2000-Oct-13 18:43
By: jdnier
Comment:
I've also run into a few bogus regexes that sre (NT4; Python 2.0c1)
swallows but pre doesn't. These patterns are contrived, but typos happen,
and without a traceback at compile time the problem can be hard to
discover. Just run C:\Python20\Tools\Scripts\redemo.py, give it a string
to search like "aaaaaabbb" and enter, for example,
    (?
    (?)
    a+??           <- shouldn't this raise a 'nothing to repeat' error?
                      It matches a single 'a'; the second '?' is ignored
    a+???          <- this matches the null string
    a+????         <- this finally errors "nothing to repeat"
    a???????????   <- never errors; matches null string
    a++++++++      <- never errors; matches same as a+
It's the "matches the null string" part that seems evil, even if you allow
sre to be more permissive and not complain about the regex syntax errors.
Also, a* and permutations seem to work as expected, so it looks like
something is up with + and ? in particular.
Other novelties...
    a?+?
    a????+?
    a?????+
    a+???
    a(?)????????????b
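The patterns above can be checked against a current interpreter with a small helper. This is a sketch only: it assumes Python 3 and its `re` module, not the 2.0c1 sre build described in this comment, and `SUSPECT_PATTERNS` and `compile_error` are names invented here. Patterns whose validity depends on the Python version (for example a bare `a++`, which newer releases read as a possessive quantifier) are left out on purpose.

```python
import re

# A subset of the patterns listed in the comment above.
SUSPECT_PATTERNS = [
    '(?',
    '(?)',
    'a+??',
    'a+???',
    'a+????',
    'a???????????',
    'a++++++++',
]


def compile_error(pattern):
    """Return the error message if the pattern fails to compile, else None."""
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


for pattern in SUSPECT_PATTERNS:
    print('%-14s %s' % (pattern, compile_error(pattern) or 'compiled'))
```

Reporting the failure at compile time, rather than silently matching the null string, is the behaviour the comment asks for.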
-------------------------------------------------------
For detailed info, follow this link:
http://sourceforge.net/bugs/?func=detailbug&bug_id=115900&group_id=5470