[ python-Bugs-786970 ] re doesn't like (^$)*
SourceForge.net
noreply at sourceforge.net
Fri Jul 30 17:04:26 CEST 2004
Bugs item #786970, was opened at 2003-08-11 15:21
Message generated for change (Comment added) made by mkc
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=786970&group_id=5470
Category: Regular Expressions
Group: Python 2.3
Status: Open
Resolution: None
Priority: 5
Submitted By: Andrew Dalke (dalke)
Assigned to: Fredrik Lundh (effbot)
Summary: re doesn't like (^$)*
Initial Comment:
Nor, for that matter, does it like "(^)*"
% python
Python 2.3 (#1, Aug 3 2003, 02:47:49)
[GCC 3.1 20020420 (prerelease)] on darwin
>>> import re
>>> re.compile("(^$)*").match("")
Segmentation fault
%
It's trying real hard to match 0 characters an infinite
number of times. :)
The segfault is caused in part by the low stacksize limit
on my OS X machine,
% limit stacksize
stacksize 512 kbytes
% limit stacksize 2000kbytes
% limit stacksize
stacksize 2000 kbytes
% python
Python 2.3 (#1, Aug 3 2003, 02:47:49)
[GCC 3.1 20020420 (prerelease)] on darwin
Type "help", "copyright", "credits" or "license" for more
information.
>>> import re
>>> re.compile("(^$)*").match("")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
RuntimeError: maximum recursion limit exceeded
>>>
which suggests that the stack recursion limit
test for the re library is not the same as the one
used for the rest of Python. (def f(): f() gives
me the expected recursion-limit error, and not a
segfault)
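For comparison, here is a minimal sketch of the pure-Python case mentioned above, written for a current Python 3 rather than 2.3. The helper name recursion_outcome is made up for illustration. The interpreter's own recursion check stops f() with an exception long before the C stack is exhausted.

```python
def f():
    # Unbounded pure-Python recursion, as in the report.
    f()


def recursion_outcome():
    """Call f() and report which exception type stopped it (hypothetical helper)."""
    try:
        f()
    except RuntimeError as exc:
        # On Python 3.5+ this is RecursionError, a subclass of RuntimeError.
        return type(exc).__name__
    return "no error"


if __name__ == "__main__":
    print(recursion_outcome())
```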
Seems like the bug could be in several places:
- the compiler doesn't handle infinite loops of
zero-character tests well (it could convert
them to a finite-loop test)
- the re matcher doesn't check that it's been
in the same place several times without
advancing any character positions
- the re matcher doesn't use the same stack
check used elsewhere in Python
- the Mac stacksize default is too low for
Python's
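The second possibility can be sketched as a toy repeat loop. This is not how the re matcher is actually written; match_star, empty_line and literal_a are hypothetical names, and empty_line is only a simplified stand-in for ^$. The point is the guard: if the repeated body succeeds without advancing, the loop stops instead of trying again at the same position.

```python
def match_star(body, text, pos=0):
    """Apply `body` zero or more times from `pos`; return the final position."""
    while True:
        new_pos = body(text, pos)
        if new_pos is None:
            # Body failed: keep whatever has been matched so far.
            return pos
        if new_pos == pos:
            # Body matched zero characters: repeating it cannot make
            # progress, so stop here rather than loop (or recurse) forever.
            return pos
        pos = new_pos


def empty_line(text, pos):
    """Simplified stand-in for ^$ : succeeds, consuming nothing, on an empty string."""
    return pos if pos == 0 and len(text) == 0 else None


def literal_a(text, pos):
    """Body that consumes a single 'a'."""
    return pos + 1 if text.startswith("a", pos) else None


if __name__ == "__main__":
    print(match_star(empty_line, ""))
    print(match_star(literal_a, "aaab"))
```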
BTW, checking pcre ...
>>> import pre
/usr/local/lib/python2.3/pre.py:94: DeprecationWarning:
Please use the 're' module, not the 'pre' module
DeprecationWarning)
>>> pre.compile("(^$)*").match("")
Traceback (most recent call last):
File "<stdin>", line 1, in ?
File "/usr/local/lib/python2.3/pre.py", line 251, in compile
code=pcre_compile(pattern, flags, groupindex)
pcre.error: ('operand of unlimited repeat could match the
empty string', 4)
>>>
which is true, but the pattern I used should (IMHO)
be allowed to match the empty string.
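A quick way to check the expected behavior on a current Python 3 is sketched below. The helper name matches_empty is an assumption for illustration. As far as I know, present-day versions of re accept both patterns and return a zero-length match on the empty string.

```python
import re


def matches_empty(pattern):
    """Return True if `pattern` matches the empty string with a zero-length match."""
    m = re.compile(pattern).match("")
    return m is not None and m.group(0) == ""


if __name__ == "__main__":
    for pattern in ("(^$)*", "(^)*"):
        print(pattern, matches_empty(pattern))
```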
----------------------------------------------------------------------
Comment By: Mike Coleman (mkc)
Date: 2004-07-30 10:04
Message:
I was able to reproduce this under Linux (by setting the
stack limit to 512k) under Python 2.3.2. The specific case
seems to be fixed in the current CVS head; I don't get
problems until I reduce the stack limit below 64k, at which
point the import of 're' fails. So, to the degree that this
was caused by heavy use of the C stack, perhaps this has
been fixed.
As the submitter suggests, the limit being bumped into here
is not the Python recursion limit, but the underlying C
stack limit. AFAIK, the Python recursion limit is not
checked within C modules (and I doubt it would be reasonable
to add this).
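To see the two limits side by side, something like the following sketch can be used on a POSIX system with a current Python 3. The resource module is Unix-only, and the function name stack_limits is made up here. The C stack limit is what the shell's "limit stacksize" or "ulimit -s" controls, while the recursion limit is a separate counter kept by the interpreter.

```python
import resource
import sys


def stack_limits():
    """Report the process C stack limit and the Python recursion limit."""
    # getrlimit returns (soft, hard); either may be resource.RLIM_INFINITY.
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return {
        "c_stack_soft_bytes": soft,
        "c_stack_hard_bytes": hard,
        "python_recursion_limit": sys.getrecursionlimit(),
    }


if __name__ == "__main__":
    for name, value in stack_limits().items():
        print(name, value)
```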
In principle it seems like it would be nice if Python could
raise a MemoryError when the C stack limit is exceeded,
but it's not clear how this could be done or whether it
would be worthwhile. Guido is already on record (I think)
as being against longjmp, which seems like the only portable
way to implement it. It might be possible to emit a short
diagnostic (e.g., 'stack limit exceeded') before aborting,
but this would entail adding some sigaltstack mechanism that
probably wouldn't be portable to non-POSIX and might well
make debugging real SIGSEGVs or runaway C recursion more
difficult.
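For readers on a much later Python (3.3 or newer), the standard faulthandler module provides a diagnostic along these lines: it installs handlers for SIGSEGV and similar signals and, per its documentation, uses an alternate signal stack where sigaltstack() is available. The sketch below only shows how it is switched on; the wrapper name enable_crash_diagnostics is an assumption, and none of this existed when the comment was written.

```python
import faulthandler
import sys


def enable_crash_diagnostics(log_file=None):
    """Turn on faulthandler, writing tracebacks to `log_file` (default: stderr)."""
    # The file must stay open for as long as the handler is enabled.
    faulthandler.enable(file=log_file if log_file is not None else sys.stderr)
    return faulthandler.is_enabled()


if __name__ == "__main__":
    print(enable_crash_diagnostics())
```

The same effect is available without code changes by running "python -X faulthandler" or setting the PYTHONFAULTHANDLER environment variable.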
My suggestion would be to close this bug, perhaps after
adding a bit of documentation regarding how Python handles
this case (i.e., it relies on the default behavior, like 99%
of other programs). If there's a desire to add
functionality to handle the stack limit in a different way,
that could be proposed on python-dev or as an RFE.
----------------------------------------------------------------------