[Python-bugs-list] [ python-Bugs-832946 ] re.finditer() hangs with
some re involving \[
SourceForge.net
noreply at sourceforge.net
Thu Oct 30 11:30:52 EST 2003
Bugs item #832946, was opened at 2003-10-30 05:05
Message generated for change (Settings changed) made by nnorwitz
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=832946&group_id=5470
Category: Regular Expressions
Group: Python 2.3
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Angel Perea Martinez (angelpeream)
Assigned to: Fredrik Lundh (effbot)
Summary: re.finditer() hangs with some re involving \[
Initial Comment:
When called with some parameters (see test below),
re.finditer() enters an infinite loop.
Note that as far as I know, this is not bug 817234. In
this case, finditer() does NOT return anything: it simply
hangs.
I work in a Windows 2000 environment, with a freshly installed
Python v2.3.2. That is:
sys.version_info = (2, 3, 2, 'final', 0)
To reproduce it:
---------------
import re
pattern = re.compile(r"\[([^][]+)+\]")
text = "[ xxxxx , xxxxxx , xxxxxx"
for n in re.finditer(pattern, text):
    print "this string will not appear"
    print n.group(0)
----------
I couldn't simplify the pattern or the text any further.
----------------------------------------------------------------------
Comment By: Neal Norwitz (nnorwitz)
Date: 2003-10-30 11:28
Message:
Logged In: YES
user_id=33168
Are you sure it's an infinite loop? It takes a while to
complete on my 4 year old box (Linux), but it does complete.
How long did you let this run?
----------------------------------------------------------------------
Comment By: Fredrik Lundh (effbot)
Date: 2003-10-30 11:08
Message:
Logged In: YES
user_id=38376
You're using an RE with nested repeat operators (+). This
forces the engine to check all possible combinations of
the inner and outer repeat before it can be sure that no
possible combination results in a match.
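As a rough illustration of that growth, here is a minimal sketch of the counting argument only; it does not model how the sre engine is implemented. The helper name count_splits is hypothetical. It counts the ways a run of n characters can be cut into consecutive non-empty pieces, which is the set of choices the inner and outer + can make between them on a run of non-bracket characters. It is written for a current Python rather than the 2.3 in this report.

```python
def count_splits(n):
    """Count the ways to cut a run of n characters into consecutive non-empty pieces."""
    if n == 0:
        return 1  # nothing left to cut: exactly one way
    # The first piece takes k characters; the remainder is cut recursively.
    return sum(count_splits(n - k) for k in range(1, n + 1))


for n in (1, 2, 3, 4, 10):
    print(n, count_splits(n))
```

The count is 2**(n-1) for a run of n characters, so it doubles with every extra character.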
To see this in action, run this script:
import re, time
pattern = re.compile(r"\[([^][]+)+\]")
text = "[ xxxxx , xxxxxx , xxxxxx"
for i in range(len(text)):
    t0 = time.time()
    print i, re.findall(pattern, text[:i]),
    print time.time() - t0
On a relatively slow PC, this prints:
0 [] 0.0
1 [] 0.0
2 [] 0.0
3 [] 0.0
4 [] 0.0
5 [] 0.0
6 [] 0.0
7 [] 0.00999999046326
8 [] 0.0
9 [] 0.0
10 [] 0.0
11 [] 0.00999999046326
12 [] 0.0
13 [] 0.00999999046326
14 [] 0.0199999809265
15 [] 0.0400000810623
16 [] 0.0899999141693
17 [] 0.151000022888
18 [] 0.299999952316
19 [] 0.651000022888
20 [] 1.22200000286
21 [] 2.51300001144
22 [] 5.10800004005
23 [] 11.8769999743
24 [] 21.9309999943
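One way to avoid the blowup is to drop the outer repeat. This is offered as a suggestion and is not a fix proposed in this thread. The sketch below assumes that only the overall match matters: the two patterns accept the same strings, but the flattened one has no capturing group, so re.findall() returns whole matches instead of group contents. The names nested, flat and sample are illustrative, and the code is written for a current Python rather than 2.3.

```python
import re

nested = re.compile(r"\[([^][]+)+\]")  # pattern from the report
flat = re.compile(r"\[[^][]+\]")       # same strings accepted, no nested repeat

text = "[ xxxxx , xxxxxx , xxxxxx"

# There is no closing bracket, so nothing matches; the flat pattern
# gives up after a single pass over the run of non-bracket characters.
print(flat.findall(text))

# On input that does contain matches, both patterns find the same spans.
sample = "a [b c] d [e]"
print([m.group(0) for m in nested.finditer(sample)])
print([m.group(0) for m in flat.finditer(sample)])
```

With a single repeat there is only one way to consume a run of non-bracket characters, so a failed match no longer triggers the combinatorial search.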
----------------------------------------------------------------------