[Python-bugs-list] [ python-Bugs-832946 ] re.finditer() hangs with some re involving \[

SourceForge.net noreply at sourceforge.net
Thu Oct 30 11:30:52 EST 2003


Bugs item #832946, was opened at 2003-10-30 05:05
Message generated for change (Settings changed) made by nnorwitz
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=832946&group_id=5470

Category: Regular Expressions
Group: Python 2.3
>Status: Closed
>Resolution: Invalid
Priority: 5
Submitted By: Angel Perea Martinez (angelpeream)
Assigned to: Fredrik Lundh (effbot)
Summary: re.finditer() hangs with some re involving \[

Initial Comment:
When called with some parameters (see test below),
re.finditer() enters an infinite loop.

Note that as far as I know, this is not bug 817234.  In
this case, finditer()  does NOT return anything: symply
hangs.

I work in a W2000 environment, with a fresh-installed
python v2.3.2 . That is:

sys.version_info() =  (2, 3, 2, 'final', 0)

To reproduce it:

---------------
import re

pattern = re.compile(r"\[([^][]+)+\]")

text = "[ xxxxx , xxxxxx , xxxxxx"

for n in re.finditer(pattern, text):
    print "this string will not appear"
    print n.group(0)

----------

I could'nt further simplify the pattern, neither the text.



----------------------------------------------------------------------

Comment By: Neal Norwitz (nnorwitz)
Date: 2003-10-30 11:28

Message:
Logged In: YES 
user_id=33168

Are you sure it's an infinite loop?  It takes a while to
complete on my 4 year old box (Linux), but it does complete.
 How long did you let this run?

----------------------------------------------------------------------

Comment By: Fredrik Lundh (effbot)
Date: 2003-10-30 11:08

Message:
Logged In: YES 
user_id=38376

You're using an RE with nested repeat operators (+).   This 
forces the engine to check all all possible combinations of 
of the inner and outer repeat before it can be sure that no 
possible combination results in a match.

To see this in action, run this script:

import re, time

pattern = re.compile(r"\[([^][]+)+\]")

text = "[ xxxxx , xxxxxx , xxxxxx"

for i in range(len(text)):
    t0 = time.time()
    print i, re.findall(pattern, text[:i]),
    print time.time() - t0

on a relatively slow PC, this prints 

0 [] 0.0
1 [] 0.0
2 [] 0.0
3 [] 0.0
4 [] 0.0
5 [] 0.0
6 [] 0.0
7 [] 0.00999999046326
8 [] 0.0
9 [] 0.0
10 [] 0.0
11 [] 0.00999999046326
12 [] 0.0
13 [] 0.00999999046326
14 [] 0.0199999809265
15 [] 0.0400000810623
16 [] 0.0899999141693
17 [] 0.151000022888
18 [] 0.299999952316
19 [] 0.651000022888
20 [] 1.22200000286
21 [] 2.51300001144
22 [] 5.10800004005
23 [] 11.8769999743
24 [] 21.9309999943


----------------------------------------------------------------------

You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=832946&group_id=5470



More information about the Python-bugs-list mailing list