re bug

Klaus Neuner klaus_neuner82 at yahoo.de
Tue Oct 5 10:57:10 CEST 2004


Hello,

it seems that there is a bug in the re module. Below is a sample
program that illustrates it. (Sorry for its length. I was not able to
write a simpler regex that produced the same effect.)

The regex rx compiles. It can be used to search the strings str1 and
str2. Yet, when used with str3, the program will not terminate for
days.

I already ran into this problem some months ago in another program. I
let that program run for several days. In the end, the regex search
terminated and returned the right result. As I could avoid regular
expressions of the kind that led to the problem, I did avoid them.

At present, I can neither avoid these regular expressions nor afford
to wait days for the result of my program. And even if the search on
str3 did terminate some day, I would still consider this behaviour a
serious bug. I am searching tons of texts with regular expressions,
and in tons of texts there will always be something of the type of
str3. This means that my program just doesn't terminate, although it
has no (home-made) bug.

I am REALLY reluctant to change my program, because it is very complex
and it runs fine apart from the regex-search-does-not-terminate
problem. But if changing my program were the most efficient way to
make the problem go away, I would do it. (I cannot afford to wait for
some future version of the re module that doesn't have the problem.)

Is there perhaps a way to tell Python something like 

"Try the search for 5 minutes. If you don't find anything, continue."

Or are there any other solutions?
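
A minimal sketch of the "try the search for a while, then continue"
idea, written for current Python 3. As far as I know, the standard re
module has no timeout option of its own, so this sketch runs the search
in a separate Python process and kills that process when the time limit
expires. The names search_with_timeout and _CHILD_SOURCE are
hypothetical, and only the text of the whole match is returned, because
a match object cannot be passed between processes.

```python
import json
import subprocess
import sys

# Source of the helper process (hypothetical name): it reads the pattern
# and the text as JSON from stdin, runs one search, and writes the text
# of the whole match (or null) as JSON to stdout.
_CHILD_SOURCE = """
import json, re, sys
pattern, text = json.load(sys.stdin)
match = re.search(pattern, text)
json.dump(match.group(0) if match else None, sys.stdout)
"""


def search_with_timeout(pattern, text, seconds):
    """Return the matched text, or None if there is no match.

    Raises TimeoutError if the search takes longer than `seconds`.
    """
    try:
        done = subprocess.run(
            [sys.executable, "-c", _CHILD_SOURCE],
            input=json.dumps([pattern, text]),
            capture_output=True,
            text=True,
            timeout=seconds,  # the child is killed when this expires
            check=True,       # a crash in the child raises CalledProcessError
        )
    except subprocess.TimeoutExpired:
        raise TimeoutError("regex search exceeded %s seconds" % seconds) from None
    return json.loads(done.stdout)


print(search_with_timeout("b+", "aabbb", 5))  # prints bbb
```

Starting a process for every search is expensive, so with tons of texts
it would probably make more sense to hand a whole batch of texts to one
helper process. A caller that wants to "continue" can catch TimeoutError
and move on to the next text.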


Klaus



#########################################################################

import re, sys

rx = re.compile(
    "(?:^| )(?P<ALLES>(?:[^/ ]*/[^/ ]*/(?:cn(?::[^/ #]+)*) )*"
    "[^/ ]*/[^/ ]*/(?:cn(?::[^: /#]+)*:2(?::[^: /#]+)*:3(?::[^: /#]+)*))"
    "(?=( |$))")


str1 = "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:2:3"

test1 = rx.search(str1)
if test1:
    print test1.group("ALLES")

str2 = ("//bos Innit/no/no ?/?/pun:? Mm/mm/adj:4 mm/mm/cn:nom:akk:3 "
        "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
        "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
        "mm/mm/cn:nom:akk:2:3 ././pun:.")

test2 = rx.search(str2)

if test2:
    print test2.group("ALLES")

# Part below will not terminate for days

# str3 = ("//bos Innit/no/no ?/?/pun:? Mm/mm/adj:4 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 "
#         "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:3 ././pun:.")

# test3 = rx.search(str3)

# if test3:
#     print test3.group("ALLES")
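
One possible way around the problem, offered as a sketch and not as a
confirmed diagnosis: in the first repeated group the field class is
[^/ #]+, which also matches ":", so a tag such as ":nom:akk:3" can be
split into fields in several ways. When the final ":2 ... :3" part is
missing, a backtracking matcher may try every combination of those
splits across all the tokens. The variant below uses [^: /#]+ in the
first group as well, as the second half of the pattern already does. It
assumes that fields are never empty and that a tag never ends in ":".
On tags that break this assumption it will not match what the original
pattern matches. The names rx_fixed and long_input are hypothetical, and
the code is written for current Python 3.

```python
import re

# Same pattern as rx, except that the first group now uses [^: /#]+
# instead of [^/ #]+, so each ":field" can be matched in only one way.
rx_fixed = re.compile(
    "(?:^| )(?P<ALLES>(?:[^/ ]*/[^/ ]*/(?:cn(?::[^: /#]+)*) )*"
    "[^/ ]*/[^/ ]*/(?:cn(?::[^: /#]+)*:2(?::[^: /#]+)*:3(?::[^: /#]+)*))"
    "(?=( |$))")

str1 = "mm/mm/cn:nom:akk:3 mm/mm/cn:nom:akk:2:3"
m = rx_fixed.search(str1)
print(m.group("ALLES"))  # prints the whole of str1

# Same content as str3: 24 tokens ending in ":3" and no ":2" anywhere.
long_input = ("//bos Innit/no/no ?/?/pun:? Mm/mm/adj:4 "
              + "mm/mm/cn:nom:akk:3 " * 24 + "././pun:.")
print(rx_fixed.search(long_input))  # prints None
```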


