Python re module

Andrew M. Kuchling akuchlin at mems-exchange.org
Mon Jul 12 10:15:42 EDT 1999


catlee at my-deja.com wrote:
>Try the following:
>import re
>re.search("((.)\\1+)","a")
>Now, I know this isn't proper syntax (you shouldn't reference a group
>inside itself), but on my machine python hangs.

    What's happening is a bit complicated.  When you refer to a
group from inside itself, it's treated as if the group has matched the
null string.  You thus wind up asking "how many times does the null
string match at this point?"  The answer is infinity, of course.
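For comparison, here is a small sketch of how the same pattern fares on
a current Python 3 interpreter.  This is not the PCRE-based module
discussed above; it assumes the later re implementation, which rejects
a backreference to a group that is still open at compile time instead
of trying to match it.  The helper name compile_error is made up for
illustration.

```python
import re

# The pattern from the original report: \1 appears inside group 1 itself.
SELF_REFERENCING = r"((.)\1+)"

# A well-formed relative: \1 refers to a group that has already closed.
WELL_FORMED = r"(.)\1+"


def compile_error(pattern):
    """Return the error text if re refuses the pattern, else None.

    compile_error is a hypothetical helper, not part of the re module.
    """
    try:
        re.compile(pattern)
    except re.error as exc:
        return str(exc)
    return None


# Assumption: a current Python 3 re module, which reports an error here.
self_ref_message = compile_error(SELF_REFERENCING)
print("self-referencing pattern:", self_ref_message)

# The well-formed pattern compiles and matches a run of one character.
print("well-formed pattern:", compile_error(WELL_FORMED))
run = re.search(WELL_FORMED, "xaaay").group(0)
print("matched run:", run)
```

The point of the comparison is only that the question "how many times
does the null string match?" never gets asked if the pattern is refused
up front.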

    However, GvR reports that it isn't actually hanging, just
taking a long time to finish.  I think it escapes the infinite loop
because \1+ is actually treated the same as \1{1,INT_MAX}, so once
it's counted up to INT_MAX (2**31-1, about 2.1 billion), it then
returns.
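To put a number on "a long time", here is the arithmetic as a short
sketch.  The iteration rate is purely an assumption picked for
illustration (it was not measured on any machine), so the resulting
time is only an order-of-magnitude guess.

```python
# Largest value of a signed 32-bit int, the cap on the repeat count.
INT_MAX = 2**31 - 1
print(INT_MAX)

# Hypothetical rate: how many null-string repeats the matcher gets
# through per second.  This figure is an assumption, not a measurement.
ASSUMED_REPEATS_PER_SECOND = 10_000_000

seconds = INT_MAX / ASSUMED_REPEATS_PER_SECOND
print("roughly %.0f seconds at the assumed rate" % seconds)
```

At a slower rate the wait grows in proportion, which would fit a
search that looks hung but does eventually return.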
     
    The latest version of PCRE, v2.06, does not have a problem with
this (more checks for repeated null-string matches have been added),
so I think the problem will be fixed when the Python module is
resynchronized with the latest version.
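As a rough illustration of what a check for repeated null-string
matches buys you, the sketch below uses a pattern whose inner part can
match the empty string any number of times.  It assumes a current
Python 3 re module rather than PCRE 2.06, and it says nothing about how
either engine implements the check; it only shows that such a repeat
returns promptly when the engine stops after an iteration that consumes
nothing.

```python
import re

# The inner a* can match the null string, and the outer * repeats it.
# Without a guard against empty iterations this could loop for a very
# long time at a position where no "a" is present.
nested = re.compile(r"(?:a*)*")

# No "a" at the start: the whole match is the null string.
empty_match = nested.match("b")
print(repr(empty_match.group(0)))

# Two leading "a" characters: they are consumed, then matching stops.
run_match = nested.match("aab")
print(repr(run_match.group(0)))
```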

-- 
A.M. Kuchling			http://starship.python.net/crew/amk/
There is a difference between art and life and that difference is readability.
    -- Marian Engel, in the _Toronto Globe and Mail_, Dec. 28, 1974



