regex searching hangs
fortepianissimo at yahoo.com.tw
Sun Feb 8 20:29:35 CET 2004
--- Tim Peters <tim.one at comcast.net> wrote:
> > Could someone explains why the following code hangs (Python 2.3.3)?
> Yes, but it's not hung, and the answer is involved. Jeffrey Friedl
> wrote a
> book about it, "Mastering Regular Expressions" (O'Reilly). Chapter 4
> available (free) online, and is relevant:
Good pointer - guess this is new stuff added in the second ed...
> > --- CODE STARTS ---
> > import re
> > p=re.compile(r'\S+(?:\.\S+)+\.com')
> > t='......................................'
> > p.search(t)
> > --- CODE ENDS ---
> Run this instead, and you should see that the failure to match takes
> exponential in the number of periods (add a period, and the time
> (... )
> Semantically, the nested loop
> part matches every string of length >= 2 that starts with a period
> and doesn't contain whitespace.
Yup I now see the problem. This coupled with the prefix "\S+" forces
Python to "start" matching "(?:\.\S+)+" from any position, and the
trailing "\.com" forces it to "stop" matching "(?:\.\S+)+" at any
position and to start matching "\.com". Removing "\.com" greatly speeds
up the process. In short, this gives gozillion possibilities for
> > is halting problem a problem here?
> Not in theory, but in practice a match attempt that takes longer than
> age of the universe may be a minor annoyance <wink>.
It would be nice if a "cut-off" value can be passed in so as long as #
of branches reaches the value, the search stops and returns something
to indicate a pre-mature termination.
Do you Yahoo!?
Yahoo! Finance: Get your refund fast by filing online.
More information about the Python-list