String search vs regexp search

Tue Oct 14 04:34:34 EDT 2003

tweedgeezer at hotmail.com (Jeremy Fincher) wrote in
news:698f09f8.0310132019.4bd918b2 at posting.google.com: 

> Duncan Booth <duncan at NOSPAMrcp.co.uk> wrote in message
> news:<Xns941360C9B9445duncanrcpcouk at 127.0.0.1>... 
>> The regular expression code has a startup penalty since it has to
>> compile the regular expression at least once; however, the actual
>> searching may be faster than the naive str.find. If the time spent
>> doing the search is sufficiently long compared with the time spent
>> compiling, the regular expression may win out.
> 
> Both regular expression searching and string.find will do searching
> one character at a time; given that, it seems impossible to me that
> the hand-coded-in-C "naive" string.find could be slower than the
> machine-translated-coded-in-Python regular expression search. 
> Compilation time only serves to further increase string.find's
> advantage.
> 
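Which approach wins is an empirical question, so one way to settle it for a given Python version is to time both. The sketch below assumes Python 3.6 or later. The haystack, needle and sizes are arbitrary illustrative choices, not data from this thread. It compiles the pattern once, outside the timed loop, so that only the searching is measured. It also checks that both methods return the same index before timing them.

```python
import re
import timeit

# Illustrative (hypothetical) data: a long haystack with the needle only at the end.
haystack = "a" * 1_000_000 + "needle"
needle = "needle"

# Compile once, outside the timed loop, so the compile cost is paid a single time.
# re.escape makes sure the needle is treated as a constant string, not a regex.
pattern = re.compile(re.escape(needle))

def with_find():
    return haystack.find(needle)

def with_regex():
    m = pattern.search(haystack)
    return m.start() if m else -1

# Both approaches must agree before comparing their speed means anything.
assert with_find() == with_regex()

for name, fn in [("str.find", with_find), ("compiled regex", with_regex)]:
    seconds = timeit.timeit(fn, number=20)
    print(f"{name:15s} {seconds:.4f} s for 20 searches")
```

The printed timings depend on the machine and interpreter version, so no result is claimed here. Note also that the re module keeps an internal cache of recently compiled patterns, so calling re.search repeatedly with the same pattern string does not recompile it every time.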
I may have misremembered, but I thought there was a thread a little while
back which claimed that the regular expression library looked for constant
strings at the start of the regex and, if it found one, used Boyer-Moore to
do the search. If it does, then a regular expression search for a constant
string certainly ought to be much faster than a plain string.find as the
length of the searched string tends towards infinity, since the one-off
cost of compiling the regex becomes negligible next to the search itself.

If it doesn't, then it should.
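For readers unfamiliar with the technique, here is a minimal Python sketch of Boyer-Moore-Horspool, a simplified variant of Boyer-Moore that uses only the bad-character rule. It is not taken from the re module or from str.find, and nothing here confirms which algorithm either of them actually uses. It only illustrates why such a search need not examine every character: after a mismatch it can skip ahead by up to len(pattern) positions. The function name horspool_find is hypothetical and mirrors str.find's return convention.

```python
def horspool_find(text: str, pattern: str) -> int:
    """Return the lowest index of pattern in text, or -1 (like str.find)."""
    m, n = len(pattern), len(text)
    if m == 0:
        return 0
    if m > n:
        return -1
    # Bad-character table: for each character in pattern[:-1], the distance
    # from its last occurrence to the end of the pattern. Later occurrences
    # overwrite earlier ones, so the smallest (safe) shift is kept.
    # Characters not in the table allow a full shift of m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    i = 0  # current alignment of the pattern's start within text
    while i <= n - m:
        # Compare right to left.
        j = m - 1
        while j >= 0 and text[i + j] == pattern[j]:
            j -= 1
        if j < 0:
            return i
        # Shift by the text character aligned with the pattern's last position.
        i += shift.get(text[i + m - 1], m)
    return -1

print(horspool_find("string search vs regexp search", "regexp"))  # prints 17
```

Longer patterns generally allow longer skips, which is where this family of algorithms gains most over a character-by-character scan.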

-- 
Duncan Booth                                             duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?