String search vs regexp search
Duncan Booth
duncan at NOSPAMrcp.co.uk
Wed Oct 15 04:14:29 EDT 2003
Alex Martelli <aleax at aleax.it> wrote in
news:pCVib.282714$R32.9272822 at news2.tin.it:
> Jeremy Fincher wrote:
>
>> Duncan Booth <duncan at NOSPAMrcp.co.uk> wrote in message
>> news:<Xns941462CDB5003duncanrcpcouk at 127.0.0.1>...
>>> Ok, found the code. Regular expression searches do indeed use a form
>>> of Boyer-Moore, but not if you are ignoring case. So by specifying
>>> re.IGNORECASE the OP got a double hit, not only does the code have
>>> to do case insensitive comparisons, but it also has to crawl along
>>> looking at every character in the search string instead of skipping
>>> most of them.
>>
>> That's cool! Where'd you find the code?
>
> Hmmm, dist/src/Modules/_sre.c in the Python CVS tree? Or
> Modules/_sre.c in a standard source distribution?
>
Close, but it is actually a little bit complicated. _sre.c has the code
that does the search, including the skipping forward using an overlap
table, but the bit which checks for a literal prefix and builds the overlap
table is in lib/src_compile.py. So its in the Python code you have to look
to find that it ignores literal prefixes when ignoring case.
--
Duncan Booth duncan at rcp.co.uk
int month(char *p){return(124864/((p[0]+p[1]-p[2]&0x1f)+1)%12)["\5\x8\3"
"\6\7\xb\1\x9\xa\2\0\4"];} // Who said my code was obscure?
More information about the Python-list
mailing list