What algorithm does Python use to evaluate: if substring in string

Alex Martelli aleax at mac.com
Sat Sep 9 19:25:35 CEST 2006


Tor Erik <torerik81 at gmail.com> wrote:

> Alex Martelli wrote:
> > Tor Erik <torerik81 at gmail.com> wrote:
> > 
> >> I would be surprised if it is the naive:
> > 
> > Yep -- it's "a mix between Boyer-Moore and Horspool with a few more
> > bells and whistles on the top", as documented and implemented in
> > Objects/stringlib/fastsearch.h in the Python sources and well discussed
> > and explained at http://effbot.org/zone/stringlib.htm .
> > 
> > 
> > Alex
> 
> Ok. Two questions:
> 
> 1. Is "a in b" simply an alias for "b.find(a)"?

The 'in' operator can be minutely better optimized, but they share the
underlying algorithm (in 2.5).

> 2. Is this algorithm exclusive to Python 2.5, or is it contained in 2.4
> aswell?

It's  2.5 novelty.  Look at the performance on the same machine (my 2.0
GHz MBP, MacOSX 10.4.7):

brain:~ alex$ python2.4 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77' 'x in
y'
100000 loops, best of 3: 9.04 usec per loop
brain:~ alex$ python2.4 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77'
'y.find(x)!=-1'
100000 loops, best of 3: 2.01 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77' 'x in
y'1000000 loops, best of 3: 0.452 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77'
'y.find(x)!=-1'
1000000 loops, best of 3: 0.842 usec per loop

find used to be way faster than 'in' -- now they share algorithms and
'in' can be more optimized (no need to track ``where'' it finds a match,
so to speak;-), so find is over twice as fast as it used to be, 'in' is
about 20 times as fast as it used to be, in this example -- it gets even
better if you look at larger and larger strings, e.g...:

brain:~ alex$ python2.4 -mtimeit -s'x="foo"*123;y="bar"*999+x+"baz"*777'
'x in y'
10000 loops, best of 3: 91.9 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo"*123;y="bar"*999+x+"baz"*777'
'x in y'
100000 loops, best of 3: 3.01 usec per loop

here, we're going _30_ times as fast, not "just" 20;-).


Alex



More information about the Python-list mailing list