What algorithm does Python use to evaluate: if substring in string
Alex Martelli
aleax at mac.com
Sat Sep 9 13:25:35 EDT 2006
Tor Erik <torerik81 at gmail.com> wrote:
> Alex Martelli wrote:
> > Tor Erik <torerik81 at gmail.com> wrote:
> >
> >> I would be surprised if it is the naive:
> >
> > Yep -- it's "a mix between Boyer-Moore and Horspool with a few more
> > bells and whistles on the top", as documented and implemented in
> > Objects/stringlib/fastsearch.h in the Python sources and well discussed
> > and explained at http://effbot.org/zone/stringlib.htm .
> >
> >
> > Alex
>
> Ok. Two questions:
>
> 1. Is "a in b" simply an alias for "b.find(a)"?
The 'in' operator can be minutely better optimized, but they share the
underlying algorithm (in 2.5).
> 2. Is this algorithm exclusive to Python 2.5, or is it contained in 2.4
> aswell?
It's 2.5 novelty. Look at the performance on the same machine (my 2.0
GHz MBP, MacOSX 10.4.7):
brain:~ alex$ python2.4 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77' 'x in
y'
100000 loops, best of 3: 9.04 usec per loop
brain:~ alex$ python2.4 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77'
'y.find(x)!=-1'
100000 loops, best of 3: 2.01 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77' 'x in
y'1000000 loops, best of 3: 0.452 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo";y="bar"*99+x+"baz"*77'
'y.find(x)!=-1'
1000000 loops, best of 3: 0.842 usec per loop
find used to be way faster than 'in' -- now they share algorithms and
'in' can be more optimized (no need to track ``where'' it finds a match,
so to speak;-), so find is over twice as fast as it used to be, 'in' is
about 20 times as fast as it used to be, in this example -- it gets even
better if you look at larger and larger strings, e.g...:
brain:~ alex$ python2.4 -mtimeit -s'x="foo"*123;y="bar"*999+x+"baz"*777'
'x in y'
10000 loops, best of 3: 91.9 usec per loop
brain:~ alex$ python2.5 -mtimeit -s'x="foo"*123;y="bar"*999+x+"baz"*777'
'x in y'
100000 loops, best of 3: 3.01 usec per loop
here, we're going _30_ times as fast, not "just" 20;-).
Alex
More information about the Python-list
mailing list