Extract all words that begin with x
Stefan Behnel
stefan_ml at behnel.de
Wed May 12 04:29:23 EDT 2010
Bryan, 12.05.2010 08:55:
> Now back to the arguably-interesting issue of speed in the particular
> problem here: 'Superpollo' had suggested another variant, which I
> appended to my timeit targets, resulting in:
>
> [s for s in strs if s.startswith('a')] took: 5.68393977159
> [s for s in strs if s[:1] == 'a'] took: 3.31676491502
> [s for s in strs if s and s[0] == 'a'] took: 2.29392950076
>
> Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
> three.
Just out of curiosity, I ran the same code in the latest Cython pre-0.13
and added some optimised Cython implementations. Here's the code:
    def cython_way0(l):
        return [ s for s in l if s.startswith(u'a') ]

    def cython_way1(list l):
        cdef unicode s
        return [ s for s in l if s.startswith(u'a') ]

    def cython_way2(list l):
        cdef unicode s
        return [ s for s in l if s[:1] == u'a' ]

    def cython_way3(list l):
        cdef unicode s
        return [ s for s in l if s[0] == u'a' ]

    def cython_way4(list l):
        cdef unicode s
        return [ s for s in l if s and s[0] == u'a' ]

    def cython_way5(list l):
        cdef unicode s
        return [ s for s in l if (<Py_UNICODE>s[0]) == u'a' ]

    def cython_way6(list l):
        cdef unicode s
        return [ s for s in l if s and (<Py_UNICODE>s[0]) == u'a' ]
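One caveat on the variants above: the ones that index s[0] without the
"s and" guard are not equivalent to the others when the list contains an
empty string. The following is only a plain-Python sketch of the four
predicates (not the Cython code itself); the function names and the sample
list are made up for illustration.

```python
# Plain-Python stand-ins for the predicates used in the variants above.
# Python 3 str is already unicode, so the u'' prefix is dropped here.
def by_startswith(s):
    return s.startswith('a')

def by_slice(s):
    return s[:1] == 'a'          # slicing an empty string gives ''

def by_guarded_index(s):
    # bool() only normalises the result for display; a list
    # comprehension just tests truthiness.
    return bool(s and s[0] == 'a')

def by_bare_index(s):
    return s[0] == 'a'           # raises IndexError for ''

strs = ['apple', '', 'banana', 'a', 'avocado']  # hypothetical sample data

# The three "safe" predicates agree, including on the empty string.
safe_results = [
    [s for s in strs if pred(s)]
    for pred in (by_startswith, by_slice, by_guarded_index)
]

# The unguarded index fails as soon as it meets the empty string.
try:
    [s for s in strs if by_bare_index(s)]
    bare_index_failed = False
except IndexError:
    bare_index_failed = True
```

So timings for the unguarded variants are only comparable when the input is
known to contain no empty strings.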
And here are the numbers (plain Python 2.6.5 first):
[s for s in strs if s.startswith(u'a')] took: 1.04618620872
[s for s in strs if s[:1] == u'a'] took: 0.518909931183
[s for s in strs if s and s[0] == u'a'] took: 0.617404937744
cython_way0(strs) took: 0.769457817078
cython_way1(strs) took: 0.0861849784851
cython_way2(strs) took: 0.208586931229
cython_way3(strs) took: 0.18615603447
cython_way4(strs) took: 0.190477132797
cython_way5(strs) took: 0.0366449356079
cython_way6(strs) took: 0.0368368625641
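For anyone who wants to repeat the plain Python part of the measurement, a
harness along the following lines should do. This is a sketch, not the
script that produced the numbers above: the contents and size of strs, the
repeat count, and the helper name time_all are all assumptions, and it
targets Python 3 (timeit's globals argument needs 3.5 or later).

```python
import random
import string
import timeit

# Assumed test data: 10,000 random lowercase strings of length 0 to 8.
# The data behind the original numbers is not shown in this thread.
random.seed(0)
strs = [
    ''.join(random.choice(string.ascii_lowercase)
            for _ in range(random.randint(0, 8)))
    for _ in range(10000)
]

# The three plain-Python candidates from the thread.
candidates = [
    "[s for s in strs if s.startswith('a')]",
    "[s for s in strs if s[:1] == 'a']",
    "[s for s in strs if s and s[0] == 'a']",
]

def time_all(number=20):
    """Return {statement: total seconds} for `number` runs of each one."""
    return {
        stmt: timeit.timeit(stmt, globals={'strs': strs}, number=number)
        for stmt in candidates
    }

if __name__ == '__main__':
    for stmt, seconds in time_all().items():
        print('%s took: %s' % (stmt, seconds))
```

Absolute numbers will differ with the interpreter version, the machine and
the data, so only the relative ordering is worth comparing.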
Personally, I think the cast to Py_UNICODE in the last two implementations
shouldn't be required. That should happen automatically, so that way3/4
run as fast as way5/6. I'll add that when I get to it.
Note that unicode.startswith() is optimised in Cython (see way1), so it's a
pretty fast option, too. Also note that the best speed-up here is only a
factor of 14 over the fastest plain Python variant, so plain Python is quite
competitive unless the list is huge and this is really a bottleneck in an
application.
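The factor of 14 can be checked from the figures above by dividing the
fastest plain Python time by each Cython time. The snippet below does just
that; the dictionary keys are shorthand labels chosen here, and the timings
are copied from the lists above.

```python
# Timings in seconds, copied from the numbers reported above.
plain = {
    'startswith': 1.04618620872,
    's[:1]': 0.518909931183,
    's and s[0]': 0.617404937744,
}
cython = {
    'cython_way0': 0.769457817078,
    'cython_way1': 0.0861849784851,
    'cython_way2': 0.208586931229,
    'cython_way3': 0.18615603447,
    'cython_way4': 0.190477132797,
    'cython_way5': 0.0366449356079,
    'cython_way6': 0.0368368625641,
}

# Compare every Cython variant against the fastest plain Python one.
best_plain = min(plain.values())
speedups = {name: best_plain / t for name, t in cython.items()}

if __name__ == '__main__':
    for name in sorted(speedups):
        print('%s: %.1fx' % (name, speedups[name]))
```

Measured this way, the untyped cython_way0 comes out slower than the best
plain Python variant, which shows that the static typing is what pays off
here.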
Stefan