Extract all words that begin with x

Bryan bryanjugglercryptographer at yahoo.com
Wed May 12 02:55:39 EDT 2010


Terry Reedy wrote:
> Thank you for that timing report.

Enjoyed doing it, and more on that below.

> My main point is that there are two ways to fetch a char, the difference
> being the error return -- exception IndexError versus error value ''.
> This is an example of out-of-band versus in-band error/exception
> signaling, which programmers, especially of Python, should understand.

Sure. I think your posts and the bits of Python code they contain are
great. Slicing is amazingly useful, and it helps that slices don't
barf just because 'start' or 'stop' falls outside the index range.
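
For anyone following along, here is a minimal sketch of the two ways to fetch a first character and how each one signals "nothing there". The helper names are made up for illustration and are not from Terry's posts.

```python
def first_char_by_index(s):
    """Out-of-band signaling: indexing an empty string raises IndexError."""
    return s[0]


def first_char_by_slice(s):
    """In-band signaling: slicing an empty string returns the value ''."""
    return s[:1]


empty = ''

# Indexing signals the problem with an exception.
try:
    first_char_by_index(empty)
    index_raised = False
except IndexError:
    index_raised = True

# Slicing quietly returns an empty string instead.
slice_result = first_char_by_slice(empty)

# Slice bounds far outside the index range are tolerated as well.
far_slice = 'abc'[10:20]

print(index_raised, repr(slice_result), repr(far_slice))
```

With the slice form the caller has to remember that '' means "no character"; with the index form the caller has to catch the exception or guard against empty strings first.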

My main point was to show off how Python and its standard library make
answering the which-is-faster question easy. I think that's another
thing Python programmers should understand, even though I just learned
how today.

Now back to the arguably-interesting issue of speed in the particular
problem here: 'Superpollo' had suggested another variant, which I
appended to my timeit targets, resulting in these times (seconds per
1000 runs of each statement):

[s for s in strs if s.startswith('a')]  took:  5.68393977159
[s for s in strs if s[:1] == 'a']  took:  3.31676491502
[s for s in strs if s and s[0] == 'a']  took:  2.29392950076

Superpollo's condition -- s and s[0] == 'a' -- is the fastest of the
three.

What's more, in my timeit script the strings in the list are all of
length 5, so the 'and' never gets to short-circuit. If a major portion
of the strings are in fact empty, Superpollo's condition should do even
better. But I didn't test and time that. Yet.
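
One way the untested case could be timed is sketched below. This is only an assumption about how such a test might look, not something that was run for this post: make_strs and time_ways are hypothetical helpers, the 50% empty fraction is arbitrary, and the globals= argument to Timer needs Python 3.5 or later. No results are claimed here.

```python
from random import choice, random
from string import ascii_lowercase as letters
from timeit import Timer

WAYS = [
    "[s for s in strs if s.startswith('a')]",
    "[s for s in strs if s[:1] == 'a']",
    "[s for s in strs if s and s[0] == 'a']",
]


def make_strs(n, empty_fraction):
    """Build n strings; roughly empty_fraction of them are '' (hypothetical helper)."""
    return [''.join(choice(letters) for _ in range(5))
            if random() >= empty_fraction else ''
            for _ in range(n)]


def time_ways(strs, number=100):
    """Return {statement: seconds for `number` runs} for each candidate filter."""
    namespace = {'strs': strs}
    expected = eval(WAYS[0], namespace)
    timings = {}
    for way in WAYS:
        # All variants must select the same strings before timing means anything.
        assert eval(way, namespace) == expected
        timings[way] = Timer(way, globals=namespace).timeit(number)
    return timings


# Compare a list with no empty strings against one that is about half empty.
timings_no_empty = time_ways(make_strs(5000, 0.0))
timings_half_empty = time_ways(make_strs(5000, 0.5))

for label, timings in [('no empty', timings_no_empty),
                       ('half empty', timings_half_empty)]:
    for way, seconds in timings.items():
        print(label, '|', way, ' took: ', seconds)
```

Absolute numbers will vary by machine and Python version, so only the ordering within one run is worth comparing.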

-Bryan Olson


# ----- timeit code -----

from random import choice
from string import ascii_lowercase as letters
from timeit import Timer

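# 5000 random five-letter lowercase strings to filter.
strs = [''.join([choice(letters) for _ in range(5)])
        for _ in range(5000)]

# The three candidate filters, kept as source strings for timeit.
way1 = "[s for s in strs if s.startswith('a')]"
way2 = "[s for s in strs if s[:1] == 'a']"
way3 = "[s for s in strs if s and s[0] == 'a']"

# Sanity check: all three must select the same strings.
assert eval(way1) == eval(way2) == eval(way3)

# Time 1000 runs of each statement; the setup string imports strs.
for way in [way1, way2, way3]:
    t = Timer(way, 'from __main__ import strs')
    print(way, ' took: ', t.timeit(1000))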
