making a valid file name...

Neil Cerutti horpner at
Wed Oct 18 22:05:26 EDT 2006

On 2006-10-18, bearophileHUGS at <bearophileHUGS at> wrote:
> Tim Chase:
>> In practice, however, for such small strings as the given
>> whitelist, the underlying find() operation likely doesn't put a
>> blip on the radar.  If your whitelist were some huge document
>> that you were searching repeatedly, it could have worse
>> performance.  Additionally, the find() in the underlying C code
>> is likely about as bare-metal as it gets, whereas the set
>> membership aspect of things may go through some more convoluted
>> setup/teardown/hashing and spend a lot more time further from the
>> processor's op-codes.
>
> With this specific test (half good, half bad), on Py2.5, on my PC, sets
> start to be faster than the string search when the string "good" is
> about 5-6 chars long (this means sets are quite fast, I presume).
>
> from random import choice, seed
> from time import clock
> def main(choice=choice):
>     seed(1)
>     n = 100000
>     for good in ("ab", "abc", "abcdef", "abcdefgh",
>                  "abcdefghijklmnopqrstuvwxyz"):
>         poss = good + good.upper()
>         data = [choice(poss) for _ in xrange(n)] * 10
>         print "len(good) = ", len(good)
>         t = clock()
>         for c in data:
>             c in good
>         print round(clock()-t, 2)
>         t = clock()
>         sgood = set(good)
>         for c in data:
>             c in sgood
>         print round(clock()-t, 2), "\n"
> main()

On my Python 2.4 for Windows, they are often still neck-and-neck
for len(good) = 26. The set's disadvantage of having to be
constructed is heavily amortized over the 1,000,000 membership
tests (100,000 random characters, repeated 10 times). Without
knowing the usage pattern, it'd be hard to choose between them.
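
To see how much the usage pattern matters, here is a rough sketch
that charges the set construction to the set side and varies the
number of membership tests. It is written for Python 3 and uses
timeit rather than clock(); the helper names (make_data,
count_hits_str, count_hits_set, compare) are made up for this
illustration, and the timings will differ by machine and version.

```python
import random
import timeit


def make_data(good, n, seed=1):
    """Return n random characters, about half of which occur in `good`."""
    rng = random.Random(seed)
    poss = good + good.upper()
    return [rng.choice(poss) for _ in range(n)]


def count_hits_str(data, good):
    # Each test is a linear search through the string.
    return sum(1 for c in data if c in good)


def count_hits_set(data, good):
    # The set is built inside the function, so its construction
    # cost is included in whatever we time.
    sgood = set(good)
    return sum(1 for c in data if c in sgood)


def compare(good, n):
    """Time both approaches once over n membership tests."""
    data = make_data(good, n)
    t_str = timeit.timeit(lambda: count_hits_str(data, good), number=1)
    t_set = timeit.timeit(lambda: count_hits_set(data, good), number=1)
    return t_str, t_set


if __name__ == "__main__":
    good = "abcdefghijklmnopqrstuvwxyz"
    for n in (10, 1000, 100000):
        t_str, t_set = compare(good, n)
        print(n, round(t_str, 6), round(t_set, 6))
```

With only a handful of tests the cost of building the set is a
visible share of the total; with many tests it all but disappears,
which is the amortization described above.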

Neil Cerutti
