making a valid file name...
Neil Cerutti
horpner at yahoo.com
Wed Oct 18 22:05:26 EDT 2006
On 2006-10-18, bearophileHUGS at lycos.com <bearophileHUGS at lycos.com> wrote:
> Tim Chase:
>> In practice, however, for such small strings as the given
>> whitelist, the underlying find() operation likely doesn't put a
>> blip on the radar. If your whitelist were some huge document
>> that you were searching repeatedly, it could have worse
>> performance. Additionally, the find() in the underlying C code
>> is likely about as bare-metal as it gets, whereas the set
>> membership aspect of things may go through some more convoluted
>> setup/teardown/hashing and spend a lot more time further from the
>> processor's op-codes.
>
> With this specific test (half good half bad), on Py2.5, on my PC, sets
> start to be faster than the string search when the string "good" is
> about 5-6 chars long (this means set are quite fast, I presume).
>
> from random import choice, seed
> from time import clock
>
> def main(choice=choice):
> seed(1)
> n = 100000
>
> for good in ("ab", "abc", "abcdef", "abcdefgh",
> "abcdefghijklmnopqrstuvwxyz"):
> poss = good + good.upper()
> data = [choice(poss) for _ in xrange(n)] * 10
> print "len(good) = ", len(good)
>
> t = clock()
> for c in data:
> c in good
> print round(clock()-t, 2)
>
> t = clock()
> sgood = set(good)
> for c in data:
> c in sgood
> print round(clock()-t, 2), "\n"
>
> main()
On my Python2.4 for Windows, they are often still neck-and-neck
for len(good) = 26. set's disadvantage of having to be
constructed is heavily amortized over 100,000 membership
tests. Without knowing the usage pattern, it'd be hard to choose
between them.
--
Neil Cerutti
More information about the Python-list
mailing list