making a valid file name...
Tim Chase
python.list at tim.thechases.com
Tue Oct 17 13:50:58 EDT 2006
>> If you're doing it on a time-critical basis, it might help to
>> make "valid" a set, which should have O(1) membership testing,
>> rather than using the "in" test with a string. I don't know
>> how well the find() method of a string performs in relationship
>> to "in" testing of a set. Test and see, if it's important.
>
> The find method of (8-bit) strings is really, really fast. My
> guess is that set can't beat it. I tried to beat it recently with
> a binary search function. Even after applying psyco find was
> still faster (though I could beat the bisect functions by a
> little bit by replacing a divide with a shift).
In "theory" (you know...that little town in west Texas where
everything goes right), a set-membership test should be O(1). A
binary search function would be O(log N). A linear search of a
string for a member should be O(N).
In practice, however, for such small strings as the given
whitelist, the underlying find() operation likely doesn't put a
blip on the radar. If your whitelist were some huge document
that you were searching repeatedly, it could have worse
performance. Additionally, the find() in the underlying C code
is likely about as bare-metal as it gets, whereas the set
membership aspect of things may go through some more convoluted
setup/teardown/hashing and spend a lot more time further from the
processor's op-codes.
And I know that a number of folks have done some hefty
optimization of Python's string-handling abilities. There's
likely a tradeoff point where it's better to use one over the
other depending on the size of the whitelist. YMMV
-tkc
More information about the Python-list
mailing list