[Python-ideas] Changing `Sequence.__contains__`
Andrew Barnert
abarnert at yahoo.com
Mon Jul 21 04:18:44 CEST 2014
On Sunday, July 20, 2014 7:06 PM, Mark Lawrence <breamoreboy at yahoo.co.uk> wrote:
> > On 20/07/2014 23:06, Ram Rachum wrote:
>> Why does the default `Sequence.__contains__` iterate through the items
>> rather than use `.index`, which may sometimes be more efficient?
>>
>> I suggest an implementation like this:
>>
>> def __contains__(self, i):
>> try: self.index(i)
>> except ValueError: return False
>> else: return True
>> What do you think?
>>
>
> I don't see how that can be more efficient than the naive
>
> def __contains__(self, i):
> for elem in self:
> if elem == i:
> return True
> return False
>
> What am I missing?
Consider a blist.sortedlist (http://stutzbachenterprises.com/blist/sortedlist.html), or any other such data structure built on a tree, skip list, etc.
The index method is O(log N), so Ram's __contains__ is also O(log N). But naively iterating is obviously O(N). (In fact, it could be worse—if you don't implement a custom __iter__, and your indexing is O(log N), then the naive __contains__ will be O(N log N)…)
Needless to say, blist.sortedlist implements a custom O(log N) __contains__, and so does (hopefully) every other such library on PyPI. But Ram's proposal would mean they no longer have to do so; they'll get O(log N) __contains__ for free just by implementing index.
Of course that only removes one method. For example, they still have to implement a custom count method or they'll get O(N) performance from the default version. If you look at the code for any of these types, __contains__ is a tiny percentage of the implementation. So, it's not a huge win. But it's a small one.
More information about the Python-ideas
mailing list