Performance of list vs. set equality operations

Sat Apr 10 08:32:30 EDT 2010

Steven D'Aprano, 08.04.2010 03:41:
> On Wed, 07 Apr 2010 10:55:10 -0700, Raymond Hettinger wrote:
>
>> [Gustavo Nare]
>>> In other words: The more different elements two collections have, the
>>> faster it is to compare them as sets. And as a consequence, the more
>>> equivalent elements two collections have, the faster it is to compare
>>> them as lists.
>>>
>>> Is this correct?
>>
>> If two collections are equal, then comparing them as a set is always
>> slower than comparing them as a list.  Both have to call __eq__ for
>> every element, but sets have to search for each element while lists can
>> just iterate over consecutive pointers.
>>
>> If the two collections have unequal sizes, then both ways immediately
>> return unequal.
>
>
> Perhaps I'm misinterpreting what you are saying, but I can't confirm that
> behaviour, at least not for subclasses of list:
>
> >>> class MyList(list):
> ...     def __len__(self):
> ...             return self.n
> ...
> >>> L1 = MyList(range(10))
> >>> L2 = MyList(range(10))
> >>> L1.n = 9
> >>> L2.n = 10
> >>> L1 == L2
> True
> >>> len(L1) == len(L2)
> False

This code incorrectly assumes that overriding __len__ has an impact on the 
equality of two lists. If you want to influence the equality, you need to 
override __eq__. If you don't, the original implementation is free to do 
whatever it likes to determine if it is equal to another value or not. If 
it uses __len__ for that or not is only an implementation detail that can't 
be relied upon.

Stefan