Performance of list vs. set equality operations
sccolbert at gmail.com
Tue Apr 6 20:28:10 CEST 2010
the proof is in the pudding:
In : a = range(10000)
In : s = set(a)
In : s2 = set(a)
In : b = range(10000)
In : a == b
In : s == s2
In : %timeit a == b
1000 loops, best of 3: 204 us per loop
In : %timeit s == s2
10000 loops, best of 3: 124 us per loop
On Tue, Apr 6, 2010 at 2:11 PM, Gustavo Narea <me at gustavonarea.net> wrote:
> Could you please confirm whether my understanding of equality
> operations in sets and lists is correct? This is how I think things
> work, partially based on experimentation and the online documentation
> for Python:
> When you compare two lists, *every* element of one of the lists is
> compared against the element at the same position in the other list;
> that comparison is done by the __eq__() method (or the equivalent for
> builtin types). This is interrupted when a result is False.
> When you compare two sets, there's a loop over all the elements of the
> first set, where the hash for that element is looked up in the second
> - If this hash matches the hash for one or more elements in the
> second set, the element in the first set is compared (with __eq__ or
> equivalent) against the elements in the second set which have the same
> hash. When a result is True, nothing else is done on that element and
> the loop takes the next element in the first set; when all the results
> are False, the loop ends and the two sets are not equivalent.
> - If the hash doesn't match that of an element in the second set,
> then the loop ends and the two sets are not equivalent.
> So this means that:
> 1.- When you have two collections which have the same elements, the
> equality operation will *always* be faster with lists.
> 2.- When you have two collections with different elements, the
> equality operation *may* be faster with sets.
> For example, if you have two collections of 1,000 elements each and
> 998 of them are equivalent, comparing both collections as sets will be
> slower than doing it with lists. But if you have two collections of
> 1,000 elements each and 998 of them are not equivalent, then comparing
> both collections as lists will be slower than doing it with sets.
> The performance of equality operations on sets is directly
> proportional to the amount of different elements in both sets, while
> the performance of equality operations on lists is simply proportional
> to the cardinality of the collection.
> In other words: The more different elements two collections have, the
> faster it is to compare them as sets. And as a consequence, the more
> equivalent elements two collections have, the faster it is to compare
> them as lists.
> Is this correct?
> This is why so many people advocate the use of sets instead of lists/
> tuples in similar situations, right?
> - Gustavo.
More information about the Python-list