Performance of list vs. set equality operations

Chris Colbert sccolbert at gmail.com
Tue Apr 6 14:28:10 EDT 2010


the proof is in the pudding:

In [1]: a = range(10000)

In [2]: s = set(a)

In [3]: s2 = set(a)

In [5]: b = range(10000)

In [6]: a == b
Out[6]: True

In [7]: s == s2
Out[7]: True

In [8]: %timeit a == b
1000 loops, best of 3: 204 us per loop

In [9]: %timeit s == s2
10000 loops, best of 3: 124 us per loop


On Tue, Apr 6, 2010 at 2:11 PM, Gustavo Narea <me at gustavonarea.net> wrote:
> Hello!
>
> Could you please confirm whether my understanding of equality
> operations in sets and lists is correct? This is how I think things
> work, partially based on experimentation and the online documentation
> for Python:
>
> When you compare two lists, *every* element of one of the lists is
> compared against the element at the same position in the other list;
> that comparison is done by the __eq__() method (or the equivalent for
> builtin types). This is interrupted when a result is False.
>
> When you compare two sets, there's a loop over all the elements of the
> first set, where the hash for that element is looked up in the second
> set:
>  - If this hash matches the hash for one or more elements in the
> second set, the element in the first set is compared (with __eq__ or
> equivalent) against the elements in the second set which have the same
> hash. When a result is True, nothing else is done on that element and
> the loop takes the next element in the first set; when all the results
> are False, the loop ends and the two sets are not equivalent.
>  - If the hash doesn't match that of an element in the second set,
> then the loop ends and the two sets are not equivalent.
>
> So this means that:
>  1.- When you have two collections which have the same elements, the
> equality operation will *always* be faster with lists.
>  2.- When you have two collections with different elements, the
> equality operation *may* be faster with sets.
>
> For example, if you have two collections of 1,000 elements each and
> 998 of them are equivalent, comparing both collections as sets will be
> slower than doing it with lists. But if you have two collections of
> 1,000 elements each and 998 of them are not equivalent, then comparing
> both collections as lists will be slower than doing it with sets.
>
> The performance of equality operations on sets is directly
> proportional to the amount of different elements in both sets, while
> the performance of equality operations on lists is simply proportional
> to the cardinality of the collection.
>
> In other words: The more different elements two collections have, the
> faster it is to compare them as sets. And as a consequence, the more
> equivalent elements two collections have, the faster it is to compare
> them as lists.
>
> Is this correct?
>
> This is why so many people advocate the use of sets instead of lists/
> tuples in similar situations, right?
>
> Cheers,
>
>  - Gustavo.
> --
> http://mail.python.org/mailman/listinfo/python-list
>



More information about the Python-list mailing list