Performance of list vs. set equality operations

Lie Ryan lie.1296 at gmail.com
Wed Apr 7 01:32:22 EDT 2010


On 04/07/10 04:11, Gustavo Narea wrote:
> Hello!
> 
> Could you please confirm whether my understanding of equality
> operations in sets and lists is correct? This is how I think things
> work, partially based on experimentation and the online documentation
> for Python:
> 
> When you compare two lists, *every* element of one of the lists is
> compared against the element at the same position in the other list;
> that comparison is done by the __eq__() method (or the equivalent for
> builtin types). This is interrupted when a result is False.
> 
> When you compare two sets, there's a loop over all the elements of the
> first set, where the hash for that element is looked up in the second
> set:
>  - If this hash matches the hash for one or more elements in the
> second set, the element in the first set is compared (with __eq__ or
> equivalent) against the elements in the second set which have the same
> hash. When a result is True, nothing else is done on that element and
> the loop takes the next element in the first set; when all the results
> are False, the loop ends and the two sets are not equivalent.
>  - If the hash doesn't match that of an element in the second set,
> then the loop ends and the two sets are not equivalent.

I have not seen python's set implementation, but if you keep a bitmap of
hashes that already exist in a set, you can compare 32 or 64 items (i.e.
the computer's native word-size) at a time instead of comparing items
one-by-one[1]; this could marginally improve set operation's performance
for doing comparisons, difference, update, etc. Anyone that have seen
python's set source code can confirm whether such thing are implemented
in python's set?

[1] the possibility of hash collision complicates this a little bit, I
haven't fully thought out the the consequences of the interaction of
such bitmap with hash collision handling (there was a PyCon 2010 talk
"The Mighty Dictionary" by Brandon Craig Rhodes describing collision
handling
http://python.mirocommunity.org/video/1591/pycon-2010-the-mighty-dictiona).



More information about the Python-list mailing list