[Python-ideas] set.add() return value

Brandon Mintern bmintern at gmail.com
Tue Feb 17 09:37:06 CET 2009


On Tue, Feb 17, 2009 at 2:36 AM, Steven D'Aprano <steve at pearwood.info> wrote:
> if x in set:
>    print "already there"
> else:
>    set.add(x)
>
> versus:
>
> if set.add(x):  # or another method name if you prefer
>    print "already there"

These are truly different in terms of runtime performance, though. In
the first example, let's examine both cases. If x isn't in the set, we
hash x, perform a worst-case set lookup (because we have to scan the
entire list for whatever bucket that x hashes to), hash x again, and
then perform that same lookup again before adding x to the set. If x
is in the set, we perform one hash and one lookup.

In the second example, we always hash and lookup x exactly once,
adding it if necessary. This could turn out to be a non-trivial
difference in runtime if the set is large enough or x is an object
with a non-trivial hash. Sure the difference is a constant factor, but
it's a significant one, somewhere between 1.25-2 depending on the
percentage of "add"s that are unique items (if every item is unique,
we're looking at a 2x speedup for set operations).

But maybe that's just my premature optimization coming through,
Brandon



More information about the Python-ideas mailing list