[Python-ideas] set.add(x) slower than if x in set:set.add(x)

Terry Reedy tjreedy at udel.edu
Mon Sep 14 08:53:43 CEST 2009


Yuvgoog Greenle wrote:
> So this pattern is a valid python optimization?

That exactly not how I interpreted Raymond

> 
> On Sun, Sep 13, 2009 at 10:37 PM, Raymond Hettinger 
> <python at rcn.com 
> <mailto:python at rcn.com>> wrote:
> 
> 
>     On Sep 13, 2009, at 11:10 AM, Gerald Britton wrote:
> 
>          Here are some results:
> 
>         $ python -m timeit -n 1000000  -s 'with
>         open("/usr/share/dict/words")
>         as f: s = set(w.strip("\n") for w in f)'  's.add("mother")'
>         1000000 loops, best of 3: 0.292 usec per loop

Try looking up and binding the method just once, any any experienced 
Python programmer might do if doing repeated 'additions'.

-s'...: sadd=set(w.strip("\n") for w in f).add' 'sadd("mother")

>         britton at TheBrittons:~$ python -m timeit -n 1000000  -s 'with
>         open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w
>         in f)'
>         'if "mother" not in s:s.add("mother")'
>         1000000 loops, best of 3: 0.185 usec per loop

Add 'sadd = s.add' at end of setup, followed by
'if "mother" not in s: sadd("mother")

I doubt second will still be faster.

>         the second example beats the first by about 36%
> 
>         Is the timing difference just the cost of the method lookup for
>         s.add,
>         or is something else happening that I'm not seeing?
> 
> 
>     It is the something else you're not seeing ;-)
> 
>     On the first pass of the 1000000 loops, "mother" gets added.
>     On the remaining passes the 'if "mother" not in set' test fails
>     and the set.add() never gets executed.  That latter operation
>     is a bit more expensive than the contains-test because it
>     includes the time to lookup and bind the add method.

My suggested alteration removes the repeated lookup and bind.

tjr




More information about the Python-ideas mailing list