set.add(x) slower than if x in set:set.add(x)

Hi -- This is maybe the wrong list for this question; if so, would someone please redirect me? I stumbled across a performance anomaly with the set.add method. My idea was that if I try to add something via set.add, the method has to first check whether the new item is already in the set, since set items are supposed to be unique. Then, on a whim, I stuck an "if x in set" condition in front of it. I was surprised to learn that this latter approach runs faster! Here are some results:

$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 's.add("mother")'
1000000 loops, best of 3: 0.292 usec per loop

$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 'if "mother" not in s: s.add("mother")'
1000000 loops, best of 3: 0.185 usec per loop

The second example beats the first by about 36%. Is the timing difference just the cost of the method lookup for s.add, or is something else happening that I'm not seeing?

-- Gerald Britton

On Sep 13, 2009, at 11:10 AM, Gerald Britton wrote:
Here are some results:
$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 's.add("mother")'
1000000 loops, best of 3: 0.292 usec per loop

$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 'if "mother" not in s: s.add("mother")'
1000000 loops, best of 3: 0.185 usec per loop
the second example beats the first by about 36%
Is the timing difference just the cost of the method lookup for s.add, or is something else happening that I'm not seeing?
It is the something else you're not seeing ;-)

On the first pass of the 1000000 loops, "mother" gets added. On the remaining passes the 'if "mother" not in s' test fails and the set.add() never gets executed. That latter operation is a bit more expensive than the contains-test because it includes the time to look up and bind the add method.

Raymond
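Raymond's explanation is easy to reproduce with the timeit module directly (a minimal sketch; the absolute numbers vary by machine, but only the first of the million passes actually performs an insertion):

```python
import timeit

# Plain add: every pass pays for the attribute lookup that binds s.add,
# plus the call itself (which internally re-checks membership).
t_add = timeit.timeit('s.add("mother")', setup='s = set()', number=1_000_000)

# Guarded add: after the first pass "mother" is present, so the guard
# skips the lookup-and-call entirely on the remaining 999,999 passes.
t_guard = timeit.timeit('if "mother" not in s: s.add("mother")',
                        setup='s = set()', number=1_000_000)

print(f"plain add:   {t_add:.3f} s")
print(f"guarded add: {t_guard:.3f} s")
```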

So this pattern is a valid Python optimization? Funky... Sadly, there's no way around it unless the interpreter somehow did it magically for you.

On Sun, Sep 13, 2009 at 10:37 PM, Raymond Hettinger <python@rcn.com> wrote:
On Sep 13, 2009, at 11:10 AM, Gerald Britton wrote:
Here are some results:
$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 's.add("mother")'
1000000 loops, best of 3: 0.292 usec per loop

$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 'if "mother" not in s: s.add("mother")'
1000000 loops, best of 3: 0.185 usec per loop
the second example beats the first by about 36%
Is the timing difference just the cost of the method lookup for s.add, or is something else happening that I'm not seeing?
It is the something else you're not seeing ;-)
On the first pass of the 1000000 loops, "mother" gets added. On the remaining passes the 'if "mother" not in set' test fails and the set.add() never gets executed. That latter operation is a bit more expensive than the contains-test because it includes the time to lookup and bind the add method.
Raymond
_______________________________________________ Python-ideas mailing list Python-ideas@python.org http://mail.python.org/mailman/listinfo/python-ideas

Yuvgoog Greenle wrote:
So this pattern is a valid python optimization?
That is exactly not how I interpreted Raymond.
On Sun, Sep 13, 2009 at 10:37 PM, Raymond Hettinger <python@rcn.com <mailto:python@rcn.com>> wrote:
On Sep 13, 2009, at 11:10 AM, Gerald Britton wrote:
Here are some results:
$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 's.add("mother")'
1000000 loops, best of 3: 0.292 usec per loop
Try looking up and binding the method just once, as any experienced Python programmer might do if doing repeated 'additions':

-s '...: sadd = set(w.strip("\n") for w in f).add' 'sadd("mother")'
$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 'if "mother" not in s: s.add("mother")'
1000000 loops, best of 3: 0.185 usec per loop
Add 'sadd = s.add' at the end of the setup, followed by 'if "mother" not in s: sadd("mother")'. I doubt the second will still be faster.
the second example beats the first by about 36%
Is the timing difference just the cost of the method lookup for s.add, or is something else happening that I'm not seeing?
It is the something else you're not seeing ;-)
On the first pass of the 1000000 loops, "mother" gets added. On the remaining passes the 'if "mother" not in set' test fails and the set.add() never gets executed. That latter operation is a bit more expensive than the contains-test because it includes the time to lookup and bind the add method.
My suggested alteration removes the repeated lookup and bind. tjr
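Spelled out as a loop, Terry's cached-lookup variant looks like this (a sketch; the short word tuple stands in for any stream of candidate items):

```python
s = set()
sadd = s.add           # look up and bind the method once, outside the loop

for word in ("mother", "father", "mother", "sister"):
    if word not in s:  # cheap containment test
        sadd(word)     # pre-bound add: no attribute lookup per iteration

print(sorted(s))       # ['father', 'mother', 'sister']
```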

Terry Reedy schrieb:
$ python -m timeit -n 1000000 -s 'with open("/usr/share/dict/words") as f: s = set(w.strip("\n") for w in f)' 'if "mother" not in s: s.add("mother")'
1000000 loops, best of 3: 0.185 usec per loop
Add 'sadd = s.add' at the end of the setup, followed by 'if "mother" not in s: sadd("mother")'.
I doubt the second will still be faster.
Well, the method also has to be *called* (think argument parsing, but in this case that should be trivial).

Georg

-- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

Yuvgoog Greenle wrote:
So this pattern is a valid python optimization? Funky...
Sadly, there's no way around it unless the interpreter somehow did it magically for you.
The interpreter has no way of knowing a priori that the branch won't be taken 999,999 times out of 1,000,000.

Think about what you are actually comparing here:

Average speed of 1 million calls to s.add(x)
Average speed of 1 million calls to "x in s" + one call to s.add(x)

Now think about the fact that s.add(x) includes a containment test plus function call and name lookup overhead even in the case that the item is already present in the set.

All overhead included:
$ python -m timeit -s "s = set()" "s.add(1)"
1000000 loops, best of 3: 0.197 usec per loop

Lose the attribute lookup:
$ python -m timeit -s "s = set()" -s "sadd = s.add" "sadd(1)"
10000000 loops, best of 3: 0.146 usec per loop

Skip the function call altogether most of the time:
$ python -m timeit -s "s = set()" "if 1 not in s: s.add(1)"
10000000 loops, best of 3: 0.101 usec per loop

Just do the containment test:
$ python -m timeit -s "s = set([1])" "1 not in s"
10000000 loops, best of 3: 0.1 usec per loop

Now then, let's also look at the absolute numbers we're discussing here (on my machine, anyway). Is the fastest version twice as fast as the slowest version? Yes it is. But that difference is only 97 *nano*seconds. And relative to the recommended approach of caching the attribute lookup, we're only saving 45 nanoseconds.

And in more realistic use cases where some items are already in the set and some aren't, the "I'll check first" implementation can become a pessimisation instead of an optimisation.
To emphasise the point, we'll go to the other extreme, where the item is added to the set every time:

All the overhead:
$ python -m timeit -s "s = set()" "s.add(1)" "s.clear()"
1000000 loops, best of 3: 0.374 usec per loop

The "optimised" approach:
$ python -m timeit -s "s = set()" "if 1 not in s: s.add(1)" "s.clear()"
1000000 loops, best of 3: 0.444 usec per loop

Oops - it looks like the approach that saves us 45 nanoseconds when the item is already in the set may cost us up to *70* nanoseconds when it turns out we need to add the item after all.

Caching attribute lookups before time-critical loops is a good optimisation technique that most experienced Python programmers learn. Pre-checking a condition that a called function is just going to check again and bail out quickly in less than 50% of cases? Usually a bad idea - the extra checks made when the function is invoked anyway will often cancel out any gains you make from avoiding the function call overhead.

That said, as with any micro-optimisation: time it on a range of typical and end-case data sets. If you avoid the function call overhead often enough, doing a pre-check may be a net win for some inner loops.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Interesting discussion here! I did a little analysis:

let a = cost of "item in set"
let b = cost of set.add(item)
let c = cost of sadd(item), where sadd = set.add was bound beforehand

If I do x+y additions to my set, where x is the number of items already in the set and y is the number not in the set, using my original approach I get a cost of:

a.(x+y) + b.y  # we always test for set membership, but only add new items

for the "if item not in set: set.add(item)" approach, or a.(x+y) + c.y if I substitute sadd for set.add. If I always add the item to the set without checking first, the cost is b.(x+y) (or c.(x+y) using sadd), assuming that set.add() takes close to the same time whether the item is in the set or not (in reality I suppose that it needs to play with a few pointers if the item is not in the set and may need to rebalance the tree, if it is a red-black tree or something similar -- what is it, actually?).

That means that break-even should happen when:

a.(x+y) + b.y = b.(x+y)
a.x + a.y + b.y = b.x + b.y
a.y = (b-a).x
y = ((b-a)/a).x

or, equivalently, when the fraction of items already in the set is x/(x+y) = a/b. Plugging in the numbers from earlier in this thread, we have:

a = .276 usec
b = .489 usec
c = .298 usec
a/b = .564
a/c = .926

so the approach "if item not in set: set.add(item)" wins when roughly 56% or more of the candidate items are already in the set (and thus fail the "if" statement). Setting sadd = set.add beforehand raises that break-even point to about 93% (almost no advantage).
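The break-even arithmetic can be checked in a few lines (a sketch using the a, b, c costs measured above; with these costs the thresholds work out to roughly 56% without the cached bound method and 93% with it):

```python
# Per-operation costs in usec, from the timings earlier in the thread.
a = 0.276  # "item in s" containment test
b = 0.489  # s.add(item), attribute looked up every call
c = 0.298  # sadd(item), with sadd = s.add bound once beforehand

def guarded_cost(n_present, n_absent, add_cost):
    """Test every candidate, pay for the add only on the absent ones."""
    n = n_present + n_absent
    return a * n + add_cost * n_absent

def always_add_cost(n_present, n_absent, add_cost):
    """Call add unconditionally for every candidate."""
    return add_cost * (n_present + n_absent)

# The guard breaks even when the already-present fraction equals a/add_cost.
print(f"break-even vs s.add: {a / b:.0%}")  # already-present fraction needed
print(f"break-even vs sadd:  {a / c:.0%}")
```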
Lesson learned: if you know (or can reasonably test) your sample data, you can choose the better method. If your data is unknown (random from your viewpoint), you could assume that about half the items to be added are already in the set, which is well below the break-even point, so there is no point in doing the "if item in set" test beforehand. YMMV
-- Gerald Britton

On Mon, Sep 14, 2009 at 9:19 AM, Gerald Britton <gerald.britton@gmail.com>wrote:
assuming that set.add() takes close to the same time whether the item is in the set or not (in reality I suppose that it needs to play with a few pointers if the item is not in the set and may need to rebalance the tree, if it is a red-black tree or something similar -- what is it, actually?)
Under the hood, the set type uses a hash table.

-- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

Aha, ok - so cheap inserts, not so cheap lookups (especially if not found).

On Mon, Sep 14, 2009 at 10:26 AM, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:
On Mon, Sep 14, 2009 at 9:19 AM, Gerald Britton <gerald.britton@gmail.com> wrote:
assuming that set.add() takes close to the same time whether the item is in the set or not (in reality I suppose that it needs to play with a few pointers if the item is not in the set and may need to rebalance the tree, if it is a red-black tree or something similar -- what is it, actually?)
Under the hood, the set type uses a hash table.
-- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC
-- Gerald Britton

On Mon, Sep 14, 2009 at 10:03 AM, Gerald Britton <gerald.britton@gmail.com>wrote:
Aha ok, so cheap inserts, not so cheap lookups (especially if not found)
Eh? The average-case lookup cost for a hash table is O(1), better than most tree structures, which typically have a cost of O(log n).

-- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>
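The O(1) behaviour is easy to see with timeit (a sketch; the two probes take roughly the same time despite a 10,000x difference in set size):

```python
import timeit

small = set(range(100))
big = set(range(1_000_000))

# Probe an element that is present in each set; with a hash table the
# cost barely depends on how many other elements are stored.
t_small = timeit.timeit('50 in small', globals={'small': small},
                        number=1_000_000)
t_big = timeit.timeit('500_000 in big', globals={'big': big},
                      number=1_000_000)

print(f"100 elements:       {t_small:.3f} s")
print(f"1,000,000 elements: {t_big:.3f} s")
```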

Gerald Britton wrote:
Aha ok, so cheap inserts, not so cheap lookups (especially if not found)
Other way around - lookup is cheap (unless you have an expensive hash calculation or poor hash diversity), addition can be more expensive (although the set C implementation is optimised pretty well).

Note that this does mean the relative benefits of the LBYL (Look-Before-You-Leap) approach and the EAFP (Easier-to-Ask-Forgiveness-than-Permission) approach vary based on the data types involved. I would get different performance numbers using strings or floats or Decimal objects rather than ints, because the relative cost of the hash() calculation would change.

Cheers, Nick.

-- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

2009/9/14 Nick Coghlan <ncoghlan@gmail.com>:
The interpreter has no way of knowing a priori that the branch won't be taken 999,999 times out of 1,000,000.
And it does not know a priori that s.add(x) will involve checking whether the set s contains x. It does not even try to infer that s must be a set at this point. -- Marcin Kowalczyk qrczak@knm.org.pl http://qrnik.knm.org.pl/~qrczak/
participants (8)
- Daniel Stutzbach
- Georg Brandl
- Gerald Britton
- Marcin 'Qrczak' Kowalczyk
- Nick Coghlan
- Raymond Hettinger
- Terry Reedy
- Yuvgoog Greenle