Re: [Python-Dev] Retrieve an arbitrary element from a set without removing it

Oct. 24, 2009

      Hi,

someone on this list mentioned that much of the s.get() time is spend on the 
name lookup for get(). That is indeed the case:

===================
from timeit import *

stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          ",
         "g=s.get;\nfor i in xrange(1000): g()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(1000))")
    print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
==================

Time for for i in xrange(1000): iter(s).next()   :       0.448227
Time for for i in xrange(1000):
        for x in s:
                break:   0.141669
Time for for i in xrange(1000): s.add(s.pop())   :       0.348055
Time for for i in xrange(1000): s.get()          :       0.148580
Time for g=s.get;
for i in xrange(1000): g()          :    0.080563

So, now set.get() is indeed the fastest and preferable solution if you need 
massive amounts of retrieving elements from a set without removing them.

wr

Am Freitag, 23. Oktober 2009 22:53:24 schrieb Willi Richert:
...
Hi,
surprised about the performance of for/break provided by Vitor, I did some
more testing. It revealed that indeed  we can forget the get() (which was
implemented as a stripped down pop()):
from timeit import *
stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak            ",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]
for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()
$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break                            :       0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673
In some tests, for/break was even slightly faster then get().
As always, intuition regarding performance bottlenecks is flawed ;-)
Anyway, thanks for all the helpful comments, especially to Stefan for the
http://comments.gmane.org/gmane.comp.python.ideas/5606 link.
Regards,
wr
Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:
...
Vitor Bosshard wrote:
...
2009/10/23 Willi Richert <w.richert@gmx.net>:
...
Hi,
recently I wrote an algorithm, in which very often I had to get an
arbitrary element from a set without removing it.
Three possibilities came to mind:
1.
x = some_set.pop()
some_set.add(x)
2.
for x in some_set:
       break
3.
x = iter(some_set).next()
Of course, the third should be the fastest. It nevertheless goes
through all the iterator creation stuff, which costs some time. I
wondered, why the builtin set does not provide a more direct and
efficient way for retrieving some element without removing it. Is
there any reason for this?
I imagine something like
x = some_set.get()
I see this as being useful for frozensets as well, where you can't get
an arbitrary element easily due to the obvious lack of .pop(). I ran
into this recently, when I had a frozenset that I knew had 1 element
(it was the difference between 2 other sets), but couldn't get to that
element easily (get the pun?)
So in my testing (2) was actually the fastest. I assumed because .next()
was a function call overhead, while:
for x in some_set:
  break
Was evaluated inline. It probably still has to call PyObject_GetIter,
however it doesn't have to create a stack frame for it.
This is what "timeit" tells me. All runs are of the form:
python -m timeit -s "s = set([10])" ...
0.101us	"for x in s: break; x"
0.130us "for x in s: pass; x"
0.234us -s "n = next; i = iter" "x = n(i(s)); x"
0.248us "x = next(iter(s)); x"
0.341us "x = iter(s).next(); x"
So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
faster than (iter(s).next()).
I was pretty surprised that it was 30% faster than "for x in s: pass". I
assume it has something to do with a potential "else:" statement?
Note that all of these are significantly < 1us. So this only matters if
it is something you are doing often.
I don't know your specific timings, but I would guess that:
  for x in s: break
Is actually going to be faster than your
  s.get()
Primarily because s.get() requires an attribute lookup. I would think
your version might be faster for:
  stat2 = "g = s.get; for i in xrange(100): g()"
However, that is still a function call, which may be treated differently
by the interpreter than the for:break loop. I certainly suggest you try
it and compare.
John
=:->