[Python-Dev] Retrieve an arbitrary element from a set without removing it

Fri Oct 23 22:53:24 CEST 2009

Hi,

surprised about the performance of for/break provided by Vitor, I did some 
more testing. It revealed that indeed  we can forget the get() (which was 
implemented as a stripped down pop()):

from timeit import *
stats = ["for i in xrange(1000): iter(s).next()   ",
         "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak            ",
         "for i in xrange(1000): s.add(s.pop())   ",
         "for i in xrange(1000): s.get()          "]

for stat in stats:
    t = Timer(stat, setup="s=set(range(100))")
    try:
        print "Time for %s:\t %f"%(stat, t.timeit(number=1000))
    except:
        t.print_exc()

$ ./test_get.py
Time for for i in xrange(1000): iter(s).next()   :       0.433080
Time for for i in xrange(1000):
        for x in s:
                break                            :       0.148695
Time for for i in xrange(1000): s.add(s.pop())   :       0.317418
Time for for i in xrange(1000): s.get()          :       0.146673

In some tests, for/break was even slightly faster then get().

As always, intuition regarding performance bottlenecks is flawed ;-)

Anyway, thanks for all the helpful comments, especially to Stefan for the 
http://comments.gmane.org/gmane.comp.python.ideas/5606 link.

Regards,
wr

Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:
> Vitor Bosshard wrote:
> > 2009/10/23 Willi Richert <w.richert at gmx.net>:
> >> Hi,
> >>
> >> recently I wrote an algorithm, in which very often I had to get an
> >> arbitrary element from a set without removing it.
> >>
> >> Three possibilities came to mind:
> >>
> >> 1.
> >> x = some_set.pop()
> >> some_set.add(x)
> >>
> >> 2.
> >> for x in some_set:
> >>        break
> >>
> >> 3.
> >> x = iter(some_set).next()
> >>
> >>
> >> Of course, the third should be the fastest. It nevertheless goes through
> >> all the iterator creation stuff, which costs some time. I wondered, why
> >> the builtin set does not provide a more direct and efficient way for
> >> retrieving some element without removing it. Is there any reason for
> >> this?
> >>
> >> I imagine something like
> >>
> >> x = some_set.get()
> >
> > I see this as being useful for frozensets as well, where you can't get
> > an arbitrary element easily due to the obvious lack of .pop(). I ran
> > into this recently, when I had a frozenset that I knew had 1 element
> > (it was the difference between 2 other sets), but couldn't get to that
> > element easily (get the pun?)
> 
> So in my testing (2) was actually the fastest. I assumed because .next()
> was a function call overhead, while:
> for x in some_set:
>   break
> 
> Was evaluated inline. It probably still has to call PyObject_GetIter,
> however it doesn't have to create a stack frame for it.
> 
> This is what "timeit" tells me. All runs are of the form:
> python -m timeit -s "s = set([10])" ...
> 
> 0.101us	"for x in s: break; x"
> 0.130us "for x in s: pass; x"
> 0.234us -s "n = next; i = iter" "x = n(i(s)); x"
> 0.248us "x = next(iter(s)); x"
> 0.341us "x = iter(s).next(); x"
> 
> So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
> faster than (iter(s).next()).
> I was pretty surprised that it was 30% faster than "for x in s: pass". I
> assume it has something to do with a potential "else:" statement?
> 
> Note that all of these are significantly < 1us. So this only matters if
> it is something you are doing often.
> 
> I don't know your specific timings, but I would guess that:
>   for x in s: break
> 
> Is actually going to be faster than your
>   s.get()
> 
> Primarily because s.get() requires an attribute lookup. I would think
> your version might be faster for:
>   stat2 = "g = s.get; for i in xrange(100): g()"
> 
> However, that is still a function call, which may be treated differently
> by the interpreter than the for:break loop. I certainly suggest you try
> it and compare.
> 
> John
> =:->
> 
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/python-dev/attachments/20091023/755690d7/attachment-0001.htm>