[Python-Dev] Retrieve an arbitrary element from a set without removing it

John Arbash Meinel john.arbash.meinel at gmail.com
Fri Oct 23 19:25:48 CEST 2009


Vitor Bosshard wrote:
> 2009/10/23 Willi Richert <w.richert at gmx.net>:
>> Hi,
>>
>> recently I wrote an algorithm, in which very often I had to get an arbitrary
>> element from a set without removing it.
>>
>> Three possibilities came to mind:
>>
>> 1.
>> x = some_set.pop()
>> some_set.add(x)
>>
>> 2.
>> for x in some_set:
>>        break
>>
>> 3.
>> x = iter(some_set).next()
>>
>>
>> Of course, the third should be the fastest. It nevertheless goes through all
>> the iterator creation stuff, which costs some time. I wondered, why the builtin
>> set does not provide a more direct and efficient way for retrieving some element
>> without removing it. Is there any reason for this?
>>
>> I imagine something like
>>
>> x = some_set.get()
>>
> 
> 
> I see this as being useful for frozensets as well, where you can't get
> an arbitrary element easily due to the obvious lack of .pop(). I ran
> into this recently, when I had a frozenset that I knew had 1 element
> (it was the difference between 2 other sets), but couldn't get to that
> element easily (get the pun?)

So in my testing (2) was actually the fastest. I assumed because .next()
was a function call overhead, while:
for x in some_set:
  break

Was evaluated inline. It probably still has to call PyObject_GetIter,
however it doesn't have to create a stack frame for it.

This is what "timeit" tells me. All runs are of the form:
python -m timeit -s "s = set([10])" ...

0.101us	"for x in s: break; x"
0.130us "for x in s: pass; x"
0.234us -s "n = next; i = iter" "x = n(i(s)); x"
0.248us "x = next(iter(s)); x"
0.341us "x = iter(s).next(); x"

So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x
faster than (iter(s).next()).
I was pretty surprised that it was 30% faster than "for x in s: pass". I
assume it has something to do with a potential "else:" statement?

Note that all of these are significantly < 1us. So this only matters if
it is something you are doing often.

I don't know your specific timings, but I would guess that:
  for x in s: break

Is actually going to be faster than your
  s.get()

Primarily because s.get() requires an attribute lookup. I would think
your version might be faster for:
  stat2 = "g = s.get; for i in xrange(100): g()"

However, that is still a function call, which may be treated differently
by the interpreter than the for:break loop. I certainly suggest you try
it and compare.

John
=:->


More information about the Python-Dev mailing list