[Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it

Fri Oct 30 16:11:32 CET 2009

On Fri, 30 Oct 2009 03:00:27 pm Raymond Hettinger wrote:

[skipping to the last paragraph]

> Sorry for so many questions

Don't be sorry. These are good questions, and I'll try to answer them. 
But keep in mind that this isn't my proposal -- I vary between +0 and 
+1 on the proposal depending on the time of the day *wink*

> >(1) it will only fail if the set is empty;
>
> Just like next() except that next() gives you the option to supply a
> default and can be used on any iterator (perhaps iter(s) or
> itertools.cycle(s) etc).

Yes. I had considered suggesting that sets become their own iterator, so 
you could do the following:

>>> s = set([1,2])
>>> next(s)
2

but that leads to the problem that if you accept a set from somewhere 
else, you have no idea whether next(s) will succeed or raise 
StopIteration. It may have already been exhausted before it reached 
you.

> > (2) it should be efficient;
>
> Is this about optimization?

Not primarily. Perhaps I should have said it should not be inefficient. 
It's primarily about there being One Obvious Way to extract an 
arbitrary item from a set -- this is a reoccurring question on 
comp.lang.python. Being efficient is a bonus, but it shouldn't be 
inefficient.

> I wouldn't expect "x=s.get()" to beat "for x in s: break".
> Attribute lookup and method calls usually are slower
> than equivalents using built-in syntax with specific opcodes.

Obviously any hypothetical solution would need to be benchmarked, but I 
wouldn't be concerned if s.get() was marginally slower than for `x in 
s: break`. 

When people are left to come up with their own ad hoc solutions, we've 
seen some poor solutions. I've seen, possibly in this very thread, the 
following O(N) suggestions:

for x in s:
    pass

x = list(s)[0]

The usual technique people tend to come up with is:

x = s.pop()
s.add(x)

Which strikes many people (including myself) as inelegant. Surely "get 
an element" is a more fundamental operation than "get an element and 
remove it"?

> > (3) if you call it repeatedly on a set without modifying the set,
> > you will cycle through each element in turn in some unspecified
> > arbitrary order.
>
> What's wrong with using next()?  That is what it's for.

You can't call next() on a set. You have to call next(iter(set)). From 
Python 3.0:

>>> next(set([1,2]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: set object is not an iterator
>>> next(iter(set([1,2])))
1

Unless I missed something, this is so unobvious that nobody has 
suggested it in the thread yet.

> What about this proposal is specific to sets, i.e. why don't you want
> the same thing for lists. tuples, strings, file objects, or any other
> iterable?

Sequences such as lists, tuples and strings have easy random access. 
It's easy to get an arbitrary element: pick an index i by whatever 
method you like, and get seq[i]. Many file objects are similar, you 
have random access with seek().

It is unpractical and an over-generalisation to apply this to general 
iterables, for reasons I'm sure I don't need to go into. But sets and 
frozensets aren't lazily generated streams, they actually do store the 
elements inside their data structure in the same way that lists do.

> Does this proposal pass the test of being self-descriptive?  Can you
> write a code fragment that exercises the cycling behavior, show it to
> another programmer, and have them correctly deduce what the code does
> (i.e. that different values are returned, that it fails when the set
> it empty, that it wraps around and never terminates)?  Can they
> readily differentiate it for dict.get() which has decidedly different
> semantics?

I don't expect ConfigParser.get() to have the same semantics as 
dict.get(), and they're much more closely related than sets and dicts. 
I don't understand why so many people apparently have a problem with 
the name get(), but that's okay, I'm not wedded to the idea either. If 
folks prefer a different name, by all means suggest it. I like the name 
suggested by Wikipedia, "pick".

As for being self-descriptive, I don't know, I haven't tried the 
experiment.

> > To clarify point 3, given:
> >
> > x = set.get()
> > y = set.get()
> >
> > then x and y will only be the same element if set has length one.
>
> So, it can't even be used for looping through a set because there is
> no termination?

That's a feature, not a bug! This is not intended to be a replacement 
for standard set iteration. Anyone writing the following:

for _ in len(s):
    process(s.get())

instead of 

for x in s:
    process(x)

will be taken out and slapped with a large fish.

My reasoning for the given behaviour is as follows:

* If you want the same element twice from a set, the One Obvious Way is 
to get an element from the set (somehow!) and assign it to a local 
variable:

    x = s.get()
    process(x)
    # later...
    process(x)

So having get() return the same value each time is not useful or 
necessary.

* If you want a random element on each call, the One Obvious Way is to 
call random.choice(list(s)).

* Since sets aren't mappings, or sequences, there's no sensible way to 
look up an index or key, so any variation of "get element #1" are out.

* Which leaves returning the elements in some unspecified arbitrary 
order. The most obvious arbitrary order is to cycle through them, which 
conveniently is exactly what iter(set) does, but we don't need to 
guarantee that order.

> > I believe that the patch supplied by Willi Richart implemented
> > these behaviours.
> >
> > http://bugs.python.org/issue7212
>
> So you want to introduce additional, hidden state to sets? (to make
> sure that successive invocations return different values)

If you can think of any other way to efficiently cycle over the elements 
in a set, I'm all for it :)

Presumably this is no different from what dict views do, except they 
don't wrap around when exhausted.

> Do you want a thread local version too? (so that two threads can call
> gets without stomping on each other's guarantees that successive
> calls will produce distinct elements)

I think that's overkill. I wouldn't think we need to guarantee that two 
threads don't see the same element, or that each thread will see each 
element in the same order. We need only the much weaker guarantee that 
two subsequent calls to get() in the one thread with no modification to 
the set between the calls won't retrieve the same element each time 
(unless there is only one element to retrieve), and that each element 
will eventually be seen.

> Do you have any real-world use-cases where next(), for-loops, or
> itertools wouldn't suffice?

Someone in this thread -- forgive me, I don't recall who -- had the case 
where they had a frozenset which they knew only had one element, but no 
obvious way to retrieve that element.

To my mind, the major rationalisation for set.get() is not that there's 
no way to get a single element from a set, but that there is no 
obvious, easily discoverable way to retrieve a single element.

> Is there a precedent in *any* other language you've ever seen?  (setl
> has an "arb" function but it makes no promises about returning
> different values on consequetive calls; otherwise, I've never seen an
> equivalent in any other set implementation).

I can't say I've seen one in any other languages, but Wikipedia 
lists "pick" as a fundamental set operation:

pick(S): returns an arbitrary element of S.

http://en.wikipedia.org/wiki/Set_(computer_science)

This page claims that Icon has an operator that returns a random element 
of a set:

? set( [1, 2, 3, 4, 5] )

http://stackoverflow.com/questions/124671/picking-a-random-element-from-a-set

but I've never used Icon myself and so can't confirm that.

> Do you think the return-different-values-on-successive-calls
> semantics is self-evident and non-magical as compared to a straight
> for-loop or next(it)?

I'm going to sit on the fence on that and say "maybe".

> ISTM, that when streams have non-destructive getters with
> self-advancing pointers, they also have a seek() function so that it
> can be controlled.  Will this proposal need a seek() method too?

No. Since sets are unordered, there's no way to seek to a specific 
element.

> Sorry for so many questions, but I honestly think there are too many
> unresolved design issues.  We've seen no real-world source code that
> would be improved fwith the proposal.  I think it sounds conceptually
> tempting and is fun to theorize about, but it actual implementation
> it will make sets more difficult to learn and it would quickly become
> a piece of rarely used, poorly understood cruft.

-- 
Steven D'Aprano