[Python-Dev] Retrieve an arbitrary element from a setwithoutremoving it
Steven D'Aprano
steve at pearwood.info
Fri Oct 30 16:11:32 CET 2009
On Fri, 30 Oct 2009 03:00:27 pm Raymond Hettinger wrote:
[skipping to the last paragraph]
> Sorry for so many questions
Don't be sorry. These are good questions, and I'll try to answer them.
But keep in mind that this isn't my proposal -- I vary between +0 and
+1 on the proposal depending on the time of the day *wink*
> >(1) it will only fail if the set is empty;
>
> Just like next() except that next() gives you the option to supply a
> default and can be used on any iterator (perhaps iter(s) or
> itertools.cycle(s) etc).
Yes. I had considered suggesting that sets become their own iterator, so
you could do the following:
>>> s = set([1,2])
>>> next(s)
2
but that leads to the problem that if you accept a set from somewhere
else, you have no idea whether next(s) will succeed or raise
StopIteration. It may have already been exhausted before it reached
you.
> > (2) it should be efficient;
>
> Is this about optimization?
Not primarily. Perhaps I should have said it should not be inefficient.
It's primarily about there being One Obvious Way to extract an
arbitrary item from a set -- this is a reoccurring question on
comp.lang.python. Being efficient is a bonus, but it shouldn't be
inefficient.
> I wouldn't expect "x=s.get()" to beat "for x in s: break".
> Attribute lookup and method calls usually are slower
> than equivalents using built-in syntax with specific opcodes.
Obviously any hypothetical solution would need to be benchmarked, but I
wouldn't be concerned if s.get() was marginally slower than for `x in
s: break`.
When people are left to come up with their own ad hoc solutions, we've
seen some poor solutions. I've seen, possibly in this very thread, the
following O(N) suggestions:
for x in s:
pass
x = list(s)[0]
The usual technique people tend to come up with is:
x = s.pop()
s.add(x)
Which strikes many people (including myself) as inelegant. Surely "get
an element" is a more fundamental operation than "get an element and
remove it"?
> > (3) if you call it repeatedly on a set without modifying the set,
> > you will cycle through each element in turn in some unspecified
> > arbitrary order.
>
> What's wrong with using next()? That is what it's for.
You can't call next() on a set. You have to call next(iter(set)). From
Python 3.0:
>>> next(set([1,2]))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: set object is not an iterator
>>> next(iter(set([1,2])))
1
Unless I missed something, this is so unobvious that nobody has
suggested it in the thread yet.
> What about this proposal is specific to sets, i.e. why don't you want
> the same thing for lists. tuples, strings, file objects, or any other
> iterable?
Sequences such as lists, tuples and strings have easy random access.
It's easy to get an arbitrary element: pick an index i by whatever
method you like, and get seq[i]. Many file objects are similar, you
have random access with seek().
It is unpractical and an over-generalisation to apply this to general
iterables, for reasons I'm sure I don't need to go into. But sets and
frozensets aren't lazily generated streams, they actually do store the
elements inside their data structure in the same way that lists do.
> Does this proposal pass the test of being self-descriptive? Can you
> write a code fragment that exercises the cycling behavior, show it to
> another programmer, and have them correctly deduce what the code does
> (i.e. that different values are returned, that it fails when the set
> it empty, that it wraps around and never terminates)? Can they
> readily differentiate it for dict.get() which has decidedly different
> semantics?
I don't expect ConfigParser.get() to have the same semantics as
dict.get(), and they're much more closely related than sets and dicts.
I don't understand why so many people apparently have a problem with
the name get(), but that's okay, I'm not wedded to the idea either. If
folks prefer a different name, by all means suggest it. I like the name
suggested by Wikipedia, "pick".
As for being self-descriptive, I don't know, I haven't tried the
experiment.
> > To clarify point 3, given:
> >
> > x = set.get()
> > y = set.get()
> >
> > then x and y will only be the same element if set has length one.
>
> So, it can't even be used for looping through a set because there is
> no termination?
That's a feature, not a bug! This is not intended to be a replacement
for standard set iteration. Anyone writing the following:
for _ in len(s):
process(s.get())
instead of
for x in s:
process(x)
will be taken out and slapped with a large fish.
My reasoning for the given behaviour is as follows:
* If you want the same element twice from a set, the One Obvious Way is
to get an element from the set (somehow!) and assign it to a local
variable:
x = s.get()
process(x)
# later...
process(x)
So having get() return the same value each time is not useful or
necessary.
* If you want a random element on each call, the One Obvious Way is to
call random.choice(list(s)).
* Since sets aren't mappings, or sequences, there's no sensible way to
look up an index or key, so any variation of "get element #1" are out.
* Which leaves returning the elements in some unspecified arbitrary
order. The most obvious arbitrary order is to cycle through them, which
conveniently is exactly what iter(set) does, but we don't need to
guarantee that order.
> > I believe that the patch supplied by Willi Richart implemented
> > these behaviours.
> >
> > http://bugs.python.org/issue7212
>
> So you want to introduce additional, hidden state to sets? (to make
> sure that successive invocations return different values)
If you can think of any other way to efficiently cycle over the elements
in a set, I'm all for it :)
Presumably this is no different from what dict views do, except they
don't wrap around when exhausted.
> Do you want a thread local version too? (so that two threads can call
> gets without stomping on each other's guarantees that successive
> calls will produce distinct elements)
I think that's overkill. I wouldn't think we need to guarantee that two
threads don't see the same element, or that each thread will see each
element in the same order. We need only the much weaker guarantee that
two subsequent calls to get() in the one thread with no modification to
the set between the calls won't retrieve the same element each time
(unless there is only one element to retrieve), and that each element
will eventually be seen.
> Do you have any real-world use-cases where next(), for-loops, or
> itertools wouldn't suffice?
Someone in this thread -- forgive me, I don't recall who -- had the case
where they had a frozenset which they knew only had one element, but no
obvious way to retrieve that element.
To my mind, the major rationalisation for set.get() is not that there's
no way to get a single element from a set, but that there is no
obvious, easily discoverable way to retrieve a single element.
> Is there a precedent in *any* other language you've ever seen? (setl
> has an "arb" function but it makes no promises about returning
> different values on consequetive calls; otherwise, I've never seen an
> equivalent in any other set implementation).
I can't say I've seen one in any other languages, but Wikipedia
lists "pick" as a fundamental set operation:
pick(S): returns an arbitrary element of S.
http://en.wikipedia.org/wiki/Set_(computer_science)
This page claims that Icon has an operator that returns a random element
of a set:
? set( [1, 2, 3, 4, 5] )
http://stackoverflow.com/questions/124671/picking-a-random-element-from-a-set
but I've never used Icon myself and so can't confirm that.
> Do you think the return-different-values-on-successive-calls
> semantics is self-evident and non-magical as compared to a straight
> for-loop or next(it)?
I'm going to sit on the fence on that and say "maybe".
> ISTM, that when streams have non-destructive getters with
> self-advancing pointers, they also have a seek() function so that it
> can be controlled. Will this proposal need a seek() method too?
No. Since sets are unordered, there's no way to seek to a specific
element.
> Sorry for so many questions, but I honestly think there are too many
> unresolved design issues. We've seen no real-world source code that
> would be improved fwith the proposal. I think it sounds conceptually
> tempting and is fun to theorize about, but it actual implementation
> it will make sets more difficult to learn and it would quickly become
> a piece of rarely used, poorly understood cruft.
--
Steven D'Aprano
More information about the Python-Dev
mailing list