On Fri, 30 Oct 2009 03:00:27 pm Raymond Hettinger wrote:
[skipping to the last paragraph]
Sorry for so many questions
Don't be sorry. These are good questions, and I'll try to answer them. But keep in mind that this isn't my proposal -- I vary between +0 and +1 on the proposal depending on the time of the day *wink*
(1) it will only fail if the set is empty;
Just like next() except that next() gives you the option to supply a default and can be used on any iterator (perhaps iter(s) or itertools.cycle(s) etc).
Yes. I had considered suggesting that sets become their own iterator, so you could do the following:
s = set([1,2]) next(s)
but that leads to the problem that if you accept a set from somewhere else, you have no idea whether next(s) will succeed or raise StopIteration. It may have already been exhausted before it reached you.
(2) it should be efficient;
Is this about optimization?
Not primarily. Perhaps I should have said it should not be inefficient. It's primarily about there being One Obvious Way to extract an arbitrary item from a set -- this is a reoccurring question on comp.lang.python. Being efficient is a bonus, but it shouldn't be inefficient.
I wouldn't expect "x=s.get()" to beat "for x in s: break". Attribute lookup and method calls usually are slower than equivalents using built-in syntax with specific opcodes.
Obviously any hypothetical solution would need to be benchmarked, but I wouldn't be concerned if s.get() was marginally slower than for `x in s: break`.
When people are left to come up with their own ad hoc solutions, we've seen some poor solutions. I've seen, possibly in this very thread, the following O(N) suggestions:
for x in s: pass
x = list(s)
The usual technique people tend to come up with is:
x = s.pop() s.add(x)
Which strikes many people (including myself) as inelegant. Surely "get an element" is a more fundamental operation than "get an element and remove it"?
(3) if you call it repeatedly on a set without modifying the set, you will cycle through each element in turn in some unspecified arbitrary order.
What's wrong with using next()? That is what it's for.
You can't call next() on a set. You have to call next(iter(set)). From Python 3.0:
Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: set object is not an iterator
Unless I missed something, this is so unobvious that nobody has suggested it in the thread yet.
What about this proposal is specific to sets, i.e. why don't you want the same thing for lists. tuples, strings, file objects, or any other iterable?
Sequences such as lists, tuples and strings have easy random access. It's easy to get an arbitrary element: pick an index i by whatever method you like, and get seq[i]. Many file objects are similar, you have random access with seek().
It is unpractical and an over-generalisation to apply this to general iterables, for reasons I'm sure I don't need to go into. But sets and frozensets aren't lazily generated streams, they actually do store the elements inside their data structure in the same way that lists do.
Does this proposal pass the test of being self-descriptive? Can you write a code fragment that exercises the cycling behavior, show it to another programmer, and have them correctly deduce what the code does (i.e. that different values are returned, that it fails when the set it empty, that it wraps around and never terminates)? Can they readily differentiate it for dict.get() which has decidedly different semantics?
I don't expect ConfigParser.get() to have the same semantics as dict.get(), and they're much more closely related than sets and dicts. I don't understand why so many people apparently have a problem with the name get(), but that's okay, I'm not wedded to the idea either. If folks prefer a different name, by all means suggest it. I like the name suggested by Wikipedia, "pick".
As for being self-descriptive, I don't know, I haven't tried the experiment.
To clarify point 3, given:
x = set.get() y = set.get()
then x and y will only be the same element if set has length one.
So, it can't even be used for looping through a set because there is no termination?
That's a feature, not a bug! This is not intended to be a replacement for standard set iteration. Anyone writing the following:
for _ in len(s): process(s.get())
for x in s: process(x)
will be taken out and slapped with a large fish.
My reasoning for the given behaviour is as follows:
* If you want the same element twice from a set, the One Obvious Way is to get an element from the set (somehow!) and assign it to a local variable:
x = s.get() process(x) # later... process(x)
So having get() return the same value each time is not useful or necessary.
* If you want a random element on each call, the One Obvious Way is to call random.choice(list(s)).
* Since sets aren't mappings, or sequences, there's no sensible way to look up an index or key, so any variation of "get element #1" are out.
* Which leaves returning the elements in some unspecified arbitrary order. The most obvious arbitrary order is to cycle through them, which conveniently is exactly what iter(set) does, but we don't need to guarantee that order.
I believe that the patch supplied by Willi Richart implemented these behaviours.
So you want to introduce additional, hidden state to sets? (to make sure that successive invocations return different values)
If you can think of any other way to efficiently cycle over the elements in a set, I'm all for it :)
Presumably this is no different from what dict views do, except they don't wrap around when exhausted.
Do you want a thread local version too? (so that two threads can call gets without stomping on each other's guarantees that successive calls will produce distinct elements)
I think that's overkill. I wouldn't think we need to guarantee that two threads don't see the same element, or that each thread will see each element in the same order. We need only the much weaker guarantee that two subsequent calls to get() in the one thread with no modification to the set between the calls won't retrieve the same element each time (unless there is only one element to retrieve), and that each element will eventually be seen.
Do you have any real-world use-cases where next(), for-loops, or itertools wouldn't suffice?
Someone in this thread -- forgive me, I don't recall who -- had the case where they had a frozenset which they knew only had one element, but no obvious way to retrieve that element.
To my mind, the major rationalisation for set.get() is not that there's no way to get a single element from a set, but that there is no obvious, easily discoverable way to retrieve a single element.
Is there a precedent in *any* other language you've ever seen? (setl has an "arb" function but it makes no promises about returning different values on consequetive calls; otherwise, I've never seen an equivalent in any other set implementation).
I can't say I've seen one in any other languages, but Wikipedia lists "pick" as a fundamental set operation:
pick(S): returns an arbitrary element of S.
This page claims that Icon has an operator that returns a random element of a set:
? set( [1, 2, 3, 4, 5] )
but I've never used Icon myself and so can't confirm that.
Do you think the return-different-values-on-successive-calls semantics is self-evident and non-magical as compared to a straight for-loop or next(it)?
I'm going to sit on the fence on that and say "maybe".
ISTM, that when streams have non-destructive getters with self-advancing pointers, they also have a seek() function so that it can be controlled. Will this proposal need a seek() method too?
No. Since sets are unordered, there's no way to seek to a specific element.
Sorry for so many questions, but I honestly think there are too many unresolved design issues. We've seen no real-world source code that would be improved fwith the proposal. I think it sounds conceptually tempting and is fun to theorize about, but it actual implementation it will make sets more difficult to learn and it would quickly become a piece of rarely used, poorly understood cruft.