Mailman 3 Retrieve an arbitrary element from a set without removing it - Python-Dev

Retrieve an arbitrary element from a set without removing it

Willi Richert

Oct. 23, 2009

9:32 a.m.

Hi, recently I wrote an algorithm, in which very often I had to get an arbitrary element from a set without removing it. Three possibilities came to mind: 1. x = some_set.pop() some_set.add(x) 2. for x in some_set: break 3. x = iter(some_set).next() Of course, the third should be the fastest. It nevertheless goes through all the iterator creation stuff, which costs some time. I wondered, why the builtin set does not provide a more direct and efficient way for retrieving some element without removing it. Is there any reason for this? I imagine something like x = some_set.get() or x = some_set.pop(False) and am thinking about providing a patch against setobject.c (preferring the .get() solution being a stripped down pop()). Before, I would like to know whether I have overlooked something or whether this can be done in an already existing way. Thanks, wr

Show replies by date

Steven D'Aprano

October 2009

10:54 a.m.

On Fri, 23 Oct 2009 08:32:45 pm Willi Richert wrote:

...

Hi,

recently I wrote an algorithm, in which very often I had to get an arbitrary element from a set without removing it.

Three possibilities came to mind: ... Of course, the third should be the fastest.

If you need one or more items chosen randomly without replacement, use a list: L = list(some_set) x = random.choice(L) If you don't need a randomly chosen item, merely an arbitrary item, you can still use a list but avoid the call to random.choice: x = list(some_set)[0]

...

It nevertheless goes through all the iterator creation stuff, which costs some time. I wondered, why the builtin set does not provide a more direct and efficient way for retrieving some element without removing it. Is there any reason for this?

I can see it being useful to iterate over the entire set, without removing anything. I can see it being useful to pop an arbitrary item, and remove it. But I can't see the use for retrieving an arbitrary item, leaving it in the set. What do you use this for?

...

I imagine something like

x = some_set.get()

or

x = some_set.pop(False)

and am thinking about providing a patch against setobject.c (preferring the .get() solution being a stripped down pop()).

-1 on .pop(flag) +0 on .get() -- Steven D'Aprano

Vitor Bosshard

5:04 p.m.

2009/10/23 Willi Richert <w.richert@gmx.net>:

...

Hi,

recently I wrote an algorithm, in which very often I had to get an arbitrary element from a set without removing it.

Three possibilities came to mind:

1. x = some_set.pop() some_set.add(x)

2. for x in some_set: break

3. x = iter(some_set).next()

Of course, the third should be the fastest. It nevertheless goes through all the iterator creation stuff, which costs some time. I wondered, why the builtin set does not provide a more direct and efficient way for retrieving some element without removing it. Is there any reason for this?

I imagine something like

x = some_set.get()

I see this as being useful for frozensets as well, where you can't get an arbitrary element easily due to the obvious lack of .pop(). I ran into this recently, when I had a frozenset that I knew had 1 element (it was the difference between 2 other sets), but couldn't get to that element easily (get the pun?)

John Arbash Meinel

5:25 p.m.

New subject: Retrieve an arbitrary element from a set without removing it

Vitor Bosshard wrote:

...

2009/10/23 Willi Richert <w.richert@gmx.net>:

...
Hi,

recently I wrote an algorithm, in which very often I had to get an arbitrary element from a set without removing it.

Three possibilities came to mind:

1. x = some_set.pop() some_set.add(x)

2. for x in some_set: break

3. x = iter(some_set).next()

Of course, the third should be the fastest. It nevertheless goes through all the iterator creation stuff, which costs some time. I wondered, why the builtin set does not provide a more direct and efficient way for retrieving some element without removing it. Is there any reason for this?

I imagine something like

x = some_set.get()

I see this as being useful for frozensets as well, where you can't get an arbitrary element easily due to the obvious lack of .pop(). I ran into this recently, when I had a frozenset that I knew had 1 element (it was the difference between 2 other sets), but couldn't get to that element easily (get the pun?)

So in my testing (2) was actually the fastest. I assumed because .next() was a function call overhead, while: for x in some_set: break Was evaluated inline. It probably still has to call PyObject_GetIter, however it doesn't have to create a stack frame for it. This is what "timeit" tells me. All runs are of the form: python -m timeit -s "s = set([10])" ... 0.101us "for x in s: break; x" 0.130us "for x in s: pass; x" 0.234us -s "n = next; i = iter" "x = n(i(s)); x" 0.248us "x = next(iter(s)); x" 0.341us "x = iter(s).next(); x" So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()). I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement? Note that all of these are significantly < 1us. So this only matters if it is something you are doing often. I don't know your specific timings, but I would guess that: for x in s: break Is actually going to be faster than your s.get() Primarily because s.get() requires an attribute lookup. I would think your version might be faster for: stat2 = "g = s.get; for i in xrange(100): g()" However, that is still a function call, which may be treated differently by the interpreter than the for:break loop. I certainly suggest you try it and compare. John =:->

Paul Moore

6:13 p.m.

2009/10/23 John Arbash Meinel <john.arbash.meinel@gmail.com>:

...

I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement?

I'd imagine it's actually because it has to call next() a second time and deal with the StopIteration exception - the loop has to end normally, whereas the break form exits prematurely. Paul.

Terry Reedy

7:04 p.m.

John Arbash Meinel wrote:

...

So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()). I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement?

for x in s: pass iterates through *all* the elements in s and leaves x bound to the arbritrary *last* one instead of the arbitrary *first* one. For a large set, this would be a lot slower, not just a little. fwiw, I think the use case for this is sufficiently rare that it does not need a separate method just for this purpose. tjr

John Arbash Meinel

8:11 p.m.

New subject: Retrieve an arbitrary element from a set without removing it

Terry Reedy wrote:

...

John Arbash Meinel wrote:

...
So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()). I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement?

for x in s: pass

iterates through *all* the elements in s and leaves x bound to the arbritrary *last* one instead of the arbitrary *first* one. For a large set, this would be a lot slower, not just a little.

fwiw, I think the use case for this is sufficiently rare that it does not need a separate method just for this purpose.

tjr

The point of my test was that it was a set with a *single* item, and 'break' was 30% faster than 'pass'. Which was surprising. Certainly the difference is huge if there are 10k items in the set. John =:->

Steven D'Aprano

10:44 p.m.

On Sat, 24 Oct 2009 07:11:57 am John Arbash Meinel wrote:

...

The point of my test was that it was a set with a *single* item, and 'break' was 30% faster than 'pass'. Which was surprising.

Not really. See below.

...

Certainly the difference is huge if there are 10k items in the set.

Earlier you suggested that the difference may have been because of a potential "else" statement. That won't be it: if there's no "else" in the code, it's not compiled in:

...

...
...
import dis dis.dis(compile("for x in s: break", "", "exec")) 1 0 SETUP_LOOP 15 (to 18) 3 LOAD_NAME 0 (s) 6 GET_ITER >> 7 FOR_ITER 7 (to 17) 10 STORE_NAME 1 (x) 13 BREAK_LOOP 14 JUMP_ABSOLUTE 7 >> 17 POP_BLOCK >> 18 LOAD_CONST 0 (None) 21 RETURN_VALUE

dis.dis(compile("for x in s: pass", "", "exec")) 1 0 SETUP_LOOP 14 (to 17) 3 LOAD_NAME 0 (s) 6 GET_ITER >> 7 FOR_ITER 6 (to 16) 10 STORE_NAME 1 (x) 13 JUMP_ABSOLUTE 7 >> 16 POP_BLOCK >> 17 LOAD_CONST 0 (None) 20 RETURN_VALUE

The difference is likely to be this: for x in s: break retrieves the first (only) element of the set, then immediately breaks out of the loop. This is very different from: for x in s: pass which retrieves the first element of the set, then tries to retrieve a second element, which fails and raises StopIteration, which is then caught, ending the loop. That's much more expensive. -- Steven D'Aprano

Raymond Hettinger

7:29 a.m.

[John Arbash Meine]

...

So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()).

The first version uses direct calls for the for-loop opcodes. The other two have to do functions/method look-up and make a pure python function call (neither is very fast). FWIW, a "choose" method had been previously proposed, discussed and rejected. I don't find the optimization issue to be very interesting in the case of retrieving an arbitrary element. This isn't the kind of thing that typically appears in an inner-loop; afterall, if you've retrieved an arbitrary element without removing it, then successive calls to "choose" could potentially retrieve the exact same element again and again. Raymond

Willi Richert

8:22 a.m.

Hi, I totally agree regarding the efficiency. Code that relies on a fast "non- removing .pop()" probably has other worse bottlenecks that should be targetted first. This would, however, relief every programmer who needs this the first time in his Python experience, to research how this could be most reasonably done. And it is more of a style question. I find the "for x in some_set: break" rather ugly. wr Am Montag, 26. Oktober 2009 08:29:36 schrieben Sie:

...

I don't find the optimization issue to be very interesting in the case of retrieving an arbitrary element. This isn't the kind of thing that typically appears in an inner-loop; afterall, if you've retrieved an arbitrary element without removing it, then successive calls to "choose" could potentially retrieve the exact same element again and again.

Raymond

Steven D'Aprano

10:46 p.m.

On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote:

...

John Arbash Meinel wrote:

...
So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()). I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement?

for x in s: pass

iterates through *all* the elements in s and leaves x bound to the arbritrary *last* one instead of the arbitrary *first* one. For a large set, this would be a lot slower, not just a little.

fwiw, I think the use case for this is sufficiently rare that it does not need a separate method just for this purpose.

And yet it keeps coming up, again and again... obviously people using sets in code think it has a use-case. I did ask earlier for a use-case, and the OP hasn't replied, but Vitor Bosshard did point out one of his use-cases: he had the difference between two frozensets, which he knew had only one element, but due to the lack of pop() he had no straightforward way of finding out what that element was. The lack of get() in sets and frozensets is sounding more and more to me like the victory of purity over practicality. -- Steven D'Aprano

Ben Finney

11:26 p.m.

Steven D'Aprano <steve@pearwood.info> writes:

...

On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote:

...
fwiw, I think the use case for this is sufficiently rare that it does not need a separate method just for this purpose.

And yet it keeps coming up, again and again... obviously people using sets in code think it has a use-case.

The desire for this may be obvious, but what seems to be lacking is One Obvious Way to Do It. I agree that ‘for x in foo_set: break’ is not an Obvious Way.

...

The lack of get() in sets and frozensets is sounding more and more to me like the victory of purity over practicality.

What would be the input to ‘set.get’? If it's the value, that makes it rather non-obvious; if I already know about ‘dict.get’ which takes the key, I'm not going to be thinking about the ‘get’ method if I don't have a key to feed it. Once I learn it, I'm going to forget it easily, because it's inconsistent with ‘dict.get’. So I don't think that meets the “obvious way” criterion. By analogy with ‘list.pop’, the method that takes the “top” item off the “stack”, I would expect to see ‘list.peek’ and ‘set.peek’, to see the item without altering the container. -- \ “Odious ideas are not entitled to hide from criticism behind | `\ the human shield of their believers' feelings.” —Richard | _o__) Stallman | Ben Finney

Steven D'Aprano

2:07 a.m.

On Sat, 24 Oct 2009 10:26:27 am Ben Finney wrote:

...

Steven D'Aprano <steve@pearwood.info> writes:

...
On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote:

...
fwiw, I think the use case for this is sufficiently rare that it does not need a separate method just for this purpose.

And yet it keeps coming up, again and again... obviously people using sets in code think it has a use-case.

The desire for this may be obvious, but what seems to be lacking is One Obvious Way to Do It.

I agree that ‘for x in foo_set: break’ is not an Obvious Way.

...
The lack of get() in sets and frozensets is sounding more and more to me like the victory of purity over practicality.

What would be the input to ‘set.get’?

It wouldn't take any input.

...

If it's the value, that makes it rather non-obvious;

It makes it pointless if it takes the value. If you already have the value, why would you need to retrieve it from the set?

...

if I already know about ‘dict.get’ which takes the key, I'm not going to be thinking about the ‘get’ method if I don't have a key to feed it. Once I learn it, I'm going to forget it easily, because it's inconsistent with ‘dict.get’. So I don't think that meets the “obvious way” criterion.

"get" is such a generic term that I don't believe that is a problem. There are already quite a number of get() methods in the 2.6 std lib: $ grep "def get[(]" *.py _abcoll.py: def get(self, key, default=None): ConfigParser.py: def get(self, section, option): ConfigParser.py: def get(self, section, option, raw=False, vars=None): doctest.py: def get(self): mailbox.py: def get(self, key, default=None): os.py: def get(self, key, failobj=None): pickle.py: def get(self, i, pack=struct.pack): Queue.py: def get(self, block=True, timeout=None): shelve.py: def get(self, key, default=None): sre_parse.py: def get(self): threading.py: def get(self): UserDict.py: def get(self, key, failobj=None): UserDict.py: def get(self, key, default=None): weakref.py: def get(self, key, default=None): weakref.py: def get(self, key, default=None): webbrowser.py:def get(using=None): I think you over-estimate the degree of confusion people suffer from similar sounding methods. I don't think people are confused that dict[key] and list[index] have different semantics, and I don't see why dict.get(key[, default]) and set.get() would be any different. But if you don't like get(), spell it differently. There's plenty of opportunity for bike-shedding: getitem() getelement() get_arbitrary_element() peek() etc.

...

By analogy with ‘list.pop’, the method that takes the “top” item off the “stack”, I would expect to see ‘list.peek’ and ‘set.peek’, to see the item without altering the container.

You don't need list.peek() because there already is an obvious way to retrieve an item from a list without removing it: list[index]. -- Steven D'Aprano

Ben Finney

3:02 a.m.

Steven D'Aprano <steve@pearwood.info> writes:

...

On Sat, 24 Oct 2009 10:26:27 am Ben Finney wrote:

...
Steven D'Aprano <steve@pearwood.info> writes:

...
The lack of get() in sets and frozensets is sounding more and more to me like the victory of purity over practicality.

What would be the input to ‘set.get’?

It wouldn't take any input.

That is even less obvious. I would expect a parameter-less ‘set.get’ to get the set. Not terribly useful, but the name and function signature doesn't suggest anything else.

...

"get" is such a generic term that I don't believe that is a problem.

The problem above is made less problematic by the fact that the function signature (e.g. ‘foo_dict.get(key)’) clarifies the answer to the question “get what?”. Whereas ‘foo_set.get()’ doesn't communicate much at all to the reader. If we want a method that gets one item from a set, perhaps the name can make it clearer: name it ‘set.getitem’. But which item should it get? The ‘__getitem__’ special method of lists and dictionaries requires an index or key as parameter. -- \ “Roll dice!” “Why?” “Shut up! I don't need your fucking | `\ *input*, I need you to roll dice!” —Luke Crane, demonstrating | _o__) his refined approach to play testing, 2009 | Ben Finney

Steven D'Aprano

3:38 a.m.

On Sat, 24 Oct 2009 02:02:48 pm Ben Finney wrote:

...

Steven D'Aprano <steve@pearwood.info> writes:

...
On Sat, 24 Oct 2009 10:26:27 am Ben Finney wrote:

...
Steven D'Aprano <steve@pearwood.info> writes:

...
The lack of get() in sets and frozensets is sounding more and more to me like the victory of purity over practicality.

What would be the input to ‘set.get’?

It wouldn't take any input.

That is even less obvious. I would expect a parameter-less ‘set.get’ to get the set. Not terribly useful, but the name and function signature doesn't suggest anything else.

You already have the set. Why would you need a method that you call that returns the set you already have? A method called "get" obviously gets something, and if it takes no arguments the only thing is has access to is the set. The obvious things it could get are the set as a whole or some part of the set. Since getting the set as a whole would be pointless, and we're allowed to assume that the language designers aren't lunatics who would waste time creating pointless methods, the obvious answer is that it gets some part of the set. Since there's no obvious reason to choose one subset over another subset, the only useful thing it could return is a single arbitrary item. But if you're not willing to guess what it gets, you are permitted to read the docstring.

...

...
"get" is such a generic term that I don't believe that is a problem.

The problem above is made less problematic by the fact that the function signature (e.g. ‘foo_dict.get(key)’) clarifies the answer to the question “get what?”. Whereas ‘foo_set.get()’ doesn't communicate much at all to the reader.

You are demanding a level of intuitiveness that few, if any, functions in the standard library would be able to meet. Consider list.index(x): would you expect it to return the value at index x, or the index of value x? Either description is compatible with the name, and in fact people sometimes mix them up. Or sorted() -- from the name alone, I'd expect this to work, but it doesn't:

...

...
...
sorted(1, 2, 3) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: 'int' object is not iterable

That's not a problem with the built-in function. I took a guess about the API, and guessed wrong. I'll learn from this, and get it right next time.

...

If we want a method that gets one item from a set, perhaps the name can make it clearer: name it ‘set.getitem’. But which item should it get? The ‘__getitem__’ special method of lists and dictionaries requires an index or key as parameter.

Neither of which is appropriate for sets -- sets are unordered, so elements don't have indexes, and they don't map keys to values. They just have elements. Sets are not lists, and they're not dicts. Analogies only go so far, and you can't reason about set methods by considering how lists or dicts will behave. -- Steven D'Aprano

Ben Finney

4:06 a.m.

Steven D'Aprano <steve@pearwood.info> writes:

...

On Sat, 24 Oct 2009 02:02:48 pm Ben Finney wrote:

...
I would expect a parameter-less ‘set.get’ to get the set. Not terribly useful, but the name and function signature doesn't suggest anything else.

You already have the set. Why would you need a method that you call that returns the set you already have?

Exactly, hence the confusion. I think the method name ‘set.get’ is poor for that reason.

...

A method called "get" obviously gets something, and if it takes no arguments the only thing is has access to is the set. The obvious things it could get are the set as a whole or some part of the set.

Which then raises the question “what part of the set does it get?”, which the function signature does nothing to answer. I'm proposing that a no-parameters ‘set.get’ is needlessly confusing to think about.

...

Since there's no obvious reason to choose one subset over another subset, the only useful thing it could return is a single arbitrary item.

That's not at all obvious, IMO. The name doesn't give much help at all in getting to that conclusion, and isn't easily associated with that connotation.

...

You are demanding a level of intuitiveness that few, if any, functions in the standard library would be able to meet.

I'm not demanding intuitiveness; all programming interfaces are learned. I'm asking for ease of learning and later recall.

...

That's not a problem with the built-in function. I took a guess about the API, and guessed wrong. I'll learn from this, and get it right next time.

Which is partly my point. Such a narrow use case (“get an arbitrary item from the set, without specifying anything about what that item might be”) is a mismatch for such a generic name. It makes it unnecessarily difficult to make the association. Since the use case is so specific, I would expect the name to be specific too, to better match the use case. (Since I know how your mind works, I can anticipate facetious fifty-character-long names bubbling up already in a response from you. Let's just assume that joke is already done :-) -- \ “Holy tintinnabulation, Batman!” —Robin | `\ | _o__) | Ben Finney

Nick Coghlan

7:19 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

Ben Finney wrote:

...

Which then raises the question “what part of the set does it get?”, which the function signature does nothing to answer. I'm proposing that a no-parameters ‘set.get’ is needlessly confusing to think about.

The fact that set.get() is just set.pop() without removing the result from the set seems perfectly straightforward.

...

Since the use case is so specific, I would expect the name to be specific too, to better match the use case.

The use case is no more specific than set.pop(). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Martin (gzlist)

9:12 a.m.

On 24/10/2009, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Ben Finney wrote:

...
Which then raises the question “what part of the set does it get?”, which the function signature does nothing to answer. I'm proposing that a no-parameters ‘set.get’ is needlessly confusing to think about.

The fact that set.get() is just set.pop() without removing the result from the set seems perfectly straightforward.

There's a different proposed meaning for `set.get` that's been discussed on python-dev before: <http://mail.python.org/pipermail/python-dev/2009-April/088128.html> That one I've had cause for before and no clean and fast way of writing, this one I've always just done the for/break thing. Martin

Steven D'Aprano

3:18 p.m.

On Sat, 24 Oct 2009 08:12:26 pm Martin (gzlist) wrote:

...

On 24/10/2009, Nick Coghlan <ncoghlan@gmail.com> wrote:

...
Ben Finney wrote:

...
Which then raises the question “what part of the set does it get?”, which the function signature does nothing to answer. I'm proposing that a no-parameters ‘set.get’ is needlessly confusing to think about.

The fact that set.get() is just set.pop() without removing the result from the set seems perfectly straightforward.

There's a different proposed meaning for `set.get` that's been discussed on python-dev before:

<http://mail.python.org/pipermail/python-dev/2009-April/088128.html>

That thread has an example of a use-case for extracting an item from a set, given the item itself: interning. The current Python idiom for interning is to use a dict: # store the canonical version _cache[item] = item # retrieve the interned version item = _cache[item] It has been argued that such interning is better done by sets rather than dicts, since this will save a pointer per item (plus any additional overhead dicts may have over that of sets). If this argument is accepted, then we want a fast, efficient operation to perform this: def get(self, item): """Return an element of the set equal to item. Useful for retrieving the canonical version of an element and for interning. """ for x in self: if x == item: return x raise ValueError('item not in set') The above is O(N); ideally it should be O(1). Normally we don't care about identity, only equality, so if x and item above are equal we are indifferent about which one we use. But interning is a real example of a use-case where we do care about identity. Arguably it is inefficient and wasteful to use a dict for interning when sets could be just as fast while using less memory. The other proposed semantics for set.get() are to retrieve an arbitrary element. It must be arbitrary, because elements in a set are unordered and unkeyed. This gives us the semantics of pop() without removing the element: def get(self): """Return an arbitrary element of the set without removing it.""" for x in self: return x raise ValueError("empty set") This is a frequent request, but I still don't know what the use-case is. If you'll excuse me stating the obvious, both of these can easily be written as helper-functions. The main arguments for making them methods are: (1) The first one is needlessly O(N) when it could be O(1). (2) The second one is apparently a common need (although I personally have never needed it), forcing people to re-invent the wheel, sometimes incorrectly or inefficiently, e.g.: def get(aset): for element in aset: pass return element -- Steven D'Aprano

Terry Reedy

7:18 p.m.

Steven D'Aprano wrote:

...

On Sat, 24 Oct 2009 08:12:26 pm Martin (gzlist) wrote:

...

...
There's a different proposed meaning for `set.get` that's been discussed on python-dev before:

<http://mail.python.org/pipermail/python-dev/2009-April/088128.html>

For that case, I think the OP was mistaken mistaken to reject using a dict. He had objects with a key and wanted an index based on that key.

...

That thread has an example of a use-case for extracting an item from a set, given the item itself: interning.

The current Python idiom for interning is to use a dict:

# store the canonical version _cache[item] = item

# retrieve the interned version item = _cache[item]

This is an interesting use case as it turns on the difference between mathematics, where value is identity, and informatics, where the two concepts split. The above is meaningless in math, but *is* meaningful in informatics, where different objects can have the same set-membership value. For Python's builtin sets, set-membership 'value' is based on hash comparisons followed by equality comparison, rather than identity. The purpose of this is to make them more like mathematical sets. But because of the value-identity distinction, they are not exactly like mathematical sets. Given that set()s implicitly map equivalence classes (which are not Python objects) to representitive members of that class (which are), then I agree and would even argue that there should be method of retrieving the one representative given any member. (The above assumes that .__eq__ is properly defined for potential members so that they are properly partitioned into equivalence classes. This is not true when Decimals are mixed with other number classes.) A counter-argument is that the implicit mapping is an implementation detail. A bit-mapped set class for a finite range of ints would not have this mapping. So an ABC for sets probably should not include such a method.

...

It has been argued that such interning is better done by sets rather than dicts, since this will save a pointer per item (plus any additional overhead dicts may have over that of sets). If this argument is accepted, then we want a fast, efficient operation to perform this:

def get(self, item): """Return an element of the set equal to item. Useful for retrieving the canonical version of an element and for interning. """ for x in self: if x == item: return x raise ValueError('item not in set')

The above is O(N); ideally it should be O(1).

Normally we don't care about identity, only equality, so if x and item above are equal we are indifferent about which one we use. But interning is a real example of a use-case where we do care about identity. Arguably it is inefficient and wasteful to use a dict for interning when sets could be just as fast while using less memory.

The other proposed semantics for set.get() are to retrieve an arbitrary element. It must be arbitrary, because elements in a set are unordered and unkeyed. This gives us the semantics of pop() without removing the element:

def get(self): """Return an arbitrary element of the set without removing it.""" for x in self: return x raise ValueError("empty set")

This is a frequent request, but I still don't know what the use-case is.

If you'll excuse me stating the obvious, both of these can easily be written as helper-functions. The main arguments for making them methods are:

(1) The first one is needlessly O(N) when it could be O(1).

(2) The second one is apparently a common need (although I personally have never needed it), forcing people to re-invent the wheel, sometimes incorrectly or inefficiently, e.g.:

def get(aset): for element in aset: pass return element

Terry Jan Reedy

Stephen J. Turnbull

6:22 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

Ben Finney writes:

...

Steven D'Aprano <steve@pearwood.info> writes:

...

...
"get" is such a generic term that I don't believe that is a problem.

The problem above is made less problematic by the fact that the function signature (e.g. 'foo_dict.get(key)') clarifies the answer to the question "get what?". Whereas 'foo_set.get()' doesn't communicate much at all to the reader.

I agree. This is precisely why a couple of months ago people were proposing names like ".getany()" for this API. The problem brought up then was that pretty clearly people would then do things like x = foo.getany() y = foo.getany() expecting x and y to be different (ie, interpreting "any" as "random"). Pretty soon there was a whole family of proposals for such methods: .getfirst(), .getlast(), .getrandom(), .getonly(), .... I think it would be better to document the various ways of doing this, and let each program define its own oneliner for the MySet.get() that makes idiomatic sense in its use case.

Alexander Belopolsky

5:34 p.m.

On Fri, Oct 23, 2009 at 6:46 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote: ..

...
fwiw, I think the use case for this is sufficiently rare that it does not need a separate method just for this purpose.

And yet it keeps coming up, again and again... obviously people using sets in code think it has a use-case.

This reminds me a debate I had with Martin several years ago: http://bugs.python.org/issue1507011 Here is the essence: AB> I disagree with Martin. I think interning is a set AB> operation and it is unfortunate that set API does not AB> support it directly. ML> I disagree with Alexander's last remark in several respects: ML> set is indeed a container, and there is a way to get the ML> elements back (namely, by enumerating them, or picking an ML> arbitrary element for removal). There is no operation to check, ML> on insertion of E, what the the prior element equal to E was ML> (if any); there is only a check to determine whether E is in the ML> set. The operation "give me the member equal but not identical ML> to E" is conceptually a lookup operation; the mathematical set ML> construct has no such operation, and the Python set models ML> it closely. IOW, set is *not* a dict with key==value. To me, however, a set seems to be a container that is a specialization of a dict with values and keys being the same. In this model, a get() method, or even a __getitem__ with s[k] is k, is only natural.

Guido van Rossum

5:55 p.m.

On Sat, Oct 24, 2009 at 10:34 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:

...

To me, however, a set seems to be a container that is a specialization of a dict with values and keys being the same.

That's absurd; the mapping provides nothing useful. -- --Guido van Rossum PS. My elbow needs a couple more weeks of rest. Limiting myself to ultra-short emails.

Terry Reedy

7:37 p.m.

Guido van Rossum wrote:

...

On Sat, Oct 24, 2009 at 10:34 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:

...
To me, however, a set seems to be a container that is a specialization of a dict with values and keys being the same.

That's absurd; the mapping provides nothing useful.

Given Alexander's premise, I agree with your response. But his premise is wrong. Python's current builtin set class maps abstract equivalence classes to representative members. And this *is* useful. Mapping arbitrary members of such classes to representative members, sometimes called 'canonical', is a standard problem/goal in math. String interning is an application of this idea. Terry Jan Reedy

"Martin v. Löwis"

6:43 p.m.

New subject: Retrieve an arbitrary element from a set without removing it

...

Given Alexander's premise, I agree with your response. But his premise is wrong. Python's current builtin set class maps abstract equivalence classes to representative members. And this *is* useful. Mapping arbitrary members of such classes to representative members, sometimes called 'canonical', is a standard problem/goal in math. String interning is an application of this idea.

Right. However, this is conceptually a function. In some cases (like string interning), it is mutable (over time) and finite (at any point in time). We already have a data type that can perfectly represent mutable finite funcitons, namely dictionaries. And indeed, interning is implemented by a dictionary. What Alexander wants is that the set type also becomes useful for storing canonical representations. I find that inappropriate, since dicts already provide the one obvious way to do it. Regards, Martin

Steven D'Aprano

11:17 p.m.

On Mon, 26 Oct 2009 05:43:52 am Martin v. Löwis wrote:

...

What Alexander wants is that the set type also becomes useful for storing canonical representations. I find that inappropriate, since dicts already provide the one obvious way to do it.

Only because dicts came first in Python rather than sets. There's nothing inherently obvious about storing the canonical representation of an object *twice* (although I'm not Dutch). A more natural representation would be to store the canonical representation once. An abstract intern operation might be written: if s equals an element in the data store return the element which s equals otherwise insert s into the data store and return s Notice we don't say: if s equals an element in the data store return the value associated with the element which s equals which is what the dict-based solution does. We can implement that abstract algorithm using lists. The algorithm is exactly the same as it is for dicts, except that search/retrieval becomes two operations rather than one: _cache = ['x', 'spam', 'parrot'] def intern(s): try: s = _cache[ _cache.index(s) ] except ValueError: _cache.append(s) return s We don't do this because O(N) searching is too expensive. Using a dict {s: s} is a workaround for the lack of a data structure that combines O(1) searching and ability to retrieve the element just found. Memory is cheap, but not so cheap that doubling the size of a data structure (two pointers per element for {s:s} versus one for {s}) is insignificant. Part of the reason we intern in the first place is to save memory. We shouldn't dismiss this out of hand based on the argument that retrieval from a set is insufficiently pure. As soon as you allow iteration over sets, you have allowed retrieval. -- Steven D'Aprano

Antoine Pitrou

6:13 p.m.

New subject: interning

Alexander Belopolsky <alexander.belopolsky <at> gmail.com> writes:

...

AB> I disagree with Martin. I think interning is a set AB> operation and it is unfortunate that set API does not AB> support it directly.

ML> I disagree with Alexander's last remark in several respects:

[...]

...

ML> The operation "give me the member equal but not identical ML> to E" is conceptually a lookup operation; the mathematical set ML> construct has no such operation, and the Python set models ML> it closely. IOW, set is *not* a dict with key==value.

This looks like a debate of purity vs. practicality. I don't have any opinion on a Python-facing API, but the interpreter's dict of interned strings could probably converted to a set as a simple optimization. Regards Antoine.

Terry Reedy

7:37 p.m.

Alexander Belopolsky wrote:

...

On Fri, Oct 23, 2009 at 6:46 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...
On Sat, 24 Oct 2009 06:04:12 am Terry Reedy wrote: ..

...
fwiw, I think the use case for this is sufficiently rare that it does not need a separate method just for this purpose.

And yet it keeps coming up, again and again... obviously people using sets in code think it has a use-case.

This reminds me a debate I had with Martin several years ago:

http://bugs.python.org/issue1507011

Here is the essence:

AB> I disagree with Martin. I think interning is a set AB> operation and it is unfortunate that set API does not AB> support it directly.

ML> I disagree with Alexander's last remark in several respects: ML> set is indeed a container, and there is a way to get the ML> elements back (namely, by enumerating them, or picking an ML> arbitrary element for removal). There is no operation to check, ML> on insertion of E, what the the prior element equal to E was ML> (if any); there is only a check to determine whether E is in the ML> set. The operation "give me the member equal but not identical ML> to E" is conceptually a lookup operation; the mathematical set ML> construct has no such operation, and the Python set models ML> it closely. IOW, set is *not* a dict with key==value.

To me, however, a set seems to be a container that is a specialization of a dict with values and keys being the same.

As I explained in response to Steven, set()s *implicitly* map of abstract, non-object equivalence classes to a concrete, representative object/member of that class.

...

In this model, a get() method, or even a __getitem__ with s[k] is k, is only natural.

No, if key and value were the same thing, the get method would be nonesense, as Guido said. But since they are not, since the implict key is an abstract class, retrieving the representative, perhaps canonical object given *any* member *is* natural. Being able to do so is also a standard practice in mathematics. Terry Jan Reedy

Nick Coghlan

5:48 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

Terry Reedy wrote:

...

...
In this model, a get() method, or even a __getitem__ with s[k] is k, is only natural.

No, if key and value were the same thing, the get method would be nonesense, as Guido said. But since they are not, since the implict key is an abstract class, retrieving the representative, perhaps canonical object given *any* member *is* natural. Being able to do so is also a standard practice in mathematics.

To formalise this invariant to some degree: given a set "s" and two items "k1" and "k2", such that "k1 in s" and "k2 in s" and "k1 == k2", there is a single item "k" in s such that "k1 == k" and "k2 == k". If __getitem__ were to be provided by sets, then the last part of that invariant could be written "s[k1] is s[k2]". If you actually care about that aspect of the semantics for a particular application, it would be far clearer to use a full dict() and live with the additional memory usage. While I can see how saving that pointer-per-entry might make sense in some circumstances, for most I would see it as a needlessly confusing micro-optimisation. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

"Martin v. Löwis"

6:45 p.m.

New subject: Retrieve an arbitrary element from a set without removing it

...

If you actually care about that aspect of the semantics for a particular application, it would be far clearer to use a full dict() and live with the additional memory usage. While I can see how saving that pointer-per-entry might make sense in some circumstances, for most I would see it as a needlessly confusing micro-optimisation.

That's exactly my view. Regards, Martin

Alexander Belopolsky

3:04 p.m.

On Sun, Oct 25, 2009 at 1:48 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

Terry Reedy wrote:

...
...
In this model, a get() method, or even a __getitem__ with s[k] is k, is only natural.

No, if key and value were the same thing, the get method would be nonesense, as Guido said. But since they are not, since the implict key is an abstract class, retrieving the representative, perhaps canonical object given *any* member *is* natural. Being able to do so is also a standard practice in mathematics.

To formalise this invariant to some degree: given a set "s" and two items "k1" and "k2", such that "k1 in s" and "k2 in s" and "k1 == k2", there is a single item "k" in s such that "k1 == k" and "k2 == k".

If __getitem__ were to be provided by sets, then the last part of that invariant could be written "s[k1] is s[k2]".

Thanks, Nick and Terry for a much more precise formulation of the point that I was trying to present. When I wrote s[k] is k, I had in mind for k stored is the set or among the keys of an equivalent dict. Formally alltrue(s[k] is k for k in s). Nick's invariant is of course a better expression of the same idea. I believe interning is a worthwhile use case for Python to have "one obvious way to do it." Martin suggests that such a way already exists and it involves storing interned objects as both keys and values in a dictionary. While this may have been obvious before sets became available, I agree with Steven that in post 2.4 python users are likely to look at set first and will only turn to dictionary after discovering that the functionality that they are looking for is not available in set. Even if you know to use a dictionary, there are still some non-obvious tricks to learn about. First, you need to learn about setdefault, a much criticized method that led to a new class being introduced to the standard library: http://mail.python.org/pipermail/python-dev/2003-February/033321.html http://docs.python.org/library/collections.html?highlight=defaultdict#collec... Second, there is no obvious way to pre-seed the dictionary, i.e., given a list l of keys, d = {} for k in l: d[k] = k When I was looking for a dict method to accomplish that, I discovered dict.fromkeys, which of course was not the right method. I still don't know if a better solution exists than calling setdefault(k, k) in a loop. As experience with defaultdict has shown, it may be more appropriate to introduce a specialized and properly named class in collections rather than overloading set with new methods, but such approach would lead to steep learning curve. Collections.defaultdict, could be cross-referenced from dict.setdefault, but a hypothetical collections.interningset would most likely remain undiscovered for the majority of users. Here is an alternative idea on how storing interned objects in a set can be supported. Currently set.add method returns None and had no effect when set already has an object equal to the one being added. I propose to consider changing that behavior to make set.add return the added object or the set member that is equal to the object being added. It is unlikely that many programs rely on the return value being None (with doctests being a probable exception), so adding this feature is unlikely to cause much grief.

Terry Reedy

4:02 p.m.

Alexander Belopolsky wrote:

...

Here is an alternative idea on how storing interned objects in a set can be supported. Currently set.add method returns None and had no effect when set already has an object equal to the one being added. I propose to consider changing that behavior to make set.add return the added object or the set member that is equal to the object being added. It is unlikely that many programs rely on the return value being None (with doctests being a probable exception), so adding this feature is unlikely to cause much grief.

I had exactly the same idea, but did not post because it violates the general rule that mutators return None. On the other hand, the returned value here would not be the mutated collection, so no chaining is possible. And 'add' is clearly intended to change something. On the other hand, frozensets do not have an add method. Terry Jan Reedy

Guido van Rossum

4:38 p.m.

All this talk of equivalence classes makes me dizzy. :-) - If sets were to grow an API to non-destructively access the object stored in the set for a particular key, then dicts should have such a method too. - Ditto for an API to non-destructively get an arbitrary element. - I'm far from convinced that we urgently need either API. But I'm also not convinced it's unneeded. - I still wish we could go back in time and unify sets and dicts, if only to find out how that experiment would turn out. -- --Guido van Rossum PS. My elbow needs a couple more weeks of rest. Limiting myself to ultra-short emails.

Chris Bergstresser

6:54 p.m.

On Mon, Oct 26, 2009 at 11:38 AM, Guido van Rossum <guido@python.org> wrote:

...

- If sets were to grow an API to non-destructively access the object stored in the set for a particular key, then dicts should have such a method too.

- Ditto for an API to non-destructively get an arbitrary element.

- I'm far from convinced that we urgently need either API. But I'm also not convinced it's unneeded.

These clearly aren't urgently needed, but I do think they're needed and useful. For those who want a use-case for getting an arbitrary element from a set, I've run into the need several times over the last year, and each time I'm a little surprised I had the need and a little surprised there wasn't an good way of going about it. In the most recent example, I was writing some UI code. I had a set of all the open window references so I could clean them up at the end of the program. I needed to call some API function that required a window reference as the first argument, but it returned a global value common to all the window references. I like the proposed set.get() method, personally. list.get(index) gets the item at that index, dict.get(key) gets the item associated with that key, set.get() gets an item, but doesn't place any guarantees on which item is returned. Makes sense to me. I also like the idea there aren't any guarantees about which item is returned--it allows subclasses to implement different behavior (so one might always return the last item placed in the set, another could always return a random item, another could implement some round-robin behavior, and all would fulfill the contract of the set class). The existing methods aren't great for accomplishing this, mainly because they're obfuscatory. "iter(s).next()" is probably clearest, and at least will throw a StopIteration exception if the set is empty. "for x in s: break" is just confusing, easy to accidentally confuse with "for x in s: pass", and causes an unrevealing NameError if the set is empty. Add in all the other idioms for accomplishing the same thing ("x, = s", etc.) and I think there's a good argument for adding the method to sets, if only to provide a single obvious way of doing it--and throwing a single, appropriate exception if the set is empty. -- Chris

Raymond Hettinger

7:23 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Chris Bergstresser]

...

I like the proposed set.get() method, personally.

Sets have been implemented in many languages, but I've only seen one that implemented this functionality (the "arb" function in SETL). For the most part, it seems that this is an uncommon need. Also consider that there is value to keeping the set-api as simple as possible. Right now, anyone who was exposed to the basics of sets in school can understand the set-api with a near zero learning curve. Some of that would be lost if you add methods that make identity/equality distinctions or that fetch the same arbitrary value over and over again. Besides, it is trivial to write a short function that encapsulates this behavior if it is part of your personal style of expression.

...

The existing methods aren't great for accomplishing this, mainly because they're obfuscatory. "iter(s).next()" is probably clearest, and at least will throw a StopIteration exception if the set is empty. "for x in s: break" is just confusing ...

A short comment would do wonders. Raymond P.S. It is weird that this thread is gaining traction at the same time as the moratorium thread. Go figure :-)

Guido van Rossum

7:56 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Mon, Oct 26, 2009 at 12:23 PM, Raymond Hettinger <python@rcn.com> wrote:

...

P.S. It is weird that this thread is gaining traction at the same time as the moratorium thread. Go figure :-)

I'm beginning to think maybe adding a method to a built-in object type *should* fall under the moratorium. -- --Guido van Rossum PS. My elbow needs a couple more weeks of rest. Limiting myself to ultra-short emails.

Jesse Noller

8:06 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Mon, Oct 26, 2009 at 3:56 PM, Guido van Rossum <guido@python.org> wrote:

...

On Mon, Oct 26, 2009 at 12:23 PM, Raymond Hettinger <python@rcn.com> wrote:

...
P.S. It is weird that this thread is gaining traction at the same time as the moratorium thread. Go figure :-)

I'm beginning to think maybe adding a method to a built-in object type *should* fall under the moratorium.

So far, fiddling with the PEP, I'm on the fence - adding a method to a built-in object type is sort of a grey area (at least in my mind). It depends on if doing so significantly alters the language/syntax. jesse

Antoine Pitrou

8:19 p.m.

New subject: [Python-Dev] Retrieve an arbitrary element from a set withoutremoving it

Jesse Noller <jnoller <at> gmail.com> writes:

...

So far, fiddling with the PEP, I'm on the fence - adding a method to a built-in object type is sort of a grey area (at least in my mind). It depends on if doing so significantly alters the language/syntax.

We have recently added things like float.fromhex() which IMHO shouldn't be blocked by the moratorium (*), although they technically add a method to a built-in. (*) it is a minor new feature aimed at catching up with some established standard for an exact, non-ambiguous string representation of floats Regards Antoine.

Guido van Rossum

8:32 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Mon, Oct 26, 2009 at 1:19 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Jesse Noller <jnoller <at> gmail.com> writes:

...
So far, fiddling with the PEP, I'm on the fence - adding a method to a built-in object type is sort of a grey area (at least in my mind). It depends on if doing so significantly alters the language/syntax.

We have recently added things like float.fromhex() which IMHO shouldn't be blocked by the moratorium (*), although they technically add a method to a built-in.

(*) it is a minor new feature aimed at catching up with some established standard for an exact, non-ambiguous string representation of floats

Okay, so it remains a gray area. -- --Guido van Rossum PS. My elbow needs a couple more weeks of rest. Limiting myself to ultra-short emails.

Willi Richert

9:14 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

For those of you who want to tinker with it, I posted the patch against the current trunk at http://bugs.python.org/issue7212 Have fun, wr Am Montag, 26. Oktober 2009 21:32:32 schrieb Guido van Rossum:

...

On Mon, Oct 26, 2009 at 1:19 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...
Jesse Noller <jnoller <at> gmail.com> writes:

...
So far, fiddling with the PEP, I'm on the fence - adding a method to a built-in object type is sort of a grey area (at least in my mind). It depends on if doing so significantly alters the language/syntax.

We have recently added things like float.fromhex() which IMHO shouldn't be blocked by the moratorium (*), although they technically add a method to a built-in.

(*) it is a minor new feature aimed at catching up with some established standard for an exact, non-ambiguous string representation of floats

Okay, so it remains a gray area.

geremy condra

8:14 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Mon, Oct 26, 2009 at 3:56 PM, Guido van Rossum <guido@python.org> wrote:

...

On Mon, Oct 26, 2009 at 12:23 PM, Raymond Hettinger <python@rcn.com> wrote:

...
P.S. It is weird that this thread is gaining traction at the same time as the moratorium thread. Go figure :-)

I'm beginning to think maybe adding a method to a built-in object type *should* fall under the moratorium.

-- --Guido van Rossum

Seems like any changes requiring support from uninvolved developers should fall under the moratorium. If the proposal is to add a set.get() function to the set API, then this clearly falls under that heading. Geremy Condra

Mark Dickinson

8:26 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Mon, Oct 26, 2009 at 7:56 PM, Guido van Rossum <guido@python.org> wrote:

...

I'm beginning to think maybe adding a method to a built-in object type *should* fall under the moratorium.

Another possible test case for this decision is the int.from_bytes and int.to_bytes methods, proposed a while ago, and implemented by Alexandre Vassalotti. See: http://bugs.python.org/issue1023290 Mark

Georg Brandl

4:06 p.m.

Chris Bergstresser schrieb:

...

I like the proposed set.get() method, personally. list.get(index) gets the item at that index, dict.get(key) gets the item associated with that key, set.get() gets an item, but doesn't place any guarantees on which item is returned.

Sorry to nitpick, but there is no list.get(). Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

Chris Bergstresser

5:33 p.m.

On Tue, Oct 27, 2009 at 11:06 AM, Georg Brandl <g.brandl@gmx.net> wrote:

...

Sorry to nitpick, but there is no list.get().

No? How ... odd. I guess it wouldn't have come up, but I was sure there was a .get method which took an optional default parameter if the index didn't exist, mirroring the dict method. Still, I think my point stands--it's a clear extrapolation from the existing dict.get(). -- Chris

Georg Brandl

5:40 p.m.

Chris Bergstresser schrieb:

...

On Tue, Oct 27, 2009 at 11:06 AM, Georg Brandl <g.brandl@gmx.net> wrote:

...
Sorry to nitpick, but there is no list.get().

No? How ... odd. I guess it wouldn't have come up, but I was sure there was a .get method which took an optional default parameter if the index didn't exist, mirroring the dict method. Still, I think my point stands--it's a clear extrapolation from the existing dict.get().

I don't see that. Both dict.get() and your hypothetical list.get() are variants of the [] subscription operator, i.e. __getitem__, that have a default value, defaulting to None. The [] operator retrieves an element from the object at the specified "position", be it dict key, list index or some other abstract position. Contrary to that, sets don't support subscript access, there is no notion of a value at a specified position, so I would find the set.get() naming confusing. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

Raymond Hettinger

5:47 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

[Chris Bergstresser] Still, I think my

...

point stands--it's a clear extrapolation from the existing dict.get().

Not really. One looks-up a key and supplies a default value if not found. The other, set.get(), doesn't have a key to lookup. A dict.get() can be meaningfully used in a loop (because the key can vary). A set.get() returns the same value over and over again (because there is no key). Raymond

Antoine Pitrou

5:59 p.m.

New subject: [Python-Dev] Retrieve an arbitrary element from a set withoutremoving it

Raymond Hettinger <python <at> rcn.com> writes:

...

[Chris Bergstresser] Still, I think my

...
point stands--it's a clear extrapolation from the existing dict.get().

Not really. One looks-up a key and supplies a default value if not found. The other, set.get(), doesn't have a key to lookup.

set.getone() then ?

geremy condra

6:20 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Oct 27, 2009 at 1:59 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Raymond Hettinger <python <at> rcn.com> writes:

...
[Chris Bergstresser] Still, I think my

...
point stands--it's a clear extrapolation from the existing dict.get().

Not really. One looks-up a key and supplies a default value if not found. The other, set.get(), doesn't have a key to lookup.

set.getone() then ?

ugh- other spellings much preferred. Geremy Condra

Oleg Broytman

6:30 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Oct 27, 2009 at 02:20:04PM -0400, geremy condra wrote:

...

On Tue, Oct 27, 2009 at 1:59 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...
Raymond Hettinger <python <at> rcn.com> writes: set.getone() then ?

ugh- other spellings much preferred.

set[] ? (Just kidding, really.) Oleg. -- Oleg Broytman http://phd.pp.ru/ phd@phd.pp.ru Programmers don't die, they just GOSUB without RETURN.

Terry Reedy

6:50 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Raymond Hettinger wrote:

...

A dict.get() can be meaningfully used in a loop (because the key can vary). A set.get() returns the same value over and over again (because there is no key).

There are two ideas of set.get floating about: 1) get an arbitrary object 2) get the object in the set with the same 'value'(hash+eq) as an input arg (the intern case). In this case, there is a 'key', even if it is somewhat abstract rather that being an object. Both could be done with the same method, depending on whether an arg is passed or not. The former is not useful in a loop; the latter could be. If there is no match in case 2, the method could a) raise an exception, b) return None (which by its nature would never sensibly be looked up), or c) return an object specified by 'default=xxx' keyword arg. Terry Jan Reedy

ssteinerX＠gmail.com

7:06 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Oct 27, 2009, at 2:50 PM, Terry Reedy wrote more and more and more and more and more and more and more and more and more: This topic needs its own flippin' newsgroup. S

Jesse Noller

7:13 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Oct 27, 2009 at 3:06 PM, ssteinerX@gmail.com <ssteinerx@gmail.com> wrote:

...

On Oct 27, 2009, at 2:50 PM, Terry Reedy wrote more and more and more and more and more and more and more and more and more:

This topic needs its own flippin' newsgroup.

S

Don't like it? Mute the conversation (thank you gmail) or unsub.

Terry Reedy

3:02 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

ssteinerX@gmail.com wrote:

...

On Oct 27, 2009, at 2:50 PM, Terry Reedy wrote more and more and more and more and more and more and more and more and more:

Actually, I wrote 7 succinct lines that summarized and made a proposal. In general, I snip when quoting and write concisely as possible.

...

This topic needs its own flippin' newsgroup.

You need to learn politeness. You could have said just that, appropriate or not, without dumping on anyone in particular. Terry Jan Reedy

ssteinerX＠gmail.com

3:27 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Oct 27, 2009, at 11:02 PM, Terry Reedy wrote:

...

ssteinerX@gmail.com wrote:

...

This topic needs its own flippin' newsgroup. You could have said just that, appropriate or not, without dumping on anyone in particular.

I was not trying to dump on you in particular, I picked a random message of the trillions that went by and happened to get you. I apologize if you felt dumped on. No offense intended to you in particular but this thread has gone on and on and on and on and on and on... S

Guido van Rossum

7:13 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Oct 27, 2009 at 11:50 AM, Terry Reedy <tjreedy@udel.edu> wrote:

...

There are two ideas of set.get floating about: 1) get an arbitrary object 2) get the object in the set with the same 'value'(hash+eq) as an input arg (the intern case). In this case, there is a 'key', even if it is somewhat abstract rather that being an object.

Both could be done with the same method, depending on whether an arg is passed or not.

My gut tells me it is bad API design to collapse these two use cases. Probably because the implementations won't have much in common: (1) should just pick the first valid element, while (2) should use the normal hash lookup algorithm (shared with 'in', .add() etc.). -- --Guido van Rossum (python.org/~guido)

geremy condra

7:27 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Oct 27, 2009 at 3:13 PM, Guido van Rossum <guido@python.org> wrote:

...

On Tue, Oct 27, 2009 at 11:50 AM, Terry Reedy <tjreedy@udel.edu> wrote:

...
There are two ideas of set.get floating about: 1) get an arbitrary object 2) get the object in the set with the same 'value'(hash+eq) as an input arg (the intern case). In this case, there is a 'key', even if it is somewhat abstract rather that being an object.

Both could be done with the same method, depending on whether an arg is passed or not.

My gut tells me it is bad API design to collapse these two use cases. Probably because the implementations won't have much in common: (1) should just pick the first valid element, while (2) should use the normal hash lookup algorithm (shared with 'in', .add() etc.).

-- --Guido van Rossum (python.org/~guido)

Was it ever decided whether this would fall under the moratorium? Geremy Condra

Raymond Hettinger

7:49 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

[geremy condra]

...

Was it ever decided whether this would fall under the moratorium?

Decided isn't the right word: http://mail.python.org/pipermail/python-dev/2009-October/093373.html FWIW, I'm a strong -1 on both proposals. Just add a short get_one() function and a get_equivalent() recipe to your utils directory. That will get the job done (thought I don't expect that you will *ever* make much use of either one). No need to complexify a type that is currently very simple. Raymond P.S. get_equivalent: http://code.activestate.com/recipes/499299/ get_one = lambda s, default=None: next(iter(s), default) The first works with all type that defines __contains__. The second works for any iterable. Neither of these concepts are specific to set objects.

geremy condra

3:34 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Tue, Oct 27, 2009 at 3:49 PM, Raymond Hettinger <python@rcn.com> wrote:

...

[geremy condra]

...
Was it ever decided whether this would fall under the moratorium?

Decided isn't the right word: http://mail.python.org/pipermail/python-dev/2009-October/093373.html

<snip> I'm unclear- does that imply that this is this going to be decided on a case-by-case basis in the future or have the details for the big M just not been hammered out yet? Geremy Condra

Nick Coghlan

6:40 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Guido van Rossum wrote:

...

My gut tells me it is bad API design to collapse these two use cases. Probably because the implementations won't have much in common: (1) should just pick the first valid element, while (2) should use the normal hash lookup algorithm (shared with 'in', .add() etc.).

As a side note, a fairly obvious method name for "add if missing and then return canonical representation" did occur to me: s.intern(x) I'd be -0 on expanding the set API for this though (since the cookbook recipe overloading __eq__ already provides an efficient solution). As far as the moratorium in general goes, perhaps we should draw a line between API changes that affect the ABCs in collections and numbers and new convenience methods on the builtin types themselves? Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Chris Bergstresser

7:02 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Oct 27, 2009 at 12:47 PM, Raymond Hettinger <python@rcn.com> wrote:

...

[Chris Bergstresser] Still, I think my

...
point stands--it's a clear extrapolation from the existing dict.get().

Not really. One looks-up a key and supplies a default value if not found. The other, set.get(), doesn't have a key to lookup.

Right, which is why dict.get() takes a key as an argument, while the proposed set.get() wouldn't.

...

A dict.get() can be meaningfully used in a loop (because the key can vary). A set.get() returns the same value over and over again (because there is no key).

I don't think "can it be used meaningfully in a loop?" is an especially interesting or useful way of evaluating language features. Besides, why would set.get() necessarily return the same value over and over again? I thought it would be defined to return an arbitrary value--which could be the same value over and over again, but could just as easily be defined to return a round-robin value, or the minimum value, or some *other* value as the container defined it. The fact is, set.get() succinctly describes an action which is otherwise obscure in Python. It doesn't come up all that frequently, but when it does the alternatives are poor at best. Add in the uncertainty about which is optimized (imagine the situation where the set you're calling is actually a proxy for an object across the network, and constructing an iterator is expensive) and you start to see the value. -- Chris

Steven D'Aprano

2:58 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Wed, 28 Oct 2009 04:47:59 am Raymond Hettinger wrote:

...

A dict.get() can be meaningfully used in a loop (because the key can vary). A set.get() returns the same value over and over again (because there is no key).

I don't believe anyone has requested those semantics. The suggested semantics for set.get() with no arguments, as I understand them, are: (1) it will only fail if the set is empty; (2) it should be efficient; (3) if you call it repeatedly on a set without modifying the set, you will cycle through each element in turn in some unspecified arbitrary order. To clarify point 3, given: x = set.get() y = set.get() then x and y will only be the same element if set has length one. However, given: x = set.get() set.add(el) set.remove(el) y = set.get() there are no guarantees about x and y being different. I believe that the patch supplied by Willi Richart implemented these behaviours. http://bugs.python.org/issue7212 -- Steven D'Aprano

Raymond Hettinger

4 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

[Steven D'Aprano]

...

The suggested semantics for set.get() with no arguments, as I understand them, are:

(1) it will only fail if the set is empty;

Just like next() except that next() gives you the option to supply a default and can be used on any iterator (perhaps iter(s) or itertools.cycle(s) etc).

...

(2) it should be efficient;

Is this about optimization? I wouldn't expect "x=s.get()" to beat "for x in s: break". Attribute lookup and method calls usually are slower than equivalents using built-in syntax with specific opcodes.

...

(3) if you call it repeatedly on a set without modifying the set, you will cycle through each element in turn in some unspecified arbitrary order.

What's wrong with using next()? That is what it's for. What about this proposal is specific to sets, i.e. why don't you want the same thing for lists. tuples, strings, file objects, or any other iterable? Does this proposal pass the test of being self-descriptive? Can you write a code fragment that exercises the cycling behavior, show it to another programmer, and have them correctly deduce what the code does (i.e. that different values are returned, that it fails when the set it empty, that it wraps around and never terminates)? Can they readily differentiate it for dict.get() which has decidedly different semantics?

...

To clarify point 3, given:

x = set.get() y = set.get()

then x and y will only be the same element if set has length one.

So, it can't even be used for looping through a set because there is no termination?

...

I believe that the patch supplied by Willi Richart implemented these behaviours.

http://bugs.python.org/issue7212

So you want to introduce additional, hidden state to sets? (to make sure that successive invocations return different values) Do you want a thread local version too? (so that two threads can call gets without stomping on each other's guarantees that successive calls will produce distinct elements) Do you have any real-world use-cases where next(), for-loops, or itertools wouldn't suffice? Is there a precedent in *any* other language you've ever seen? (setl has an "arb" function but it makes no promises about returning different values on consequetive calls; otherwise, I've never seen an equivalent in any other set implementation). Do you think the return-different-values-on-successive-calls semantics is self-evident and non-magical as compared to a straight for-loop or next(it)? ISTM, that when streams have non-destructive getters with self-advancing pointers, they also have a seek() function so that it can be controlled. Will this proposal need a seek() method too? Sorry for so many questions, but I honestly think there are too many unresolved design issues. We've seen no real-world source code that would be improved fwith the proposal. I think it sounds conceptually tempting and is fun to theorize about, but it actual implementation it will make sets more difficult to learn and it would quickly become a piece of rarely used, poorly understood cruft. Raymond

Steven D'Aprano

3:11 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Fri, 30 Oct 2009 03:00:27 pm Raymond Hettinger wrote: [skipping to the last paragraph]

...

Sorry for so many questions

Don't be sorry. These are good questions, and I'll try to answer them. But keep in mind that this isn't my proposal -- I vary between +0 and +1 on the proposal depending on the time of the day *wink*

...

...
(1) it will only fail if the set is empty;

Just like next() except that next() gives you the option to supply a default and can be used on any iterator (perhaps iter(s) or itertools.cycle(s) etc).

Yes. I had considered suggesting that sets become their own iterator, so you could do the following:

...

...
...
s = set([1,2]) next(s) 2

but that leads to the problem that if you accept a set from somewhere else, you have no idea whether next(s) will succeed or raise StopIteration. It may have already been exhausted before it reached you.

...

...
(2) it should be efficient;

Is this about optimization?

Not primarily. Perhaps I should have said it should not be inefficient. It's primarily about there being One Obvious Way to extract an arbitrary item from a set -- this is a reoccurring question on comp.lang.python. Being efficient is a bonus, but it shouldn't be inefficient.

...

I wouldn't expect "x=s.get()" to beat "for x in s: break". Attribute lookup and method calls usually are slower than equivalents using built-in syntax with specific opcodes.

Obviously any hypothetical solution would need to be benchmarked, but I wouldn't be concerned if s.get() was marginally slower than for `x in s: break`. When people are left to come up with their own ad hoc solutions, we've seen some poor solutions. I've seen, possibly in this very thread, the following O(N) suggestions: for x in s: pass x = list(s)[0] The usual technique people tend to come up with is: x = s.pop() s.add(x) Which strikes many people (including myself) as inelegant. Surely "get an element" is a more fundamental operation than "get an element and remove it"?

...

...
(3) if you call it repeatedly on a set without modifying the set, you will cycle through each element in turn in some unspecified arbitrary order.

What's wrong with using next()? That is what it's for.

You can't call next() on a set. You have to call next(iter(set)). From Python 3.0:

...

...
...
next(set([1,2])) Traceback (most recent call last): File "<stdin>", line 1, in <module> TypeError: set object is not an iterator next(iter(set([1,2]))) 1

Unless I missed something, this is so unobvious that nobody has suggested it in the thread yet.

...

What about this proposal is specific to sets, i.e. why don't you want the same thing for lists. tuples, strings, file objects, or any other iterable?

Sequences such as lists, tuples and strings have easy random access. It's easy to get an arbitrary element: pick an index i by whatever method you like, and get seq[i]. Many file objects are similar, you have random access with seek(). It is unpractical and an over-generalisation to apply this to general iterables, for reasons I'm sure I don't need to go into. But sets and frozensets aren't lazily generated streams, they actually do store the elements inside their data structure in the same way that lists do.

...

Does this proposal pass the test of being self-descriptive? Can you write a code fragment that exercises the cycling behavior, show it to another programmer, and have them correctly deduce what the code does (i.e. that different values are returned, that it fails when the set it empty, that it wraps around and never terminates)? Can they readily differentiate it for dict.get() which has decidedly different semantics?

I don't expect ConfigParser.get() to have the same semantics as dict.get(), and they're much more closely related than sets and dicts. I don't understand why so many people apparently have a problem with the name get(), but that's okay, I'm not wedded to the idea either. If folks prefer a different name, by all means suggest it. I like the name suggested by Wikipedia, "pick". As for being self-descriptive, I don't know, I haven't tried the experiment.

...

...
To clarify point 3, given:

x = set.get() y = set.get()

then x and y will only be the same element if set has length one.

So, it can't even be used for looping through a set because there is no termination?

That's a feature, not a bug! This is not intended to be a replacement for standard set iteration. Anyone writing the following: for _ in len(s): process(s.get()) instead of for x in s: process(x) will be taken out and slapped with a large fish. My reasoning for the given behaviour is as follows: * If you want the same element twice from a set, the One Obvious Way is to get an element from the set (somehow!) and assign it to a local variable: x = s.get() process(x) # later... process(x) So having get() return the same value each time is not useful or necessary. * If you want a random element on each call, the One Obvious Way is to call random.choice(list(s)). * Since sets aren't mappings, or sequences, there's no sensible way to look up an index or key, so any variation of "get element #1" are out. * Which leaves returning the elements in some unspecified arbitrary order. The most obvious arbitrary order is to cycle through them, which conveniently is exactly what iter(set) does, but we don't need to guarantee that order.

...

...
I believe that the patch supplied by Willi Richart implemented these behaviours.

http://bugs.python.org/issue7212

So you want to introduce additional, hidden state to sets? (to make sure that successive invocations return different values)

If you can think of any other way to efficiently cycle over the elements in a set, I'm all for it :) Presumably this is no different from what dict views do, except they don't wrap around when exhausted.

...

Do you want a thread local version too? (so that two threads can call gets without stomping on each other's guarantees that successive calls will produce distinct elements)

I think that's overkill. I wouldn't think we need to guarantee that two threads don't see the same element, or that each thread will see each element in the same order. We need only the much weaker guarantee that two subsequent calls to get() in the one thread with no modification to the set between the calls won't retrieve the same element each time (unless there is only one element to retrieve), and that each element will eventually be seen.

...

Do you have any real-world use-cases where next(), for-loops, or itertools wouldn't suffice?

Someone in this thread -- forgive me, I don't recall who -- had the case where they had a frozenset which they knew only had one element, but no obvious way to retrieve that element. To my mind, the major rationalisation for set.get() is not that there's no way to get a single element from a set, but that there is no obvious, easily discoverable way to retrieve a single element.

...

Is there a precedent in *any* other language you've ever seen? (setl has an "arb" function but it makes no promises about returning different values on consequetive calls; otherwise, I've never seen an equivalent in any other set implementation).

I can't say I've seen one in any other languages, but Wikipedia lists "pick" as a fundamental set operation: pick(S): returns an arbitrary element of S. http://en.wikipedia.org/wiki/Set_(computer_science) This page claims that Icon has an operator that returns a random element of a set: ? set( [1, 2, 3, 4, 5] ) http://stackoverflow.com/questions/124671/picking-a-random-element-from-a-se... but I've never used Icon myself and so can't confirm that.

...

Do you think the return-different-values-on-successive-calls semantics is self-evident and non-magical as compared to a straight for-loop or next(it)?

I'm going to sit on the fence on that and say "maybe".

...

ISTM, that when streams have non-destructive getters with self-advancing pointers, they also have a seek() function so that it can be controlled. Will this proposal need a seek() method too?

No. Since sets are unordered, there's no way to seek to a specific element.

...

Sorry for so many questions, but I honestly think there are too many unresolved design issues. We've seen no real-world source code that would be improved fwith the proposal. I think it sounds conceptually tempting and is fun to theorize about, but it actual implementation it will make sets more difficult to learn and it would quickly become a piece of rarely used, poorly understood cruft.

-- Steven D'Aprano

Antoine Pitrou

3:35 p.m.

Steven D'Aprano <steve <at> pearwood.info> writes:

...

If you can think of any other way to efficiently cycle over the elements in a set, I'm all for it :)

How about "for x in s"? Or if you want to cycle:

...

...
...
s = set('abc') it = itertools.cycle(s) next(it) 'a' next(it) 'c' next(it) 'b' next(it) 'a'

Or if you don't want the overhead of itertools.cycle() keeping a copy of the set's elements:

...

...
...
s = set('abc') it = itertools.chain.from_iterable(itertools.cycle([s])) next(it) 'a' next(it) 'c' next(it) 'b' next(it) 'a' next(it) 'c' next(it) 'b'

...

I can't say I've seen one in any other languages, but Wikipedia lists "pick" as a fundamental set operation:

pick(S): returns an arbitrary element of S.

Well, it's an arbitrary element. It isn't specified that it will try to return different results in a row to satisfy the developer's aesthetical preferences...

...

This page claims that Icon has an operator that returns a random element of a set:

? set( [1, 2, 3, 4, 5] )

random != arbitrary != weak-guaranteedly distinct

Nick Coghlan

3:46 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

Steven D'Aprano wrote:

...

...
So you want to introduce additional, hidden state to sets? (to make sure that successive invocations return different values)

If you can think of any other way to efficiently cycle over the elements in a set, I'm all for it :)

for x in itertools.cycle(s): # this is an infinite loop Having a pick() or get() method that returns an arbitrary member of a set makes sense to me. Having any state on the set that guarantees successive calls to get will return different values feels wrong - creating an object with that extra state is what iter(s) is for. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Stephen J. Turnbull

2:42 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

Steven D'Aprano writes:

...

The usual technique people tend to come up with is:

x = s.pop() s.add(x)

Which strikes many people (including myself) as inelegant. Surely "get an element" is a more fundamental operation than "get an element and remove it"?

Not in a literal urn or bag. See "sampling with replacement" in any statistics text. Note that in order to "have your cake and eat it too" there's an implicit copy operation (although in Python it will be a cheap ref copy rather than an expensive object copy). I'm afraid that the various conflicting intuitions here are all correct. That is going to make design impossible without some compelling use cases.

...

...
What's wrong with using next()? That is what it's for.

You can't call next() on a set. You have to call next(iter(set)). [...] Unless I missed something, this is so unobvious that nobody has suggested it in the thread yet.

I've seen it at least twice (Nick suggested it IIRC), although that may have been on Python-Ideas (which is where this thread belongs IMHO).

...

If folks prefer a different name, by all means suggest it. I like the name suggested by Wikipedia, "pick".

"Pick" has a connotation of removal in many contexts: "Pick a card, any card". "Pick it up off the floor" (it's not in the set of things on the floor any more). Like get, to me it has a flavor of "according to some criterion": "a band of picked men". I would expect a pick or get operation to take an optional predicate. But then TOOWTDI is for x in container: if predicate(x): break else: x = default_x # or perhaps raise in cases where there # theoretically should be an element

...

My reasoning for the given behaviour is as follows:

* If you want the same element twice from a set,

Nobody is asking for that AFAICS. The use case I have so far observed people to have in mind is a cast from an equivalence class to a representative member. They don't care whether the member is the same or not. If they wanted iterator behavior, getting it would not be a problem. next(iter(foo_set)) is TOOWTDI. If they wanted cyclical behavior, itertools.cycle. If they wanted random behavior, make a list and choose from it. In one leading subcase, the equivalence class is a singleton. In this use case what people really want, I suspect, is behavior like Python strings and characters: a singleton set should provide the same operations as its unique element and vice versa.

...

So having get() return the same value each time is not useful or necessary.

Which is not the point. The point is that having get return a different value if possible is not useful or necessary in at least some of the leading use cases. OTOH, we don't know enough about "weird" use cases where having a different value returned is important to decide how that should be done, where "weird" means "outside of the list of already Obvious Ways".

...

No. Since sets are unordered, there's no way to seek to a specific element.

I think people will realize that in fact *these* sets are ordered and they will demand a seek function for various practical purposes.

...

...
Sorry for so many questions, but I honestly think there are too many unresolved design issues.

I have to agree with Raymond.

Steven D'Aprano

November 2009

11:21 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

It seems that even those originally asking for set retrieval have gone silent, so I guess this isn't going anywhere. However, for the benefit of future discussions (because I'm sure this question will be raised again), I'd like to answer a couple of points raised by Stephen. On Sat, 31 Oct 2009 01:42:46 pm Stephen J. Turnbull wrote:

...

> If folks prefer a different name, by all means suggest it. I like > the name suggested by Wikipedia, "pick".

"Pick" has a connotation of removal in many contexts: "Pick a card, any card".

And in just as many, there is no connotation of removal. "Pick a colour". For what it's worth, Forth and similar stack-based languages usually have a function "pick" equivalent to pop-without-removal. pick seems to be a standard term for this behaviour.

...

Like get, to me it has a flavor of "according to some criterion": "a band of picked men". I would expect a pick or get operation to take an optional predicate.

I think you are underestimating the power of context here. In practice, method names are interpreted in the context of the data being operated on. We don't get confused that ConfigParser.get() has a different signature to dict.get(), which is different again from Queue.get(). list.pop() takes an index; dict.pop() takes a key and an optional second argument; set.pop() takes no argument at all. Is anyone actually confused by this? If not, why would set.get() be more confusing? I think far too many people say "this is confusing" when what they really mean is "I don't like this".

...

The use case I have so far observed people to have in mind is a cast from an equivalence class to a representative member.

The difficulty is that we actually have two related, but different, behaviours, and sadly we've conflated the two of them by using the same name "get" for both. One behaviour is what Wikipedia calls pick: set.pick() -> return an arbitrary element from set without removing it This is not useful for the cast you describe. For that, you need a method that takes an argument. I'm going to call it "retrieve", to avoid the baggage of "get": set.retrieve(x) -> return the element from set equal to x In the first case, it seems self-evident to me that having "arbitrary" mean "the same element each time it is called" really would condemn pick() to be unused, but I don't care enough to argue. In the second case, the retrieval isn't arbitrary, so this is not a problem that needs solving.

...

> No. Since sets are unordered, there's no way to seek to a specific > element.

I think people will realize that in fact *these* sets are ordered and they will demand a seek function for various practical purposes.

We can iterate over dicts, and the order is arbitrary but consistent. Where are the people clamouring for dict.seek()? -- Steven D'Aprano

Willi Richert

1:55 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

Am Sonntag, 1. November 2009 12:21:15 schrieben Sie:

...

It seems that even those originally asking for set retrieval have gone silent

Nope. Stilll following and waiting for the verdict of the community after having filed the corpus delicti [1] wr [1]: http://bugs.python.org/issue7212

Willi Richert

October 2009

8:16 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Am Freitag, 30. Oktober 2009 03:58:16 schrieb Steven D'Aprano:

...

To clarify point 3, given:

x = set.get() y = set.get()

then x and y will only be the same element if set has length one. However, given:

x = set.get() set.add(el) set.remove(el) y = set.get()

there are no guarantees about x and y being different.

I believe that the patch supplied by Willi Richart implemented these behaviours.

http://bugs.python.org/issue7212

Actually, no. The patch makes no assumption about x and y being different. It does not try to extend the core functionality of set nor does it need to maintain any state that would be necessary for that. It is just a more obvious and cleaner way for saying "for x in some_set: break". So, as Raymond says, it is the Pythonic version of "arb" in setl. wr

Georg Brandl

8:37 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Steven D'Aprano schrieb:

...

On Wed, 28 Oct 2009 04:47:59 am Raymond Hettinger wrote:

...
A dict.get() can be meaningfully used in a loop (because the key can vary). A set.get() returns the same value over and over again (because there is no key).

I don't believe anyone has requested those semantics. The suggested semantics for set.get() with no arguments, as I understand them, are:

(1) it will only fail if the set is empty;

(2) it should be efficient;

(3) if you call it repeatedly on a set without modifying the set, you will cycle through each element in turn in some unspecified arbitrary order.

I don't like this. It gives a set object a hidden state, something that AFAICS no other builtin has. Iterating over an iterable is what iterators are for. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

A.M. Kuchling

8:27 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Fri, Oct 30, 2009 at 09:37:36PM +0100, Georg Brandl wrote:

...

I don't like this. It gives a set object a hidden state, something that AFAICS no other builtin has. Iterating over an iterable is what iterators are for.

It also makes the object thread-unsafe; there's no way for two threads to iterate over it at the same time. It's a terrible idea to introduce new things that won't work under threaded usage. --amk

Steven D'Aprano

1:29 a.m.

New subject: [Python-Dev] Retrieve an arbitrary element from a set withoutremoving it

On Sat, 31 Oct 2009 07:27:22 am A.M. Kuchling wrote:

...

On Fri, Oct 30, 2009 at 09:37:36PM +0100, Georg Brandl wrote:

...
I don't like this. It gives a set object a hidden state, something that AFAICS no other builtin has.

All objects have a reference count field which is hidden from Python code. The C API for objects has a flags field which specifies whether objects are read-only or read/write from Python code. As of Python 2.6, type objects have an internal method cache. C code can clear it with PyType_ClearCache(), Python codes can't even see it. Lists and dicts pre-allocate extra space, and record hidden state of how much of the space is actually in use. Sets may do the same. File objects may use internal buffers, with all the state that implies.

...

...
Iterating over an iterable is what iterators are for.

set.get(), or set.pick() as Wikipedia calls it, isn't for iterating over sets. It is for getting an arbitrary element from the set. If the requirement that get/pick() cycles through the sets elements is the killer objection to this proposal, I'd be willing to drop it. I thought that was part of the OP's request, but apparently it isn't. I would argue that it's hardly "arbitrary" if you get the same element every time you call the method, but if others are happy with that behaviour, I'm not going to stand in the way.

...

It also makes the object thread-unsafe; there's no way for two threads to iterate over it at the same time. It's a terrible idea to introduce new things that won't work under threaded usage.

I would agree if the purpose of get/pick() was to iterate over the elements of the set, but that's not the purpose. The purpose is to return an arbitrary item each time it is called. If two threads get the same element, that's not a problem; if one thread misses an element because another thread grabbed it first, that's not a problem either. If people prefer a random element instead, I have no problem with that -- personally I think that's overkill, but maybe that's just me. -- Steven D'Aprano

Benjamin Peterson

1:38 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

2009/10/30 Steven D'Aprano <steve@pearwood.info>:

...

On Sat, 31 Oct 2009 07:27:22 am A.M. Kuchling wrote:

...
On Fri, Oct 30, 2009 at 09:37:36PM +0100, Georg Brandl wrote:

...
I don't like this. It gives a set object a hidden state, something that AFAICS no other builtin has.

All objects have a reference count field which is hidden from Python code. The C API for objects has a flags field which specifies whether objects are read-only or read/write from Python code.

As of Python 2.6, type objects have an internal method cache. C code can clear it with PyType_ClearCache(), Python codes can't even see it.

Lists and dicts pre-allocate extra space, and record hidden state of how much of the space is actually in use. Sets may do the same. File objects may use internal buffers, with all the state that implies.

Those are all examples of states which are implementation details. The proposed get() semantics would require that allow implementations keep this state around. -- Regards, Benjamin

Chris Bergstresser

1:43 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Fri, Oct 30, 2009 at 8:29 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

...
...
Iterating over an iterable is what iterators are for.

set.get(), or set.pick() as Wikipedia calls it, isn't for iterating over sets. It is for getting an arbitrary element from the set.

If the requirement that get/pick() cycles through the sets elements is the killer objection to this proposal, I'd be willing to drop it. I thought that was part of the OP's request, but apparently it isn't. I would argue that it's hardly "arbitrary" if you get the same element every time you call the method, but if others are happy with that behaviour, I'm not going to stand in the way.

It's arbitrary in the sense that the API doesn't make any assurances which item the caller will get, not that it's arbitrary for any particular * implementation*.

...

The purpose is to return an arbitrary item each time it is called. If two threads get the same element, that's not a problem; if one thread misses an element because another thread grabbed it first, that's not a problem either. If people prefer a random element instead, I have no problem with that -- personally I think that's overkill, but maybe that's just me.

I also think returning a random one is overkill, in the base set. And I'd have the base implementation do the simplest thing possible: return a fixed element (either the first returned when iterated over, or the last stored, or whatever's easiest based on the internals). For me, leaving the choice of *which* element to return on each call is a feature. It allows subclasses to change the behavior to support other use cases, like a random or round-robin behavior. -- Chris

Cameron Simpson

November 2009

5:29 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On 30Oct2009 20:43, Chris Bergstresser <chris@subtlety.com> wrote: | On Fri, Oct 30, 2009 at 8:29 PM, Steven D'Aprano <steve@pearwood.info> wrote: | >> > Iterating over an iterable is | >> > what iterators are for. | > | > set.get(), or set.pick() as Wikipedia calls it, isn't for iterating over | > sets. It is for getting an arbitrary element from the set. [...] | > The purpose is to | > return an arbitrary item each time it is called. If two threads get the | > same element, that's not a problem; if one thread misses an element | > because another thread grabbed it first, that's not a problem either. | > If people prefer a random element instead, I have no problem with | > that -- personally I think that's overkill, but maybe that's just me. | | I also think returning a random one is overkill, in the base set. | And I'd have the base implementation do the simplest thing possible: | return a fixed element (either the first returned when iterated over, | or the last stored, or whatever's easiest based on the internals). | For me, leaving the choice of *which* element to return on each call | is a feature. It allows subclasses to change the behavior to support | other use cases, like a random or round-robin behavior. Personally, I'm for the iteration spec in a lot of ways. Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it? Secondly, I think the thread-unsafe arguments are weak. Question: is the existing set iteration scheme thread safe? i.e. does it return a fresh iterator that a thread can happily use concurrently while another thread uses its own iterator? (Absent set modifications). If the answer is yes, then I don't buy the thread-unsafe complaints - there are already plenty of thread unsafe things much as lots of iterators will break in the face of changes to the data structures over which they're iterating. If iter(set) gets you a safe iterator (absent set modification and multiple users of that iterator) then thread safe methods exist and are easy to use. No presently thread-safe program is going to be broken by adding an iterating .get() method. Thirdly, an iteration-like .get() is dead easy to provide: you just keep a _single_, cycling, internal iterator made on demand the same way __iter__ already works. So why not do just do it? There's no need to construct it before anyone calls .get() the first time. Somewhat like: def get(self): for x in self: return x raise something here but not making a new iterator for every caller. Indeed, making a new iterater per caller, in addition to being expensive, might easily return the same element to every caller. Do anyone feel an iteration capable .get() unduely burdens subclasses that want to impement different .get()s? Both the suggested potential subclass choices (round robin and random) suggest iteration capable .get()s (presuming "random" means "shuffle order" rather than potentially returning the same element twice). Cheers, -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/ Why doesn't DOS ever say EXCELLENT command or filename?

Antoine Pitrou

10:21 a.m.

New subject: [Python-Dev] Retrieve an arbitrary element from a set withoutremoving it

Cameron Simpson <cs <at> zip.com.au> writes:

...

Personally, I'm for the iteration spec in a lot of ways.

Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

I do. Since the caller is asking for an arbitrary element, returning the same element is legitimate. It's funny how people seem to have a problem with the word "arbitrary" :) And I'm -1 on any implicit iteration attaching some state to the object. If you want to iterate, there's already an obvious way to it. Regards Antoine.

Nick Coghlan

12:27 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Antoine Pitrou wrote:

...

Cameron Simpson <cs <at> zip.com.au> writes:

...
Personally, I'm for the iteration spec in a lot of ways.

Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

I do. Since the caller is asking for an arbitrary element, returning the same element is legitimate. It's funny how people seem to have a problem with the word "arbitrary" :)

Agreed. Arbitrary is arbitrary - for a stateless method that returns an arbitrary result, return the same value every time is fine.

...

And I'm -1 on any implicit iteration attaching some state to the object. If you want to iterate, there's already an obvious way to it.

Also agreed. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Cameron Simpson

7:30 a.m.

New subject: Python-Dev Retrieve an arbitrary element from a set?withoutremoving it

On 02Nov2009 10:21, Antoine Pitrou <solipsis@pitrou.net> wrote: | Cameron Simpson <cs <at> zip.com.au> writes: | > | > Personally, I'm for the iteration spec in a lot of ways. | > | > Firstly, a .get()/.pick() that always returns the same element feels | > horrible. Is there anyone here who _likes_ it? | | I do. Since the caller is asking for an arbitrary element, returning the same | element is legitimate. It's funny how people seem to have a problem with the | word "arbitrary" :) | | And I'm -1 on any implicit iteration attaching some state to the object. If you | want to iterate, there's already an obvious way to it. Good point. Colour me convinced by this. I suggest the doco is really clear about the word arbitrary:-) -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/ I must really get a thinner pencil. I can't manage this one a bit. It writes all manner of things I don't intend. - rheney@csugrad.cs.vt.edu (Bolo Mk XXXIX)

Willi Richert

11 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Hi, all your points are valid -- for the experienced Python programmer who has stumbled across this issue already and solved it in one of several ways. All your points, however, do not support the "one obvious way to do it" philosophy of Python. It's all about making Python even more clean and beautiful. wr Am Montag, 2. November 2009 06:29:00 schrieb Cameron Simpson:

...

On 30Oct2009 20:43, Chris Bergstresser <chris@subtlety.com> wrote: | On Fri, Oct 30, 2009 at 8:29 PM, Steven D'Aprano <steve@pearwood.info> wrote: | >> > Iterating over an iterable is | >> > what iterators are for. | > | > set.get(), or set.pick() as Wikipedia calls it, isn't for iterating | > over sets. It is for getting an arbitrary element from the set.

[...]

| > The purpose is to | > return an arbitrary item each time it is called. If two threads get the | > same element, that's not a problem; if one thread misses an element | > because another thread grabbed it first, that's not a problem either. | > If people prefer a random element instead, I have no problem with | > that -- personally I think that's overkill, but maybe that's just me. | | I also think returning a random one is overkill, in the base set. | And I'd have the base implementation do the simplest thing possible: | return a fixed element (either the first returned when iterated over, | or the last stored, or whatever's easiest based on the internals). | For me, leaving the choice of *which* element to return on each call | is a feature. It allows subclasses to change the behavior to support | other use cases, like a random or round-robin behavior.

Personally, I'm for the iteration spec in a lot of ways.

Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

Secondly, I think the thread-unsafe arguments are weak. Question: is the existing set iteration scheme thread safe? i.e. does it return a fresh iterator that a thread can happily use concurrently while another thread uses its own iterator? (Absent set modifications). If the answer is yes, then I don't buy the thread-unsafe complaints - there are already plenty of thread unsafe things much as lots of iterators will break in the face of changes to the data structures over which they're iterating. If iter(set) gets you a safe iterator (absent set modification and multiple users of that iterator) then thread safe methods exist and are easy to use. No presently thread-safe program is going to be broken by adding an iterating .get() method.

Thirdly, an iteration-like .get() is dead easy to provide: you just keep a _single_, cycling, internal iterator made on demand the same way __iter__ already works. So why not do just do it? There's no need to construct it before anyone calls .get() the first time. Somewhat like:

def get(self): for x in self: return x raise something here

but not making a new iterator for every caller. Indeed, making a new iterater per caller, in addition to being expensive, might easily return the same element to every caller.

Do anyone feel an iteration capable .get() unduely burdens subclasses that want to impement different .get()s? Both the suggested potential subclass choices (round robin and random) suggest iteration capable .get()s (presuming "random" means "shuffle order" rather than potentially returning the same element twice).

Cheers,

Greg Ewing

10:19 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Cameron Simpson wrote:

...

Personally, I'm for the iteration spec in a lot of ways.

Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

It doesn't sound very useful to me. However, an iterating version of it doesn't sound any more useful. What would it gain you? Why not just iterate over the set? Or make a copy and repeatedly pop() it? I completely fail to see a use case for this. -- Greg

Yuvgoog Greenle

8:20 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Nov 3, 2009 at 12:19 AM, Greg Ewing <greg.ewing@canterbury.ac.nz>wrote:

...

Cameron Simpson wrote:

Personally, I'm for the iteration spec in a lot of ways.

...
Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

State might cause people to use this to iterate which would be just plain wrong. The 2 things I have a bad feeling about is: 1. random.choice could be a pythonic alternative to what some have mentioned here but it doesn't work on sets. The following code raises TypeError: 'set' object does not support indexing: import random random.choice(set([1,2,3,4,5,6])) this is kinda ironic: http://en.wikipedia.org/wiki/Axiom_of_choice 2. If I store objects in a set and modify their attributes I have no O(1) way of getting the objects back if I stumble upon an equivalent object. In cases like that I'd have to either iterate over the set or use a dict with key == value. I have a feeling the "get" or "peek" api could cater to this need. A use case for this could be an implementation of a cookie jar with a set of cookies where equivalence is defined by the name and domain. --yuv

Guido van Rossum

6:56 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Nov 3, 2009 at 12:20 AM, Yuvgoog Greenle <ubershmekel@gmail.com> wrote:

...

On Tue, Nov 3, 2009 at 12:19 AM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

...
Cameron Simpson wrote:

...
Personally, I'm for the iteration spec in a lot of ways.

Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

State might cause people to use this to iterate which would be just plain wrong. The 2 things I have a bad feeling about is: 1. random.choice could be a pythonic alternative to what some have mentioned here but it doesn't work on sets. The following code raises TypeError: 'set' object does not support indexing: import random random.choice(set([1,2,3,4,5,6]))

There is a huge difference between picking a random element of a set and picking an arbitrary one. Algorithms that need randomness *must* use the random generator (and because the hash table implementation doesn't provide O(1) access they will have to use a list first). Algorithms that don't need randomness should not be forced to pay for the (considerable!) cost of calling the random number generator, and must accept that the element selected may be predictable.

...

this is kinda ironic: http://en.wikipedia.org/wiki/Axiom_of_choice

Also irrelevant. That Axiom is only interesting for infinite sets, which Python does not support (at least not using the set builtin -- you can of course write your own symbolic algebra package in Python that can be used to represent certain infinite sets, though you still won't be able to iterate over all of their elements :-).

...

2. If I store objects in a set and modify their attributes I have no O(1) way of getting the objects back if I stumble upon an equivalent object. In cases like that I'd have to either iterate over the set or use a dict with key == value. I have a feeling the "get" or "peek" api could cater to this need. A use case for this could be an implementation of a cookie jar with a set of cookies where equivalence is defined by the name and domain.

Sets are what they are. If they don't fit your requirements, don't use them. Don't get fooled by the name -- a dict also isn't a very good data structure to implement an actual Chinese-English dictionary, for example. :-) -- --Guido van Rossum (python.org/~guido)

Steven D'Aprano

10:46 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, 3 Nov 2009 09:19:38 am Greg Ewing wrote:

...

Cameron Simpson wrote:

...
Personally, I'm for the iteration spec in a lot of ways.

Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

It doesn't sound very useful to me. However, an iterating version of it doesn't sound any more useful. What would it gain you? Why not just iterate over the set? Or make a copy and repeatedly pop() it?

Since I was the person who decided that "arbitrary" meant "give a different result each time", I should answer that. My intention was not that people should use repeated calls to pick() for iteration. I described that as an abuse of the method. But I can imagine scenarios where the caller calls set.pick() sequentially: def pick_two_cards(hand): assert isinstance(hand, (set, frozenset)) assert len(hand) == 5 return (hand.pick(), hand.pick()) As I said in a reply to Raymond, optimization was not my primary concern, but as a fundamental set operation[1] pick() should be efficient. It may be called on a set with millions of items, not just small sets. Copying to another set, or to a list, will be O(N) and potentially slow -- at the very least, it is wasteful to copy millions of elements in order to access one. I don't know how expensive it is to create a set iterator, but (my reasoning went) surely we can do better? The set already has access to its own elements, it just needs to return one of them. pop() also knows how to retrieve an arbitrary element. pick() is just like pop(), except without removal.

...

I completely fail to see a use case for this.

Nevertheless, people keep requesting it, so obviously they have a use for it. None of the standard solutions are obvious or easily discoverable, and in my experience the usual solution people come up with is to pop() an element, then add() it back in, but of course that's not just inelegant but it fails on frozensets. [1] Obviously there is disagreement on whether or not pick() is a fundamental operation or not. As Raymond points out, it is uncommon in other languages. But Wikipedia lists it as fundamental, and it is not just Python users who ask for it: http://stackoverflow.com/questions/124671/picking-a-random-element-from-a-se... -- Steven D'Aprano

Guido van Rossum

11:10 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Nov 3, 2009 at 2:46 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

On Tue, 3 Nov 2009 09:19:38 am Greg Ewing wrote:

...
Cameron Simpson wrote:

...
Personally, I'm for the iteration spec in a lot of ways.

Firstly, a .get()/.pick() that always returns the same element feels horrible. Is there anyone here who _likes_ it?

It doesn't sound very useful to me. However, an iterating version of it doesn't sound any more useful. What would it gain you? Why not just iterate over the set? Or make a copy and repeatedly pop() it?

Since I was the person who decided that "arbitrary" meant "give a different result each time", I should answer that.

My intention was not that people should use repeated calls to pick() for iteration. I described that as an abuse of the method. But I can imagine scenarios where the caller calls set.pick() sequentially:

def pick_two_cards(hand): assert isinstance(hand, (set, frozenset)) assert len(hand) == 5 return (hand.pick(), hand.pick())

As I said in a reply to Raymond, optimization was not my primary concern, but as a fundamental set operation[1] pick() should be efficient. It may be called on a set with millions of items, not just small sets. Copying to another set, or to a list, will be O(N) and potentially slow -- at the very least, it is wasteful to copy millions of elements in order to access one.

I don't know how expensive it is to create a set iterator, but (my reasoning went) surely we can do better? The set already has access to its own elements, it just needs to return one of them. pop() also knows how to retrieve an arbitrary element. pick() is just like pop(), except without removal.

...
I completely fail to see a use case for this.

Nevertheless, people keep requesting it, so obviously they have a use for it. None of the standard solutions are obvious or easily discoverable, and in my experience the usual solution people come up with is to pop() an element, then add() it back in, but of course that's not just inelegant but it fails on frozensets.

[1] Obviously there is disagreement on whether or not pick() is a fundamental operation or not. As Raymond points out, it is uncommon in other languages. But Wikipedia lists it as fundamental, and it is not just Python users who ask for it:

http://stackoverflow.com/questions/124671/picking-a-random-element-from-a-se...

You're obviously talking about a *random* element. This is a separate use case (though I agree many people don't know the difference). Picking a random element can be done in O(1) only if the data structure supports access by index, which Python's hash tables don't. -- --Guido van Rossum (python.org/~guido)

Antoine Pitrou

11:21 p.m.

New subject: [Python-Dev] Retrieve an arbitrary element from a set withoutremoving it

Guido van Rossum <guido <at> python.org> writes:

...

You're obviously talking about a *random* element. This is a separate use case (though I agree many people don't know the difference).

Picking a random element can be done in O(1) only if the data structure supports access by index, which Python's hash tables don't.

Well, at the implementation level, they can. You'd just have to pick a new random index until it points to a non-empty slot. Regards Antoine.

Greg Ewing

1 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Antoine Pitrou wrote:

...

Guido van Rossum <guido <at> python.org> writes:

...
Picking a random element can be done in O(1) only if the data structure supports access by index, which Python's hash tables don't.

Well, at the implementation level, they can. You'd just have to pick a new random index until it points to a non-empty slot.

But that's hardly O(1). -- Greg

Antoine Pitrou

1:10 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:

...

...
...
Picking a random element can be done in O(1) only if the data structure supports access by index, which Python's hash tables don't.

Well, at the implementation level, they can. You'd just have to pick a new random index until it points to a non-empty slot.

But that's hardly O(1).

It is, assuming that the set has a built-in minimum fill level (e.g. it always keeps at least x% of its entries filled), and the random generator is good. (of course, it is "statistically O(1)", like lookups in a hash table actually) Regards Antoine.

Guido van Rossum

2:11 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Nov 3, 2009 at 5:10 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:

...

Greg Ewing <greg.ewing <at> canterbury.ac.nz> writes:

...
...
...
Picking a random element can be done in O(1) only if the data structure supports access by index, which Python's hash tables don't.

Well, at the implementation level, they can. You'd just have to pick a new random index until it points to a non-empty slot.

But that's hardly O(1).

It is, assuming that the set has a built-in minimum fill level (e.g. it always keeps at least x% of its entries filled), and the random generator is good.

(of course, it is "statistically O(1)", like lookups in a hash table actually)

Clever. But I don't think the set implementation should have a dependency on random(), so it would have to expose an "indexing" operation, which smells like it would expose too much of the implementation for comfort. -- --Guido van Rossum (python.org/~guido)

Steven D'Aprano

11 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Wed, 4 Nov 2009 10:10:30 am Guido van Rossum wrote:

...

On Tue, Nov 3, 2009 at 2:46 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...
Since I was the person who decided that "arbitrary" meant "give a different result each time", I should answer that.

...

You're obviously talking about a *random* element. This is a separate use case (though I agree many people don't know the difference).

I'm aware of the difference between random and arbitrary, and in an earlier post I said that the One Obvious Way of getting a random element from a list would be to convert to a list and call random.choice(). Sorry for muddying the waters by linking to a page discussing such random selections. -- Steven D'Aprano

Daniel Stutzbach

11:11 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Tue, Nov 3, 2009 at 4:46 PM, Steven D'Aprano <steve@pearwood.info> wrote:

...

def pick_two_cards(hand): assert isinstance(hand, (set, frozenset)) assert len(hand) == 5 return (hand.pick(), hand.pick())

Even if pick() chose random, you still might end up picking the same card twice. Is that really what you intended? FWIW, I've been working on an extension module that supplies a "sortedset" type [1]. In most ways, it's similar to a set except it's indexable like a list. The items are kept in sorted order, so index 0 is always the lowest item, index 1 is the next-to-lowest, etc. Because they're indexable, it's easy and efficient to retrieve random elements using the standard library's "random" module. With the sortedset type, that function would become: def pick_two_cards(hand): assert isinstance(hand, (set, frozenset)) assert len(hand) == 5 return random.choice(hand), random.choice(hand) or if you want to avoid duplicates: return random.sample(hand, 2) Would something like that fit your needs? [1] It's already implemented, along with sortedlist, weaksortedlist, and weaksortedset types. I'm just putting them through the paces in my own products before releasing them. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

Greg Ewing

12:54 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

Steven D'Aprano wrote:

...

I don't know how expensive it is to create a set iterator,

Not expensive enough to justify burdening the set type with extra functionality that will be extremely rarely used. -- Greg

Steven D'Aprano

11:34 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Wed, 4 Nov 2009 11:54:47 am Greg Ewing wrote:

...

Steven D'Aprano wrote:

...
I don't know how expensive it is to create a set iterator,

Not expensive enough to justify burdening the set type with extra functionality that will be extremely rarely used.

As my previous posts on this topic tried to convey, this isn't primarily about efficiency, but about discoverability and obviousness. Anyway, given the level of opposition to the suggestion, I'm no longer willing to carry the flag for it. If anyone else -- perhaps the OP -- feels they want to take it any further, be my guest. -- Steven D'Aprano

geremy condra

9:32 p.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On Wed, Nov 4, 2009 at 6:34 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

On Wed, 4 Nov 2009 11:54:47 am Greg Ewing wrote:

...
Steven D'Aprano wrote:

...
I don't know how expensive it is to create a set iterator,

Not expensive enough to justify burdening the set type with extra functionality that will be extremely rarely used.

As my previous posts on this topic tried to convey, this isn't primarily about efficiency, but about discoverability and obviousness.

Anyway, given the level of opposition to the suggestion, I'm no longer willing to carry the flag for it. If anyone else -- perhaps the OP -- feels they want to take it any further, be my guest.

-- Steven D'Aprano

I've said before that I'd like there to be one, standard way of doing this. A function call- set.pick() seems reasonably named to me- is probably the cleanest way to do that. Absent that, an example in the docs that illustrates the preferred idiom would be great. Is there any kind of consensus on either point? Geremy Condra

Raymond Hettinger

1:07 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

[Steven D'Aprano]

...

...
Anyway, given the level of opposition to the suggestion, I'm no longer willing to carry the flag for it. If anyone else -- perhaps the OP -- feels they want to take it any further, be my guest.

[geremy condra]

...

I've said before that I'd like there to be one, standard way of doing this. A function call- set.pick() seems reasonably named to me- is probably the cleanest way to do that. Absent that, an example in the docs that illustrates the preferred idiom would be great.

Summarizing my opposition to a new set method: 1) there already are at least two succinct ways to get the same effect 2) those ways work with any container, not just sets 3) set implementations in other languages show that this isn't needed. 4) there is value to keeping the API compact 5) isn't needed for optimization (selecting the same value in a loop makes no sense) 6) absence of real-world code examples that would be meaningfully improved I would be happy to add an example to the docs so that this thread can finally end. Raymond

Eric Smith

1:12 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

Raymond Hettinger wrote:

...

Summarizing my opposition to a new set method: 1) there already are at least two succinct ways to get the same effect 2) those ways work with any container, not just sets 3) set implementations in other languages show that this isn't needed. 4) there is value to keeping the API compact 5) isn't needed for optimization (selecting the same value in a loop makes no sense) 6) absence of real-world code examples that would be meaningfully improved

I would be happy to add an example to the docs so that this thread can finally end.

Please do! Eric.

Terry Reedy

1:48 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

Eric Smith wrote:

...

Raymond Hettinger wrote:

...
Summarizing my opposition to a new set method: 1) there already are at least two succinct ways to get the same effect 2) those ways work with any container, not just sets 3) set implementations in other languages show that this isn't needed. 4) there is value to keeping the API compact 5) isn't needed for optimization (selecting the same value in a loop makes no sense) 6) absence of real-world code examples that would be meaningfully improved

Agreed

...

...
I would be happy to add an example to the docs so that this thread can finally end.

Please do!

Yes!

Chris Bergstresser

8:43 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Wed, Nov 4, 2009 at 7:07 PM, Raymond Hettinger <python@rcn.com> wrote:

...

[Steven D'Aprano]

...
...
Anyway, given the level of opposition to the suggestion, I'm no longer willing to carry the flag for it. If anyone else -- perhaps the OP -- feels they want to take it any further, be my guest.

I feel pretty strongly that it's a wart in the language, and a sufficiently strong one that it should be remedied. I'm happy to champion it, but haven't the faintest idea what that entails.

...

Summarizing my opposition to a new set method: 1) there already are at least two succinct ways to get the same effect 2) those ways work with any container, not just sets 3) set implementations in other languages show that this isn't needed. 4) there is value to keeping the API compact 5) isn't needed for optimization (selecting the same value in a loop makes no sense) 6) absence of real-world code examples that would be meaningfully improved

I would be happy to add an example to the docs so that this thread can finally end.

Adding an example to the docs does not solve the problem, which is if you come across the following code: for x in s: break ... it really looks like it does nothing. It's only because of the slightly idiosyncratic way Python handles variable scoping that it has an effect at all, and that effect isn't overtly related to what the code says, which is "Iterate over all the elements in this set, then immediately stop after the first one". s.get() or s.pick() are both more succinct and more clear, saying "Get me an arbitrary element from this set". To make matters worse, "for x in s: break" fails silently when s is empty, and "x = iter(s).next()" raises a StopIteration exception. Neither is clear. The obvious way, for newcomers, of achieving the effect is: x = s.pop() s.add(x) ... and that's simply horrible in terms of efficiency. So the "obvious" way of doing it in Python is wrong(TM), and the "correct" way of doing it is obscure and raises misleading exceptions. I suppose, mulling things over, the method should be called .pick(), which avoids any confusion with .get(). And, as I've stated, I think it should return a member of the set, with no guarantees what member of the set is returned. It could be the same one every time, or a random one, or the last one placed in the set. For cases where people want to cycle through the members of the set in a predictable order, they can either copy the contents into a list (sorted or unsorted) *or* subclass set and override the .pick() method to place stronger guarantees on the API. So, summarizing my responses: 1) the two succinct ways are unclear and not immediately obvious 2) the existing methods aren't needed for other objects 3) set implementations in other languages are irrelevant 4) this is a small, targeted change which not make the API disordered or unruly 5) could very well be needed for optimization, in cases where constructing an iterator is expensive 6) there have been several real-world examples posted which would be improved by this change -- Chris

Alexander Belopolsky

9:09 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser <chris@subtlety.com> wrote:

...

.. and "x = iter(s).next()" raises a StopIteration exception.

And that's why the documented recipe should probably recommend next(iter(s), default) instead. Especially because iter(s).next() is not even valid code in 3.0.

geremy condra

11:04 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Thu, Nov 5, 2009 at 4:09 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:

...

On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser <chris@subtlety.com> wrote:

...
.. and "x = iter(s).next()" raises a StopIteration exception.

And that's why the documented recipe should probably recommend next(iter(s), default) instead. Especially because iter(s).next() is not even valid code in 3.0.

This seems reasonably legible to you? Strikes me as coding by incantation. Also, while I've heard people say that the naive approach is slower, I'm not getting that result. Here's my test:

...

...
...
smrt = timeit.Timer("next(iter(s))", "s=set(range(100))") smrt.repeat(10) [1.2845709323883057, 0.60247397422790527, 0.59621405601501465, 0.59133195877075195, 0.58387589454650879, 0.56839084625244141, 0.56839680671691895, 0.56877803802490234, 0.56905913352966309, 0.56846404075622559]

...

...
...
naive = timeit.Timer("x=s.pop();s.add(x)", "s=set(range(100))") naive.repeat(10) [0.93139314651489258, 0.53566789627075195, 0.53674602508544922, 0.53608798980712891, 0.53634309768676758, 0.53557991981506348, 0.53578495979309082, 0.53666114807128906, 0.53576493263244629, 0.53491711616516113]

Perhaps my test is flawed in some way? Geremy Condra

James Y Knight

11:17 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Nov 5, 2009, at 6:04 PM, geremy condra wrote:

...

Perhaps my test is flawed in some way?

Yes: you're testing the speed of something that makes absolutely no sense to do in a tight loop, so *who the heck cares how fast any way of doing it is*! Is this thread over yet? James

Yuvgoog Greenle

11:52 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Fri, Nov 6, 2009 at 1:17 AM, James Y Knight <foom@fuhm.net> wrote:

...

Is this thread over yet?

Sorry, I just had to point out that pop/add has a side effect that would be apparent on a set that multiple threads access - it loses an item and then gets it back. Sounds like a sleeper race condition that's going to be rare but extremely hard to find if it does occur. Crooked as a gil. --yuv

Steven D'Aprano

6:01 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Fri, 6 Nov 2009 10:52:54 am Yuvgoog Greenle wrote:

...

On Fri, Nov 6, 2009 at 1:17 AM, James Y Knight <foom@fuhm.net> wrote:

...
Is this thread over yet?

Sorry, I just had to point out that pop/add has a side effect that would be apparent on a set that multiple threads access - it loses an item and then gets it back. Sounds like a sleeper race condition that's going to be rare but extremely hard to find if it does occur. Crooked as a gil.

Surely Raymond's suggestion also suffers from a similar race condition? for x in set: return x creates a set_iterator. If another thread modifies the original set after the set_iterator is created but before the return, you would get a mysterious and almost impossible to debug RuntimeError. -- Steven D'Aprano

geremy condra

12:30 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Thu, Nov 5, 2009 at 6:17 PM, James Y Knight <foom@fuhm.net> wrote:

...

On Nov 5, 2009, at 6:04 PM, geremy condra wrote:

...
Perhaps my test is flawed in some way?

Yes: you're testing the speed of something that makes absolutely no sense to do in a tight loop, so *who the heck cares how fast any way of doing it is*!

Is this thread over yet?

James

I'm testing the speed because the claim was made that the pop/add approach was inefficient. Here's the full quote:

...

The obvious way, for newcomers, of achieving the effect is:

x = s.pop() s.add(x)

... and that's simply horrible in terms of efficiency. So the "obvious" way of doing it in Python is wrong(TM), and the "correct" way of doing it is obscure and raises misleading exceptions.

Since I use this in a graph theory library that I am currently trying to scale to handle 300 million node simulations, and this is used in several relevant algorithms, I was concerned. Geremy Condra

Chris Bergstresser

3:12 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Thu, Nov 5, 2009 at 6:30 PM, geremy condra <debatem1@gmail.com> wrote:

...

I'm testing the speed because the claim was made that the pop/add approach was inefficient. Here's the full quote:

...
The obvious way, for newcomers, of achieving the effect is:

x = s.pop() s.add(x)

... and that's simply horrible in terms of efficiency. So the "obvious" way of doing it in Python is wrong(TM), and the "correct" way of doing it is obscure and raises misleading exceptions.

I was talking mainly from a theoretical standpoint, and because the library I'm working on is designed to work seamlessly over the network. In those cases, where the set the user is working with is actually a proxy object across the wire, the time to acquire the locks, remove the object, release the locks, reacquire the locks, add the object, then rerelease the locks is *significantly* more expensive than just noting the set hasn't changed and returning a cached object from it. -- Chris

John Arbash Meinel

4:21 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

geremy condra wrote:

...

On Thu, Nov 5, 2009 at 4:09 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:

...
On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser <chris@subtlety.com> wrote:

...
.. and "x = iter(s).next()" raises a StopIteration exception. And that's why the documented recipe should probably recommend next(iter(s), default) instead. Especially because iter(s).next() is not even valid code in 3.0.

This seems reasonably legible to you? Strikes me as coding by incantation. Also, while I've heard people say that the naive approach is slower, I'm not getting that result. Here's my test:

...
...
...
smrt = timeit.Timer("next(iter(s))", "s=set(range(100))") smrt.repeat(10) [1.2845709323883057, 0.60247397422790527, 0.59621405601501465, 0.59133195877075195, 0.58387589454650879, 0.56839084625244141, 0.56839680671691895, 0.56877803802490234, 0.56905913352966309, 0.56846404075622559]

...
...
...
naive = timeit.Timer("x=s.pop();s.add(x)", "s=set(range(100))") naive.repeat(10) [0.93139314651489258, 0.53566789627075195, 0.53674602508544922, 0.53608798980712891, 0.53634309768676758, 0.53557991981506348, 0.53578495979309082, 0.53666114807128906, 0.53576493263244629, 0.53491711616516113]

Perhaps my test is flawed in some way?

Geremy Condra

Well, it also uses a fairly trivial set. 'set(range(100))' is going to generally have 0 collisions and everything will hash to a unique bucket. As such, pop ing an item out of the hash is a single "val = table[int & mask]; table[int & mask] = _dummy", and then looking it up again requires 2 table lookups (one finds the _dummy, the next finds that the chain is broken and can rewrite the _dummy.) However, if a set is more full, or has more collisions, or ... then pop() and add() become relatively more expensive. Surprising to me, is that "next(iter(s))" was actually slower than .pop() + .add() for 100 node set in my testing: $ alias TIMEIT="python -m timeit -s 's = set(range(100)'" $ TIMEIT "x = next(iter(s))" 1000000 loops, best of 3: 0.263 usec per loop $ TIMEIT "x = s.pop(); s.add(x)" 1000000 loops, best of 3: 0.217 usec per loop though both are much slower than the fastest we've found: $ TIMEIT "for x in s: break" 10000000 loops, best of 3: 0.0943 usec per loop now, what about a set with *lots* of collisions. Create 100 keys that all hash to the same bucket: aliase TIMEIT="python -m timeit -s 's = set([x*1024*1024 for x in range(100)])'" $ TIMEIT "x = next(iter(s))" 1000000 loops, best of 3: 0.257 usec per loop $ TIMEIT "x = s.pop(); s.add(x)" 1000000 loops, best of 3: 0.218 usec per loop I tried a few different ways, and I got the same results, until I did: $ python -m timeit -s "s = set(range(100000, 1000100))" "next(iter(s))" 1000 loops, best of 3: 255 usec per loop ^^^^ Now something seems terribly wrong here. next(iter(s)) suddenly jumps up from being < 0.3 us, to being more than 200us. Or ~1000x slower. I realize the above has 900k keys, which is big. But 'next(iter(s))' should only touch 1, and, in fact, should always return the *first* entry. My best guess is just that the *first* entry in the internal set table is no longer close to offset 0, which means that 'next(iter(s))' has to evaluate a bunch of table slots before it finds a non-empty entry. Anyway, none of the proposals will really ever be faster than: for x in s: break It is a bit ugly of a construct, but you don't have an attribute lookup, etc. As long as you don't do: for x in s: pass Then it stays nice and fast. John =:->

geremy condra

5:04 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Thu, Nov 5, 2009 at 11:21 PM, John Arbash Meinel <john.arbash.meinel@gmail.com> wrote:

...

geremy condra wrote:

...
On Thu, Nov 5, 2009 at 4:09 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:

...
On Thu, Nov 5, 2009 at 3:43 PM, Chris Bergstresser <chris@subtlety.com> wrote:

...
.. and "x = iter(s).next()" raises a StopIteration exception. And that's why the documented recipe should probably recommend next(iter(s), default) instead. Especially because iter(s).next() is not even valid code in 3.0.

This seems reasonably legible to you? Strikes me as coding by incantation. Also, while I've heard people say that the naive approach is slower, I'm not getting that result. Here's my test:

...
...
...
smrt = timeit.Timer("next(iter(s))", "s=set(range(100))") smrt.repeat(10) [1.2845709323883057, 0.60247397422790527, 0.59621405601501465, 0.59133195877075195, 0.58387589454650879, 0.56839084625244141, 0.56839680671691895, 0.56877803802490234, 0.56905913352966309, 0.56846404075622559]

...
...
...
naive = timeit.Timer("x=s.pop();s.add(x)", "s=set(range(100))") naive.repeat(10) [0.93139314651489258, 0.53566789627075195, 0.53674602508544922, 0.53608798980712891, 0.53634309768676758, 0.53557991981506348, 0.53578495979309082, 0.53666114807128906, 0.53576493263244629, 0.53491711616516113]

Perhaps my test is flawed in some way?

Geremy Condra

Well, it also uses a fairly trivial set. 'set(range(100))' is going to generally have 0 collisions and everything will hash to a unique bucket. As such, pop ing an item out of the hash is a single "val = table[int & mask]; table[int & mask] = _dummy", and then looking it up again requires 2 table lookups (one finds the _dummy, the next finds that the chain is broken and can rewrite the _dummy.)

However, if a set is more full, or has more collisions, or ... then pop() and add() become relatively more expensive.

Surprising to me, is that "next(iter(s))" was actually slower than .pop() + .add() for 100 node set in my testing:

$ alias TIMEIT="python -m timeit -s 's = set(range(100)'" $ TIMEIT "x = next(iter(s))" 1000000 loops, best of 3: 0.263 usec per loop

$ TIMEIT "x = s.pop(); s.add(x)" 1000000 loops, best of 3: 0.217 usec per loop

though both are much slower than the fastest we've found: $ TIMEIT "for x in s: break" 10000000 loops, best of 3: 0.0943 usec per loop

now, what about a set with *lots* of collisions. Create 100 keys that all hash to the same bucket: aliase TIMEIT="python -m timeit -s 's = set([x*1024*1024 for x in range(100)])'" $ TIMEIT "x = next(iter(s))" 1000000 loops, best of 3: 0.257 usec per loop

$ TIMEIT "x = s.pop(); s.add(x)" 1000000 loops, best of 3: 0.218 usec per loop

I tried a few different ways, and I got the same results, until I did:

$ python -m timeit -s "s = set(range(100000, 1000100))" "next(iter(s))" 1000 loops, best of 3: 255 usec per loop

^^^^ Now something seems terribly wrong here. next(iter(s)) suddenly jumps up from being < 0.3 us, to being more than 200us. Or ~1000x slower.

I realize the above has 900k keys, which is big. But 'next(iter(s))' should only touch 1, and, in fact, should always return the *first* entry. My best guess is just that the *first* entry in the internal set table is no longer close to offset 0, which means that 'next(iter(s))' has to evaluate a bunch of table slots before it finds a non-empty entry.

Anyway, none of the proposals will really ever be faster than: for x in s: break

It is a bit ugly of a construct, but you don't have an attribute lookup, etc. As long as you don't do: for x in s: pass

Then it stays nice and fast.

John =:->

Thanks for the info. Doing a bit more digging it appears that taking the lookup out of the picture puts the pop/add more or less on par with the for x in s: break solution, with the former behaving more predictably and the latter averaging marginally faster times. Either way, the loss in readability isn't worth it to me to get the minor improvement in performance. Given that adding a set.pick method would incur half the lookup cost that pop/add does, I think its reasonable to say that even using the fastest method proposed to implement it won't differ all that greatly from the least efficient proposal, and that its therefore pointless to consider set.pick from an optimisation standpoint. Aesthetics, of course, are another thing altogether. Geremy Condra

"Martin v. Löwis"

9:21 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

...

I feel pretty strongly that it's a wart in the language, and a sufficiently strong one that it should be remedied. I'm happy to champion it, but haven't the faintest idea what that entails.

There are two ways a) write a library that provides what you want, publish it on PyPI, and report back in a few years of how many users your library has, what they use it for, and why it should become builtin b) write a PEP, wait a few years for the language moratorium to be lifted, provide an implementation, and put the PEP for pronouncement. Careful reading of the Moratorium PEP may allow shortening of the wait. In any case, it seems that this specific change will see some opposition. So you will need to convince the opposition, one way or the other.

...

The obvious way, for newcomers, of achieving the effect is:

x = s.pop() s.add(x)

... and that's simply horrible in terms of efficiency. So the "obvious" way of doing it in Python is wrong(TM), and the "correct" way of doing it is obscure and raises misleading exceptions.

If you chose to write a PEP, include a proof that this approach is indeed horrible in terms of efficiency; I question that. Regards, Martin

Chris Bergstresser

10:19 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

On Thu, Nov 5, 2009 at 3:21 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

...

There are two ways

a) write a library that provides what you want, publish it on PyPI, and report back in a few years of how many users your library has, what they use it for, and why it should become builtin

This clearly isn't called for in this case. We're talking about a single function on a collection. In this case, importing an alternative set API (and maintaining the dependency) is more work than just writing your own workaround. The purpose of adding a method is to prevent the need of everyone writing their own workaround.

...

b) write a PEP, wait a few years for the language moratorium to be lifted, provide an implementation, and put the PEP for pronouncement. Careful reading of the Moratorium PEP may allow shortening of the wait.

Clearly, I'll need to write up the PEP.

...

In any case, it seems that this specific change will see some opposition. So you will need to convince the opposition, one way or the other.

I doubt some of the people on either side are going to be convinced. I'd settle for convincing most of the fence-sitters, along with a few of the loyal opposition. -- Chris

Raymond Hettinger

10:30 p.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

[Chris Bergstresser]

...

Clearly, I'll need to write up the PEP.

Why not write a short, fast get_first() function for your utils directory and be done with it? That could work with sets, mappings, generators, and other containers and iterators. No need to fatten the set/frozenset API for something so trivial and so rarely needed. Raymond

Raymond Hettinger

11:02 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

[me]

...

Why not write a short, fast get_first() function for your utils directory and be done with it? That could work with sets, mappings, generators, and other containers and iterators. No need to fatten the set/frozenset API for something so trivial and so rarely needed.

Forgot to post the code. It is short, fast, and easy. It is explicit about handing the case with an empty input. And it is specific about which value it returns (always the first iterated value; not an arbitrary one). There's no guessing about what it does. It gets the job done. def first(iterable): 'Return the first value from a container or iterable. If empty, raises LookupError' for value in iterable: return value raise LookupError('no first value; iterable is empty') If desired, it is not hard to change to the last time to return a default value or some other exception. Raymond

Chris Bergstresser

2:59 a.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

On Thu, Nov 5, 2009 at 5:02 PM, Raymond Hettinger <python@rcn.com> wrote:

...

Forgot to post the code. It is short, fast, and easy. It is explicit about handing the case with an empty input. And it is specific about which value it returns (always the first iterated value; not an arbitrary one). There's no guessing about what it does. It gets the job done.

I'm trying to take this suggestion in the best possible light, which is that you honestly think I didn't read past Chapter 3 of the Python Tutorial, and I am therefore in fact unfamiliar with function definitions. -- Chris

"Martin v. Löwis"

5:43 a.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

...

On Thu, Nov 5, 2009 at 5:02 PM, Raymond Hettinger <python@rcn.com> wrote:

...
Forgot to post the code. It is short, fast, and easy. It is explicit about handing the case with an empty input. And it is specific about which value it returns (always the first iterated value; not an arbitrary one). There's no guessing about what it does. It gets the job done.

I'm trying to take this suggestion in the best possible light, which is that you honestly think I didn't read past Chapter 3 of the Python Tutorial, and I am therefore in fact unfamiliar with function definitions.

I read Raymond's suggestion rather as a question: why bother with a tedious, multi-year process, when a three-line function will achieve exactly the same? Regards, Martin

Chris Bergstresser

6:19 a.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

On Thu, Nov 5, 2009 at 11:43 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

...

I read Raymond's suggestion rather as a question: why bother with a tedious, multi-year process, when a three-line function will achieve exactly the same?

Because it doesn't achieve exactly the same. What I want is a sane, rational way to describe what I'm doing in the core API, so other programmers learning the language don't spend the amount of time I did perplexed that there was a .pop() and a .remove() and a .discard(), but there wasn't a .pick(). I don't want to have to write the same little helper function in every project to fill a deficiency in the library. I don't want to have to argue about the flaws in solutions with race conditions, or the fact that cheap functions become expensive functions when performed over the network, or that there's a real value in having an atomic operation which throws a sane exception when it fails, or how it's disturbing to my OCD core to have an API which believes: if x in s: s.remove(x) ... is too confusing, so there should be a .discard() method, but ... for x in s: break ... is self-evident and obvious, so there's no need for a .pick(). -- Chris

Georg Brandl

11:59 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

Chris Bergstresser schrieb:

...

On Thu, Nov 5, 2009 at 11:43 PM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

...
I read Raymond's suggestion rather as a question: why bother with a tedious, multi-year process, when a three-line function will achieve exactly the same?

Because it doesn't achieve exactly the same. What I want is a sane, rational way to describe what I'm doing in the core API, so other programmers learning the language don't spend the amount of time I did perplexed that there was a .pop() and a .remove() and a .discard(), but there wasn't a .pick(). I don't want to have to write the same little helper function in every project to fill a deficiency in the library.

Why don't you write that little helper function only in those projects that actually need it? I suspect that will be a small fraction...

...

I don't want to have to argue about the flaws in solutions with race conditions, or the fact that cheap functions become expensive functions when performed over the network, or that there's a real value in having an atomic operation which throws a sane exception when it fails, or how it's disturbing to my OCD core to have an API which believes:

if x in s: s.remove(x)

... is too confusing, so there should be a .discard() method, but ...

for x in s: break

... is self-evident and obvious, so there's no need for a .pick().

It's not just the self-evidence of the required "workaround" that determines whether an API function is added to a builtin type. It's whether you need it. Imagine someone asking for a function on lists that removes duplicates, but keeps the order of all first occurrences. This is not a completely unimaginable method, and writing it by hand is more than a few lines (see e.g. http://code.activestate.com/recipes/52560/#c1). Still, it's not needed frequently enough to require being a list method. Georg -- Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

Daniel Stutzbach

2:17 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

On Thu, Nov 5, 2009 at 5:02 PM, Raymond Hettinger <python@rcn.com> wrote:

...

def first(iterable): 'Return the first value from a container or iterable. If empty, raises LookupError' for value in iterable: return value raise LookupError('no first value; iterable is empty')

(This email should not be construed to mean that I support adding a .pick method; I do not.) I realized this morning that the "for i in s: break" idiom can be O(n), even assuming no hash collisions. Observe this subtly O(n**2) algorithm: Cashew:~$ cat > test.py while s: for i in s: break # Imagine a complex algorithm here that depends on i still being in s s.remove(i) Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(20000))' "`cat test.py`" 10 loops, best of 3: 31.2 msec per loop Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(40000))' "`cat test.py`" 10 loops, best of 3: 124 msec per loop Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(80000))' "`cat test.py`" 10 loops, best of 3: 491 msec per loop Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(160000))' "`cat test.py`" 10 loops, best of 3: 1.96 sec per loop Doubling n quadruples the run-time, i.e., O(n**2) behavior. I wonder if those clamoring for a pick() method and complaining about efficiency are inadvertently running into this? The problem arises because we're systematically unbalancing the hash table. The iterator returns the first valid entry in the hash table, which we remove. Repeat several times and pretty soon the iterator has to spend O(n) time scanning through empty entries to find the first remaining valid entry. On the other hand, the pop/add idiom is O(1), since the .pop implementation contains some voodoo to remember where it left off. Consequently, the following algorithm is O(n) instead of O(n**2): Cashew:~$ cat > test.py while s: i = s.pop() s.add(i) # Imagine a complex algorithm here that depends on i still being in s s.remove(i) Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(400000))' "`cat test.py`" 10 loops, best of 3: 28 msec per loop Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(800000))' "`cat test.py`" 10 loops, best of 3: 56.2 msec per loop Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(1600000))' "`cat test.py`" 10 loops, best of 3: 113 msec per loop Cashew:~$ python2.5 /usr/lib/python2.5/timeit.py -s 's=set(range(3200000))' "`cat test.py`" 10 loops, best of 3: 227 msec per loop Doubling n doubles the run-time, i.e., O(n) behavior. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

"Martin v. Löwis"

8:42 a.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

...

The problem arises because we're systematically unbalancing the hash table. The iterator returns the first valid entry in the hash table, which we remove. Repeat several times and pretty soon the iterator has to spend O(n) time scanning through empty entries to find the first remaining valid entry.

Interesting. Something goes wrong, it seems: if items get removed over and over again, I think the set should shrink (not sure whether it actually does). Then, I think you should end up with an amortized O(1) for selecting an element (assuming that the underlying hashes don't collide). Regards, Martin

Nick Coghlan

1:32 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

Martin v. Löwis wrote:

...

...
The problem arises because we're systematically unbalancing the hash table. The iterator returns the first valid entry in the hash table, which we remove. Repeat several times and pretty soon the iterator has to spend O(n) time scanning through empty entries to find the first remaining valid entry.

Interesting. Something goes wrong, it seems: if items get removed over and over again, I think the set should shrink (not sure whether it actually does). Then, I think you should end up with an amortized O(1) for selecting an element (assuming that the underlying hashes don't collide).

The behaviour is inherited (in spirit) from dicts. Courtesy of dictnotes.txt: """ * Shrinkage rate upon exceeding maximum sparseness. The current CPython code never even checks sparseness when deleting a key. When a new key is added, it resizes based on the number of active keys, so that the addition may trigger shrinkage rather than growth. """ """ Dictionary operations involving only a single key can be O(1) unless resizing is possible. By checking for a resize only when the dictionary can grow (and may *require* resizing), other operations remain O(1), and the odds of resize thrashing or memory fragmentation are reduced. In particular, an algorithm that empties a dictionary by repeatedly invoking .pop will see no resizing, which might not be necessary at all because the dictionary is eventually discarded entirely. """ So the rationale is to ensure that only add operations perform a resize and so that sequential pop() operations don't incur excessive resizing costs. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

"Martin v. Löwis"

9:50 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

...

So the rationale is to ensure that only add operations perform a resize and so that sequential pop() operations don't incur excessive resizing costs.

I agree that the use case of repeated .pop() operations is reasonable, and (IIUC) that case is also special-cased using a finger/index. I think for regular removal, the same logic should not apply: if a series of removals is performed, then further (non-pop) removals see increasing costs, as do regular lookups. So I think that a removal should trigger shrinking (with appropriate thresholds) unless it's a .pop. Regards, Martin

Daniel Stutzbach

10:44 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

On Mon, Nov 9, 2009 at 3:50 PM, "Martin v. Löwis" <martin@v.loewis.de>wrote:

...

I think for regular removal, the same logic should not apply: if a series of removals is performed, then further (non-pop) removals see increasing costs, as do regular lookups. So I think that a removal should trigger shrinking (with appropriate thresholds) unless it's a .pop.

Minor technicality, but it's the next() operation of the iterator that has increasing cost as the set/dict becomes sparse. Removals are always O(1). The removal uses the hash to find the item quickly. The iterator has to scan through the table for non-empty entries. (the above assumes a good hash function with few collisions, of course) -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

Daniel Stutzbach

3:09 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

On Mon, Nov 9, 2009 at 2:42 AM, "Martin v. Löwis" <martin@v.loewis.de>wrote:

...

Interesting. Something goes wrong, it seems: if items get removed over and over again, I think the set should shrink (not sure whether it actually does). Then, I think you should end up with an amortized O(1) for selecting an element (assuming that the underlying hashes don't collide).

I'm not sure if Python's implementation shrinks or not, but even if it did shrink, it would still be amortized O(n). Imagine a completely full hash table that currently contains n elements in n slots (I know this cannot happen in Python's implementation but it's useful for illustrative purposes). Assume it will shrink when it gets down to n/2 elements. Here is my pathological algorithm again, for reference purposes: while s: for i in s: break # Imagine a complex algorithm here that depends on i still being in s s.remove(i) We repeatedly search through the slots sequentially and remove the first element we find. The first removal requires checking 1 slot, the second removal requires checking 2 slots, the third removal requires checking 3 slots, and so forth. The hash table will shrink after the n/2-th removal, when we have checked 1 + 2 + 3 + ... + n/2 = O(n**2) slots for n/2 removals (or amortized O(n) per removal). It's too late for shrinking to save us; we've already performed too much work. -- Daniel Stutzbach, Ph.D. President, Stutzbach Enterprises, LLC <http://stutzbachenterprises.com>

Alexander Belopolsky

3:36 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

On Mon, Nov 9, 2009 at 10:09 AM, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:

...

On Mon, Nov 9, 2009 at 2:42 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

...
Interesting. Something goes wrong, it seems: if items get removed over and over again, I think the set should shrink (not sure whether it actually does). Then, I think you should end up with an amortized O(1) for selecting an element (assuming that the underlying hashes don't collide).

I'm not sure if Python's implementation shrinks or not,

It does not:

...

...
...
s = set(range(100000)) from sys import getsizeof getsizeof(s) 4194536 while s: x = s.pop() ... getsizeof(s) 4194536 s.clear() getsizeof(s) 232

Nick Coghlan

9:44 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

Alexander Belopolsky wrote:

...

On Mon, Nov 9, 2009 at 10:09 AM, Daniel Stutzbach <daniel@stutzbachenterprises.com> wrote:

...
On Mon, Nov 9, 2009 at 2:42 AM, "Martin v. Löwis" <martin@v.loewis.de> wrote:

...
Interesting. Something goes wrong, it seems: if items get removed over and over again, I think the set should shrink (not sure whether it actually does). Then, I think you should end up with an amortized O(1) for selecting an element (assuming that the underlying hashes don't collide). I'm not sure if Python's implementation shrinks or not,

It does not:

...
...
...
s = set(range(100000)) from sys import getsizeof getsizeof(s) 4194536 while s: x = s.pop() ... getsizeof(s) 4194536 s.clear() getsizeof(s) 232

Interestingly, it looks like there the sparseness check isn't triggering on addition of items either (when dictnotes.txt says it should):

...

...
...
from sys import getsizeof s = set(range(100000)) getsizeof(s) 2097264 while s: x = s.pop() ... getsizeof(s) 2097264 s.add(1) getsizeof(s) 2097264

I did a similar test with dict.fromkeys and that also didn't resize until .clear() was invoked. I don't know the current criteria settings for the sparseness check, but it seems odd that going from 100k active keys to none wouldn't cause a resize... Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

"Martin v. Löwis"

10:39 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

...

We repeatedly search through the slots sequentially and remove the first element we find. The first removal requires checking 1 slot, the second removal requires checking 2 slots, the third removal requires checking 3 slots, and so forth. The hash table will shrink after the n/2-th removal, when we have checked 1 + 2 + 3 + ... + n/2 = O(n**2) slots for n/2 removals (or amortized O(n) per removal). It's too late for shrinking to save us; we've already performed too much work.

Ah, right. Perhaps people writing the loop the way you proposed deserve to get bad performance, as they should use .pop in the first place. Regards, Martin

Björn Lindqvist

4:40 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

2009/11/6 Raymond Hettinger <python@rcn.com>:

...

[me]

...
Why not write a short, fast get_first() function for your utils directory and be done with it? That could work with sets, mappings, generators, and other containers and iterators. No need to fatten the set/frozenset API for something so trivial and so rarely needed.

Forgot to post the code. It is short, fast, and easy. It is explicit about handing the case with an empty input. And it is specific about which value it returns (always the first iterated value; not an arbitrary one). There's no guessing about what it does. It gets the job done.

def first(iterable): 'Return the first value from a container or iterable. If empty, raises LookupError' for value in iterable: return value raise LookupError('no first value; iterable is empty')

If desired, it is not hard to change to the last time to return a default value or some other exception.

That function is very nice and genericly lisp-like. Couldn't that one be in the builtins? It would both solve the problem and avoid fattening the set API. -- mvh Björn

Steven D'Aprano

8:16 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

On Tue, 10 Nov 2009 03:40:11 am Björn Lindqvist wrote:

...

2009/11/6 Raymond Hettinger <python@rcn.com>:

...
[me]

...
Why not write a short, fast get_first() function for your utils directory and be done with it? That could work with sets, mappings, generators, and other containers and iterators. No need to fatten the set/frozenset API for something so trivial and so rarely needed.

Forgot to post the code. It is short, fast, and easy. It is explicit about handing the case with an empty input. And it is specific about which value it returns (always the first iterated value; not an arbitrary one). There's no guessing about what it does. It gets the job done.

def first(iterable): 'Return the first value from a container or iterable. If empty, raises LookupError' for value in iterable: return value raise LookupError('no first value; iterable is empty')

If desired, it is not hard to change to the last time to return a default value or some other exception.

That function is very nice and genericly lisp-like. Couldn't that one be in the builtins? It would both solve the problem and avoid fattening the set API.

I'm not sure, but isn't that thread-unsafe? Because sets aren't directly iterable, `for value in iterable` builds a set_iterator operator. If another thread modifies the set after the set_iterator is built, but before the value is looked up, you will get a mysterious RuntimeError almost impossible to debug.

...

...
...
def first(): # simplified ... for value in iterator: ... return value ... dis.dis(first) 2 0 SETUP_LOOP 15 (to 18) 3 LOAD_GLOBAL 0 (iterator) 6 GET_ITER 7 FOR_ITER 7 (to 17) 10 STORE_FAST 0 (value)

3 13 LOAD_FAST 0 (value) 16 RETURN_VALUE >> 17 POP_BLOCK >> 18 LOAD_CONST 0 (None) 21 RETURN_VALUE Is it possible for another thread to be called between the GET_ITER and STORE_FAST? -- Steven D'Aprano

"Martin v. Löwis"

10:19 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

...

I'm not sure, but isn't that thread-unsafe?

You are right; it's thread-unsafe. I would fix it by catching the RuntimeError, and retrying. Given the current GIL strategy (including proposed changes to it), it won't happen two times in a row, so the number of retries would be bounded. Regards, Martin

Nick Coghlan

10:51 a.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

Martin v. Löwis wrote:

...

...
I'm not sure, but isn't that thread-unsafe?

You are right; it's thread-unsafe.

I would fix it by catching the RuntimeError, and retrying. Given the current GIL strategy (including proposed changes to it), it won't happen two times in a row, so the number of retries would be bounded.

It's also one of the major reasons for not sharing mutable containers between threads if you can avoid it (and serialising access to them if you can't) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Antoine Pitrou

1:55 p.m.

New subject: [Python-Dev] Retrieve an arbitrary element from asetwithoutremoving it

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...

It's also one of the major reasons for not sharing mutable containers between threads if you can avoid it (and serialising access to them if you can't)

Not necessarily, for example it is common to rely on the fact that list.append() is atomic. Regards Antoine.

Nick Coghlan

2:27 p.m.

New subject: Retrieve an arbitrary element from asetwithoutremoving it

Antoine Pitrou wrote:

...

Nick Coghlan <ncoghlan <at> gmail.com> writes:

...
It's also one of the major reasons for not sharing mutable containers between threads if you can avoid it (and serialising access to them if you can't)

Not necessarily, for example it is common to rely on the fact that list.append() is atomic.

s/"mutable containers"/"mutable-containers-that-object-loudly-to-their-size-changing-during-iteration -like-sets-and-dicts" :) Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Cameron Simpson

5:26 a.m.

New subject: Retrieve an arbitrary element from a set withoutremoving it

On 04Nov2009 09:46, Steven D'Aprano <steve@pearwood.info> wrote: | On Tue, 3 Nov 2009 09:19:38 am Greg Ewing wrote: | > Cameron Simpson wrote: | > > Personally, I'm for the iteration spec in a lot of ways. For the record, I've since, in mere hours or day, been convinced my preference was misguided. I _do_ still feel that a hypothetical .pick() (a name I think better than .get()) would be "nicer" if it didn't always return the same item. Not that I've imagined any use case not equally well served by the existing iterator facility :-( Greg Ewing: | > I completely fail to see a use case for this. Steven D'Aprano: | Nevertheless, people keep requesting it, so obviously they have a use | for it. None of the standard solutions are obvious or easily | discoverable, and in my experience the usual solution people come up | with is to pop() an element, then add() it back in, but of course | that's not just inelegant but it fails on frozensets. Question: has anyone asked for .pick()/.get() with a use case _not_ well served by the existing facilities? Personally, I have found it useful in doco I write to have a section on "Common Tasks", with recommended/suggested examples of how to do them and short rationale for the chosen method. It seems to me that if .pick() is frequently desired and "None of the standard solutions are obvious or easily discoverable" then they should be _made_ so with documentation. Comments? -- Cameron Simpson <cs@zip.com.au> DoD#743 http://www.cskk.ezoshosting.com/cs/ Politics, n. Strife of interests masquerading as a contest of principles. - Ambrose Bierce, _The_Devil's_Dictionary_

Raymond Hettinger

6:35 a.m.

New subject: Retrieve an arbitrary element from a setwithoutremoving it

...

Personally, I have found it useful in doco I write to have a section on "Common Tasks", with recommended/suggested examples of how to do them and short rationale for the chosen method. It seems to me that if .pick() is frequently desired and "None of the standard solutions are obvious or easily discoverable" then they should be _made_ so with documentation.

Comments?

That's reasonable. It's in the same category as people who can't figure-out how to clear a list because they forgot about slice notation. Raymond

Alexander Belopolsky

October 2009

5:50 p.m.

On Tue, Oct 27, 2009 at 1:33 PM, Chris Bergstresser <chris@subtlety.com> wrote:

...

On Tue, Oct 27, 2009 at 11:06 AM, Georg Brandl <g.brandl@gmx.net> wrote:

...
Sorry to nitpick, but there is no list.get().

No? How ... odd.

Odd indeed. My first reaction was: it is not needed because lists support slicing, but when I tried to construct a list.get() using slicing the best I could come up with was the following hack

...

...
...
def lget(l, i, v): return (l[i:] or [v])[0] ... lget(range(10), 20, 200) 200 lget(range(10), 5, 50) 5

Yet for some reason I never missed this functionality ...

Nick Coghlan

6:42 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

Alexander Belopolsky wrote:

...

Odd indeed. My first reaction was: it is not needed because lists support slicing, but when I tried to construct a list.get() using slicing the best I could come up with was the following hack

...
...
...
def lget(l, i, v): return (l[i:] or [v])[0] ... lget(range(10), 20, 200) 200 lget(range(10), 5, 50) 5

Yet for some reason I never missed this functionality ...

People tend to do length checks on lists to decide which items are and are not present. You can't do that with a dict. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

average

4:22 p.m.

[Guido wrote:]

...

- If sets were to grow an API to non-destructively access the object stored in the set for a particular key, then dicts should have such a method too. - I still wish we could go back in time and unify sets and dicts, if only to find out how that experiment would turn out.

+5. If Python3 were to have this feature it would make it worth migrating to (and nearly worthy of the intent that was behind the idea of python3k). FWIW ... :) marcos

Willi Richert

8:53 p.m.

Hi, surprised about the performance of for/break provided by Vitor, I did some more testing. It revealed that indeed we can forget the get() (which was implemented as a stripped down pop()): from timeit import * stats = ["for i in xrange(1000): iter(s).next() ", "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak ", "for i in xrange(1000): s.add(s.pop()) ", "for i in xrange(1000): s.get() "] for stat in stats: t = Timer(stat, setup="s=set(range(100))") try: print "Time for %s:\t %f"%(stat, t.timeit(number=1000)) except: t.print_exc() $ ./test_get.py Time for for i in xrange(1000): iter(s).next() : 0.433080 Time for for i in xrange(1000): for x in s: break : 0.148695 Time for for i in xrange(1000): s.add(s.pop()) : 0.317418 Time for for i in xrange(1000): s.get() : 0.146673 In some tests, for/break was even slightly faster then get(). As always, intuition regarding performance bottlenecks is flawed ;-) Anyway, thanks for all the helpful comments, especially to Stefan for the http://comments.gmane.org/gmane.comp.python.ideas/5606 link. Regards, wr Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:

...

Vitor Bosshard wrote:

...
2009/10/23 Willi Richert <w.richert@gmx.net>:

...
Hi,

recently I wrote an algorithm, in which very often I had to get an arbitrary element from a set without removing it.

Three possibilities came to mind:

1. x = some_set.pop() some_set.add(x)

2. for x in some_set: break

3. x = iter(some_set).next()

Of course, the third should be the fastest. It nevertheless goes through all the iterator creation stuff, which costs some time. I wondered, why the builtin set does not provide a more direct and efficient way for retrieving some element without removing it. Is there any reason for this?

I imagine something like

x = some_set.get()

I see this as being useful for frozensets as well, where you can't get an arbitrary element easily due to the obvious lack of .pop(). I ran into this recently, when I had a frozenset that I knew had 1 element (it was the difference between 2 other sets), but couldn't get to that element easily (get the pun?)

So in my testing (2) was actually the fastest. I assumed because .next() was a function call overhead, while: for x in some_set: break

Was evaluated inline. It probably still has to call PyObject_GetIter, however it doesn't have to create a stack frame for it.

This is what "timeit" tells me. All runs are of the form: python -m timeit -s "s = set([10])" ...

0.101us "for x in s: break; x" 0.130us "for x in s: pass; x" 0.234us -s "n = next; i = iter" "x = n(i(s)); x" 0.248us "x = next(iter(s)); x" 0.341us "x = iter(s).next(); x"

So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()). I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement?

Note that all of these are significantly < 1us. So this only matters if it is something you are doing often.

I don't know your specific timings, but I would guess that: for x in s: break

Is actually going to be faster than your s.get()

Primarily because s.get() requires an attribute lookup. I would think your version might be faster for: stat2 = "g = s.get; for i in xrange(100): g()"

However, that is still a function call, which may be treated differently by the interpreter than the for:break loop. I certainly suggest you try it and compare.

John =:->

Steven D'Aprano

10:49 p.m.

On Sat, 24 Oct 2009 07:53:24 am Willi Richert wrote:

...

Hi,

surprised about the performance of for/break provided by Vitor, I did some more testing. It revealed that indeed we can forget the get() (which was implemented as a stripped down pop()):

I don't understand that conclusion. According to your tests, your implementation of get() is as fast as "for x in set: break", and it's certainly much, much more readable and straightforward. -- Steven D'Aprano

Willi Richert

2:29 p.m.

Hi, I agree. But, here are the pros/cons collected from the recent list repsonses: Pro: - more readable - newbies will encounter one of the fastest solution (.get()) before trying slower "first solutions" like (iter(set).next()) Cons: - no name consensus. get() getany() arbitrary() ? - BDFL moratorium, which I find very wise (get() is, however, no language extension, but std lib extension, which Guido did not moratorize, didn't he?) - other classes should then also be extended, like frozenset Regards, wr PS: what is the correct verb form of moratorium? Am Samstag, 24. Oktober 2009 00:49:38 schrieb Steven D'Aprano:

...

On Sat, 24 Oct 2009 07:53:24 am Willi Richert wrote:

...
Hi,

surprised about the performance of for/break provided by Vitor, I did some more testing. It revealed that indeed we can forget the get() (which was implemented as a stripped down pop()):

I don't understand that conclusion. According to your tests, your implementation of get() is as fast as "for x in set: break", and it's certainly much, much more readable and straightforward.

Willi Richert

3:41 p.m.

Hi, someone on this list mentioned that much of the s.get() time is spend on the name lookup for get(). That is indeed the case: =================== from timeit import * stats = ["for i in xrange(1000): iter(s).next() ", "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak", "for i in xrange(1000): s.add(s.pop()) ", "for i in xrange(1000): s.get() ", "g=s.get;\nfor i in xrange(1000): g() "] for stat in stats: t = Timer(stat, setup="s=set(range(1000))") print "Time for %s:\t %f"%(stat, t.timeit(number=1000)) ================== Time for for i in xrange(1000): iter(s).next() : 0.448227 Time for for i in xrange(1000): for x in s: break: 0.141669 Time for for i in xrange(1000): s.add(s.pop()) : 0.348055 Time for for i in xrange(1000): s.get() : 0.148580 Time for g=s.get; for i in xrange(1000): g() : 0.080563 So, now set.get() is indeed the fastest and preferable solution if you need massive amounts of retrieving elements from a set without removing them. wr Am Freitag, 23. Oktober 2009 22:53:24 schrieb Willi Richert:

...

Hi,

surprised about the performance of for/break provided by Vitor, I did some more testing. It revealed that indeed we can forget the get() (which was implemented as a stripped down pop()):

from timeit import * stats = ["for i in xrange(1000): iter(s).next() ", "for i in xrange(1000): \n\tfor x in s: \n\t\tbreak ", "for i in xrange(1000): s.add(s.pop()) ", "for i in xrange(1000): s.get() "]

for stat in stats: t = Timer(stat, setup="s=set(range(100))") try: print "Time for %s:\t %f"%(stat, t.timeit(number=1000)) except: t.print_exc()

$ ./test_get.py Time for for i in xrange(1000): iter(s).next() : 0.433080 Time for for i in xrange(1000): for x in s: break : 0.148695 Time for for i in xrange(1000): s.add(s.pop()) : 0.317418 Time for for i in xrange(1000): s.get() : 0.146673

In some tests, for/break was even slightly faster then get().

As always, intuition regarding performance bottlenecks is flawed ;-)

Anyway, thanks for all the helpful comments, especially to Stefan for the http://comments.gmane.org/gmane.comp.python.ideas/5606 link.

Regards, wr

Am Freitag, 23. Oktober 2009 19:25:48 schrieb John Arbash Meinel:

...
Vitor Bosshard wrote:

...
2009/10/23 Willi Richert <w.richert@gmx.net>:

...
Hi,

recently I wrote an algorithm, in which very often I had to get an arbitrary element from a set without removing it.

Three possibilities came to mind:

1. x = some_set.pop() some_set.add(x)

2. for x in some_set: break

3. x = iter(some_set).next()

Of course, the third should be the fastest. It nevertheless goes through all the iterator creation stuff, which costs some time. I wondered, why the builtin set does not provide a more direct and efficient way for retrieving some element without removing it. Is there any reason for this?

I imagine something like

x = some_set.get()

I see this as being useful for frozensets as well, where you can't get an arbitrary element easily due to the obvious lack of .pop(). I ran into this recently, when I had a frozenset that I knew had 1 element (it was the difference between 2 other sets), but couldn't get to that element easily (get the pun?)

So in my testing (2) was actually the fastest. I assumed because .next() was a function call overhead, while: for x in some_set: break

Was evaluated inline. It probably still has to call PyObject_GetIter, however it doesn't have to create a stack frame for it.

This is what "timeit" tells me. All runs are of the form: python -m timeit -s "s = set([10])" ...

0.101us "for x in s: break; x" 0.130us "for x in s: pass; x" 0.234us -s "n = next; i = iter" "x = n(i(s)); x" 0.248us "x = next(iter(s)); x" 0.341us "x = iter(s).next(); x"

So 'for x in s: break' is about 2x faster than next(iter(s)) and 3x faster than (iter(s).next()). I was pretty surprised that it was 30% faster than "for x in s: pass". I assume it has something to do with a potential "else:" statement?

Note that all of these are significantly < 1us. So this only matters if it is something you are doing often.

I don't know your specific timings, but I would guess that: for x in s: break

Is actually going to be faster than your s.get()

Primarily because s.get() requires an attribute lookup. I would think your version might be faster for: stat2 = "g = s.get; for i in xrange(100): g()"

However, that is still a function call, which may be treated differently by the interpreter than the for:break loop. I certainly suggest you try it and compare.

John =:->

Adam Olsen

8:09 p.m.

On Fri, Oct 23, 2009 at 11:04, Vitor Bosshard <algorias@gmail.com> wrote:

...

I see this as being useful for frozensets as well, where you can't get an arbitrary element easily due to the obvious lack of .pop(). I ran into this recently, when I had a frozenset that I knew had 1 element (it was the difference between 2 other sets), but couldn't get to that element easily (get the pun?)

item, = set_of_one -- Adam Olsen, aka Rhamphoryncus

John Arbash Meinel

1:47 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

Adam Olsen wrote:

...

On Fri, Oct 23, 2009 at 11:04, Vitor Bosshard <algorias@gmail.com> wrote:

...
I see this as being useful for frozensets as well, where you can't get an arbitrary element easily due to the obvious lack of .pop(). I ran into this recently, when I had a frozenset that I knew had 1 element (it was the difference between 2 other sets), but couldn't get to that element easily (get the pun?)

item, = set_of_one

Interesting. It depends a bit on the speed of tuple unpacking, but presumably that is quite fast. On my system it is pretty darn good: 0.101us "for x in s: break" 0.112us "x, = s" 0.122us "for x in s: pass" So not quite as good as the for loop, but quite close. John =:->

Alex Martelli

4:22 a.m.

Next(s) would seem good... Alex Sent from my iPhone On Oct 24, 2009, at 6:47 PM, John Arbash Meinel <john.meinel@canonical.com

...

wrote:

...

Adam Olsen wrote:

...
On Fri, Oct 23, 2009 at 11:04, Vitor Bosshard <algorias@gmail.com> wrote:

...
I see this as being useful for frozensets as well, where you can't get an arbitrary element easily due to the obvious lack of .pop(). I ran into this recently, when I had a frozenset that I knew had 1 element (it was the difference between 2 other sets), but couldn't get to that element easily (get the pun?)

item, = set_of_one

Interesting. It depends a bit on the speed of tuple unpacking, but presumably that is quite fast. On my system it is pretty darn good:

0.101us "for x in s: break" 0.112us "x, = s" 0.122us "for x in s: pass"

So not quite as good as the for loop, but quite close.

John =:-> _______________________________________________ Python-Dev mailing list Python-Dev@python.org http://mail.python.org/mailman/listinfo/python-dev Unsubscribe: http://mail.python.org/mailman/options/python-dev/aleaxit%40gmail.com

Terry Reedy

6:50 a.m.

Alex Martelli wrote:

...

Next(s) would seem good...

That does not work. It has to be next(iter(s)), and that has been tried and eliminated because it is significantly slower.

...

...
Interesting. It depends a bit on the speed of tuple unpacking, but presumably that is quite fast. On my system it is pretty darn good:

0.101us "for x in s: break" 0.112us "x, = s"

James Y Knight

5:09 p.m.

On Oct 25, 2009, at 2:50 AM, Terry Reedy wrote:

...

Alex Martelli wrote:

...
Next(s) would seem good...

That does not work. It has to be next(iter(s)), and that has been tried and eliminated because it is significantly slower.

But who cares about the speed of getting an arbitrary element from a set? How can it *possibly* be a problem in a real program? If you want to optimize python, this operation is certainly not the right place to start... James

Florian Weimer

5:57 p.m.

* James Y. Knight:

...

On Oct 25, 2009, at 2:50 AM, Terry Reedy wrote:

...
Alex Martelli wrote:

...
Next(s) would seem good...

That does not work. It has to be next(iter(s)), and that has been tried and eliminated because it is significantly slower.

But who cares about the speed of getting an arbitrary element from a set? How can it *possibly* be a problem in a real program?

Hmm, perhaps when using sets as work queues?

"Martin v. Löwis"

6:51 p.m.

New subject: Retrieve an arbitrary element from a set without removing it

...

Hmm, perhaps when using sets as work queues?

A number of comments: - it's somewhat confusing to use a set as a *queue*, given that it won't provide FIFO semantics. - there are more appropriate and direct container structures available, including a dedicated Queue module (even though this might be doing to much with its thread-safety). - if you absolutely want to use a set as a work queue, then the .pop() method should be sufficient, right? Regards, Martin

John Arbash Meinel

2:20 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

-----BEGIN PGP SIGNED MESSAGE----- Hash: SHA1 Martin v. Löwis wrote:

...

...
Hmm, perhaps when using sets as work queues?

A number of comments:

- it's somewhat confusing to use a set as a *queue*, given that it won't provide FIFO semantics. - there are more appropriate and direct container structures available, including a dedicated Queue module (even though this might be doing to much with its thread-safety). - if you absolutely want to use a set as a work queue, then the .pop() method should be sufficient, right?

Regards, Martin

We were using sets to track the tips of a graph, and to compare whether one node was an ancestor of another one. We were caching that answer into frozensets, since that made them immutable. If res = heads(node1, node2) if len(res) == 1: # What is the 'obvious' way to get the node out? I posit that there *isn't* an obvious way to get the single item out of a 1-entry frozenset. for x in res: break list(res)[0] set(res).pop() iter(res).next() [x for x in res][0] x, = res # I didn't think of this one before recently Are all answers, but none of them I would consider *obvious*. At the least, none of them are obviously better than another, so you look into the performance characteristics to give you a reason to pick one over the other. res.get() would be a fairly obvious way to do it. Enough that I would probably never have gone searching for any of the other answers. Though personally, I think I would call it "set.peek()", but the specific name doesn't really matter to me. John =:-> -----BEGIN PGP SIGNATURE----- Version: GnuPG v1.4.9 (Cygwin) Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/ iEYEARECAAYFAkrlB4MACgkQJdeBCYSNAAN0lQCgrtyXWlqIbjj01qB4AKPhKrMq QH8An0z2gCWZHoceEJsqRJOUdEl/VLTB =fJXI -----END PGP SIGNATURE-----

geremy condra

2:46 a.m.

...

Martin v. Löwis wrote:

...
...
Hmm, perhaps when using sets as work queues?

A number of comments:

- it's somewhat confusing to use a set as a *queue*, given that it won't provide FIFO semantics. - there are more appropriate and direct container structures available, including a dedicated Queue module (even though this might be doing to much with its thread-safety). - if you absolutely want to use a set as a work queue, then the .pop() method should be sufficient, right?

Regards, Martin

We were using sets to track the tips of a graph, and to compare whether one node was an ancestor of another one. We were caching that answer into frozensets, since that made them immutable. If

res = heads(node1, node2) if len(res) == 1: # What is the 'obvious' way to get the node out?

I posit that there *isn't* an obvious way to get the single item out of a 1-entry frozenset.

for x in res: break list(res)[0] set(res).pop() iter(res).next() [x for x in res][0] x, = res # I didn't think of this one before recently

Are all answers, but none of them I would consider *obvious*. At the least, none of them are obviously better than another, so you look into the performance characteristics to give you a reason to pick one over the other.

res.get() would be a fairly obvious way to do it. Enough that I would probably never have gone searching for any of the other answers. Though personally, I think I would call it "set.peek()", but the specific name doesn't really matter to me.

John

When I first wrote Graphine (graph library), I did something very similar to the last solution. The code has since been rewritten to avoid the issue, but it would be nice if it didn't have to be the next time it comes up- the comma is easy to miss, and while the results are common-sense once you get what the line is doing, it doesn't lend itself to immediate comprehension. Geremy Condra

"Martin v. Löwis"

5:53 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

...

We were using sets to track the tips of a graph, and to compare whether one node was an ancestor of another one. We were caching that answer into frozensets, since that made them immutable. If

res = heads(node1, node2) if len(res) == 1: # What is the 'obvious' way to get the node out?

What specifically is that that you want to do in this case? IIUC, the possible result could be that either node1 is an ancestor of node2, or vice versa. So I would write that as if len(res) == 1: if node1 in res: # action to take if node1 is the head else: # action to take if node2 is the head else: # they are unrelated In any case, this is a use case different from "work queue" (AFAICT), so I'm unsure why you responded to my message where Florian and me were talking about work queues.

...

res.get() would be a fairly obvious way to do it. Enough that I would probably never have gone searching for any of the other answers. Though personally, I think I would call it "set.peek()", but the specific name doesn't really matter to me.

Somebody proposed to call it .any(); this I like best (except that one might expect that any takes a predicate as the argument). Regards, Martin

Stephen J. Turnbull

7:48 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

"Martin v. Löwis" writes:

...

...
res.get() would be a fairly obvious way to do it. Enough that I would probably never have gone searching for any of the other answers. Though personally, I think I would call it "set.peek()", but the specific name doesn't really matter to me.

Somebody proposed to call it .any(); this I like best (except that one might expect that any takes a predicate as the argument).

Why not allow that? def any(self, predicate=lambda x: True, default=None) for a in self: if predicate(a): break else: return default return a

"Martin v. Löwis"

8:02 a.m.

New subject: Retrieve an arbitrary element from a set without removing it

...

Why not allow that?

def any(self, predicate=lambda x: True, default=None) for a in self: if predicate(a): break else: return default return a

I'm +0 (given that I'm still skeptical about the need to have something like this). Also setting aside the moratorium here, which may disallow that kind of change for a foreseeable feature in any form. Regards, Martin

Scott David Daniels

6:14 a.m.

John Arbash Meinel wrote:

...

res = heads(node1, node2) if len(res) == 1: # What is the 'obvious' way to get the node out?

I posit that there *isn't* an obvious way to get the single item out of a 1-entry frozenset.

for x in res: break list(res)[0] set(res).pop() iter(res).next() [x for x in res][0] x, = res # I didn't think of this one before recently

Are all answers, but none of them I would consider *obvious*. And from my SQL-hacking experience:

x = min(s) --Scott David Daniels Scott.Daniels@Acm.Org

Stefan Behnel

7:01 p.m.

Willi Richert wrote:

...

recently I wrote an algorithm, in which very often I had to get an arbitrary element from a set without removing it.

See this discussion: http://comments.gmane.org/gmane.comp.python.ideas/5606 Stefan

5583

Age (days ago)

5601

Last active (days ago)

List overview

Download

151 comments

39 participants

participants (39)

"Martin v. Löwis"
A.M. Kuchling
Adam Olsen
Alex Martelli
Alexander Belopolsky
Antoine Pitrou
average
Ben Finney
Benjamin Peterson
Björn Lindqvist
Cameron Simpson
Chris Bergstresser
Daniel Stutzbach
Eric Smith
Florian Weimer
Georg Brandl
geremy condra
Greg Ewing
Guido van Rossum
James Y Knight
Jesse Noller
John Arbash Meinel
John Arbash Meinel
John Arbash Meinel
Mark Dickinson
Martin (gzlist)
Nick Coghlan
Oleg Broytman
Paul Moore
Raymond Hettinger
Scott David Daniels
ssteinerX＠gmail.com
Stefan Behnel
Stephen J. Turnbull
Steven D'Aprano
Terry Reedy
Vitor Bosshard
Willi Richert
Yuvgoog Greenle