Python-ideas mailing list

get method for sets?

Mike Meyer

May 16, 2012

6:32 a.m.

Is there some reason that there isn't a straightforward way to get an element from a set without removing it? Everything I find either requires multiple statements or converting the set to another data type.

It seems that some kind of get method would be useful. The argument that "getting an arbitrary element from a set isn't useful" is refuted by 1) the existence of the pop method, which does just that, and 2) the fact that I (and a number of other people) have run into such a need.

My search for such a reason kept finding people asking how to get an element instead. Of course, my key words (set and get) are heavily overloaded.

thanks,
<mike
-- 
Mike Meyer <mwm@mired.org> http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Steven D'Aprano

May 2012

6:58 a.m.

On Wed, May 16, 2012 at 02:32:15AM -0400, Mike Meyer wrote:

...

pop returns an arbitrary element, and removes it. That's a very different operation to "get this element from the set".

The problem is, if there was a set.get(x) method, you have to pass x as argument, and it returns, what? x. So what's the point? You already have the return value before you call the function.

    def get(s, x):
        """Return element x from set s."""
        if x in s:
            return x
        raise KeyError('not found')

As I see it, this is only remotely useful if:

- you care about identity, e.g. caching/interning
- you care about types, e.g. get(s, 42) may return 42.0 as the element of the set instead.

In either case, a dict is the more obvious data structure to use.

    def intern(d, x):
        """Intern element x in dict d, and return the interned version."""
        return d.setdefault(x, x)

But with sets? Seems pretty pointless to me. I can't help but feel that set.get() is a poorly thought out operation, much requested but rarely useful.

...

What's your use-case? -- Steven
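A minimal sketch of the dict-based interning described in the message above, showing the "types" case: because 42 and 42.0 compare equal and hash equal, the second lookup hands back the float that was stored first. The `cache` name and the sample values are illustrative assumptions, not part of the original post.

```python
def intern(d, x):
    """Intern element x in dict d, and return the interned version."""
    # setdefault stores x under itself the first time and afterwards
    # returns whichever equal object was stored first.
    return d.setdefault(x, x)


cache = {}                      # hypothetical interning table
first = intern(cache, 42.0)     # stores the float and returns it
second = intern(cache, 42)      # 42 == 42.0, so the stored float comes back

print(type(first).__name__, type(second).__name__)   # float float
```

The same lookup cannot be written against a plain set without scanning it, which is the reason the message recommends a dict for this job.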

Dirkjan Ochtman

7:01 a.m.

On Wed, May 16, 2012 at 8:58 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

I understood Mike's message to be proposing an argument-less .get() method. Cheers, Dirkjan

Mike Meyer

7:10 a.m.

On Wed, 16 May 2012 16:58:43 +1000 Steven D'Aprano <steve@pearwood.info> wrote:

...

I guess I should have been explicit about what I was asking about. I'm not asking for set.get(x) that returns "this element", I'm asking for set.get() that returns an arbitrary element, like set.pop(), but without removing it. It doesn't even need to be the same element that set.pop() would return.

The name is probably a poor choice, but I'm not sure what else it should be. pop_without_remove seems a bit verbose, and implies that it might return the element a pop would.

<mike

Steven D'Aprano

7:26 a.m.

On Wed, May 16, 2012 at 03:10:35AM -0400, Mike Meyer wrote:

...

I guess I should have been explicit about what I was asking about.

...

I'm not asking for set.get(x) that returns "this element", I'm asking for set.get() that returns an arbitrary element, like set.pop(), but without removing it. It doesn't even need to be the same element that set.pop() would return.

Could this helper function not do the job?

    def get(s):
        x = s.pop()
        s.add(x)
        return x

Of course, this does not guarantee that repeated calls to get() won't return the same result over and over again. If that's unacceptable, you'll need to specify what behaviour is acceptable -- i.e. what your functional requirements are. E.g.

"I need the element to be selected at random."

"I don't need randomness, returning the elements in some arbitrary but deterministic order will do, with no repeats or cycles."

"I don't care whether or not there are repeats, so long as the same element is not returned twice in a row."

"Once I've seen every element, I expect get() to raise an exception."

etc. And I guarantee that whatever your requirements are, other people will want something different.

Once you have your requirements, you can start thinking about implementation (e.g. how does the set remember which elements have already been get'ed?).

...

The name is probably a poor choice, but I'm not sure what else it should be. pop_without_remove seems a bit verbose, and implies that it might return the element a pop would.

Are you suggesting that get() and pop() should not return the same element? -- Steven
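For reference, a small sketch of how the pop-and-add helper from this message behaves. The sample set and the `before` copy are assumptions added only to make the effect visible.

```python
def get(s):
    """Return an arbitrary element of set s by popping it and adding it back."""
    x = s.pop()      # raises KeyError if s is empty
    s.add(x)         # restore the element so the contents end up unchanged
    return x


s = {"a", "b", "c"}
before = set(s)      # copy kept only to compare contents afterwards
x = get(s)

print(x in before, s == before)   # True True
```

The set is briefly one element short between the two calls, and the helper cannot be used on a frozenset, which has no pop method.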

Devin Jeanpierre

7:38 a.m.

On Wed, May 16, 2012 at 3:26 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Are you suggesting that get() and pop() should not return the same element?

He is suggesting that "It doesn't even need to be the same element that set.pop() would return." -- Devin

Mike Meyer

7:52 a.m.

On Wed, 16 May 2012 17:26:45 +1000 Steven D'Aprano <steve@pearwood.info> wrote:

...

On Wed, May 16, 2012 at 03:10:35AM -0400, Mike Meyer wrote:

...
I guess I should have been explicit about what I was asking about.

:)

...
I'm not asking for set.get(x) that returns "this element", I'm asking for set.get() that returns an arbitrary element, like set.pop(), but without removing it. It doesn't even need to be the same element that set.pop() would return.

Could this helper function not do the job?

    def get(s):
        x = s.pop()
        s.add(x)
        return x

Sure, if you don't mind munging the set unnecessarily. That's more readable, but slower and longer than:

    def get(s):
        for x in s:
            return x

...

Of course, this does not guarantee that repeated calls to get() won't return the same result over and over again. If that's unacceptable, you'll need to specify what behaviour is acceptable -- i.e. what your functional requirements are. E.g. "I need the element to be selected at random." "I don't need randomness, returning the elements in some arbitrary but deterministic order will do, with no repeats or cycles." "I don't care whether or not there are repeats, so long as the same element is not returned twice in a row." "Once I've seen every element, I expect get() to raise an exception." etc.

My requirements are "I need an element from the set". The behavior of repeated calls is immaterial.

...

And I guarantee that whatever your requirements are, other people will want something different.

That's not what I found in my google results. They were all pretty much asking for what I was asking for, and didn't care what happened beyond the first call. I believe you're assuming that the purpose of this method is to start an iteration through the set. That's not the case at all, and a single call to pop would be perfectly acceptable, except I would then need to put the element back. If that were the purpose, I'd agree with you - between iteration and pop, we've covered most of the ways you might want to iterate through the elements. But the point isn't to iterate through the elements, it's to examine a single element.

...

Once you have your requirements, you can start thinking about implementation (e.g. how does the set remember which elements have already been get'ed?).

Doesn't need to, because it doesn't matter.

...

...
The name is probably a poor choice, but I'm not sure what else it should be. pop_without_remove seems a bit verbose, and implies that it might return the element a pop would. Are you suggesting that get() and pop() should not return the same element?

I'm suggesting there is no requirement that they return the same element.

<mike

Stephen J. Turnbull

8:42 a.m.

Mike Meyer writes:

...

On Wed, 16 May 2012 17:26:45 +1000

...

Why would you mind munging the set temporarily? Why is speed (of something that almost by definition is undefined if repeated) important? Your example use case of testing doesn't motivate these parts of your requirements. I'm -1 on adding a method that has no motivation in production that I can see. Just redefine your get() function as a function, with a more appropriate name such as "get_item_nondeterministically". It will work on any iterable. (Don't forget to document that it will "use up" an item if the iterable is not a sequence, though.)

Cameron Simpson

12:56 a.m.

On 16May2012 17:42, Stephen J. Turnbull <stephen@xemacs.org> wrote:
| Mike Meyer writes:
|  > On Wed, 16 May 2012 17:26:45 +1000
|
|  > > Could this helper function not do the job?
|  > >
|  > > def get(s):
|  > >     x = s.pop()
|  > >     s.add(x)
|  > >     return x
|  >
|  > Sure, if you don't mind munging the set unnecessarily. That's more
|  > readable, but slower and longer than:
|  >
|  > def get(s):
|  >     for x in s:
|  >         return x

I was about to suggest Mike's implementation.

| Why would you mind munging the set temporarily?

Personally, I work with multiple threads quite often. Therefore I habitually avoid data structure modifying operations unless they're necessary. Any time I modify a data structure is a time I have to worry about shared access.

| Why is speed (of
| something that almost by definition is undefined if repeated)
| important?

Besides, modifying a data structure _is_ slower than just looking, usually. There may even be garbage collection :-(

| I'm -1 on adding a method that has no motivation in production that I
| can see. Just redefine your get() function as a function, with a more
| appropriate name such as "get_item_nondeterministically". It will
| work on any iterable. (Don't forget to document that it will "use up"
| an item if the iterable is not a sequence, though.)

Yah:

    def an(s):
        for i in s:
            return i

I'm also -1 on a set _method_, though he can always subclass and add his own for his use case.

Cheers,
-- 
Cameron Simpson <cs@zip.com.au> DoD#743
http://www.cskk.ezoshosting.com/cs/

If your new theorem can be stated with great simplicity, then there will exist a pathological exception. - Adrian Mathesis

Paul Du Bois

2:19 a.m.

...

On Wed, May 16, 2012 at 5:56 PM, Cameron Simpson <cs@zip.com.au> wrote:

...

Normally I'm content to lurk, but this thread has been going on for a long time without anyone pointing out that the "for" loop idiom needs an "else: raise KeyError" in order to act pythonically. p
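A sketch of the for-loop idiom with the else clause suggested here. The else branch runs only when the loop finishes without returning, which for this function means the set was empty; the error message text is an assumption.

```python
def get(s):
    """Return an arbitrary element of s without modifying it."""
    for x in s:
        return x
    else:
        # Reached only if the loop body never ran, i.e. s is empty.
        raise KeyError("get from an empty set")


print(get({7}))      # 7

try:
    get(set())
except KeyError as exc:
    print("empty:", exc)
```

Without the else clause the function falls off the end and returns None for an empty set, which may or may not be what the caller wants.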

Greg Ewing

3:03 a.m.

On 17/05/12 14:19, Paul Du Bois wrote:

...

That depends on what result you want in the empty set case. If returning None is okay, or you know the set can never be empty, then it's fine as written. -- Greg

Mike Meyer

3:49 a.m.

On Thu, 17 May 2012 15:03:11 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

...

Raising KeyError is probably best, as that parallels "pop". In fact, it would be required for the proposed "peek" method that returns what "pop" would have returned at that point. That method would not only have satisfied all 2.5 of the use cases I had, but would probably be useful for algorithms that want a conditional pop.

<mike

Bruce Leban

7:02 a.m.

On Tue, May 15, 2012 at 11:32 PM, Mike Meyer <mwm@mired.org> wrote:

...

Your request needs clarification. What does set.get do? What is the actual use case?

I understand what pop does: it removes and returns an arbitrary member of the set. Therefore, if I call pop repeatedly, I eventually get all the members. That's useful.

Here's one definition of get:

    def get_from_set1(s):
        """Return an arbitrary member of a set."""
        return min(s, key=hash)

How is this useful?

Or do you mean instead: checks to see if an element is in the set and returns it, otherwise returns a default value

    def get_from_set2(s, v, d=None):
        """Returns v if v is in the set, otherwise returns d."""
        return v if v in s else d

I suppose this could be useful but it's a one liner and seems much less obvious what it does than dict.get.

Or did you mean something else?

--- Bruce
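A short sketch of the first definition in this message, to show what "arbitrary" means for it in practice. The sample set is an assumption; small non-negative ints are used because their hash equals their value, which makes the result easy to predict.

```python
def get_from_set1(s):
    """Return an arbitrary member of a set."""
    # Scans every element and keeps the one with the smallest hash,
    # so repeated calls on the same set give the same answer.
    return min(s, key=hash)


s = {3, 1, 2}
print(get_from_set1(s))                        # 1
print(get_from_set1(s) == get_from_set1(s))    # True
```

This version visits the whole set on every call and raises ValueError on an empty set. For strings the chosen element can differ between interpreter runs when hash randomization is enabled, so the repeatability holds within one process only.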

Mike Meyer

7:40 a.m.

On Wed, 16 May 2012 00:02:31 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...

On Tue, May 15, 2012 at 11:32 PM, Mike Meyer <mwm@mired.org> wrote:

...
Is there some reason that there isn't a straightforward way to get an element from a set without removing it? Everything I find either requires multiple statements or converting the set to another data type.

It seems that some kind of get method would be useful. The argument that "getting an arbitrary element from a set isn't useful" is refuted by 1) the existence of the pop method, which does just that, and 2) the fact that I (and a number of other people) have run into such a need. Your request needs clarification. What does set.get do? What is the actual use case? I understand what pop does: it removes and returns an arbitrary member of the set. Therefore, if I call pop repeatedly, I eventually get all the members. That's useful.

So is just getting a single member:

...

Here's one definition of get:

    def get_from_set1(s):
        """Return an arbitrary member of a set."""
        return min(s, key=hash)

From poking around, at least at one time the fastest implementation was the very confusing:

    def get_from_set(s):
        for x in s:
            return x

...

How is this useful?

Basically, anytime you want to examine an arbitrary element of a set, and would use pop, except you need to preserve the set for future use. In my case, I'm running a series of tests on the set, and some tests need an element.

Again, looking for a reason for this not existing turned up other cases where people were wondering how to do this.

Hmm. Maybe the name should be item?

<mike

Bruce Leban

8:08 a.m.

On Wed, May 16, 2012 at 12:40 AM, Mike Meyer <mwm@mired.org> wrote:

...

On Wed, 16 May 2012 00:02:31 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...

...
Here's one definition of get:

    def get_from_set1(s):
        """Return an arbitrary member of a set."""
        return min(s, key=hash)

...
From poking around, at least at one time the fastest implementation was the very confusing:

    def get_from_set(s):
        for x in s:
            return x

I didn't claim it was fast. I actually wrote that version instead of the in/return version for a very specific reason: it always returns the same element. (The for/in/return version might return the same element every time too but it's not guaranteed.)

How is this useful?

Basically, anytime you want to examine an arbitrary element of a set, and would use pop, except you need to preserve the set for future use. In my case, I'm running a series of tests on the set, and some tests need an element.

That's bordering on tautological. It's useful anytime you need it. I don't think your test is very good if it uses the get I wrote above. Your test will only operate on one element of the set and it's easy to write functions which succeed for some elements of the set and fail for others. I'd like to see an actual test that you think needs this that would not be improved by iterating over the list.

...

Again, looking for a reason for this not existing turned up other cases where people were wondering how to do this.

Things are added to APIs and libraries because they are useful, not because people wonder why they aren't there. set.get as you propose is not sufficiently analogous to dict.get or list.__getitem__. --- Bruce Follow me: http://www.twitter.com/Vroo http://www.vroospeak.com

Mike Meyer

8:46 a.m.

On Wed, 16 May 2012 01:08:09 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...

I didn't ask for a get that would always return the same element.

...

What do you expect? We've got a container type that has no way to examine an element without modifying the container or wrapping it in an object of another type. The general use case is that you don't need the facilities of the wrapping type (except for their ability to provide a single element) and you don't want to modify the container. Would you also complain that having int accept a string value in lieu of using eval on untrusted input is a case of "it's useful anytime you need it."?

...

Talk about tautologies! Of course you can write tests that will fail in some cases. You can also write tests that won't fail for your cases. Especially if you know something about the set beforehand.

For instance, I happen to know I have a set of ElementTree elements that all have the same tag. I want to check the tag.

One of the test cases starts by checking to see if the set is a singleton. Do you really propose something like:

    if len(s) == 1:
        for i in s:
            res = process(i)

or:

    if len(s) == 1:
        res = process(list(s)[0])

Or, as suggested elsewhere:

    if len(s) == 1:
        res = process(next(iter(s)))

Or, just to be really obtuse:

    if len(s) == 1:
        res = process(set(s).pop())

All of these require creating an intermediate object for the sole purpose of getting an item out of the container without destroying the container. This leads the reader to wonder why it was created, which clashes with pretty much everything else in python. The only really palatable version is sufficiently obscure that nobody else who needed this API found it, or had it suggested to them by those responding to their question.

...

People wonder why an API is not there because they need it and can't find it. I have a use for this API, so clearly it's useful. When I didn't find it, I figured there might be a reason for it not existing, so I asked the google djinn. Rather than providing a reason (djinn being noticeably untrustworthy), it turned up other people who had the same need as I did.

I then asked this list the same question. Instead of getting an answer to that question, I get a bunch of claims that "I don't really need that", making me wonder if I've invoked another djinn.

<mike

Stephen J. Turnbull

9:23 a.m.

Mike Meyer writes:

...

Indeed, you asked whether there's a reason it's not in the stdlib, and the bunch of answers you got is the stdlib really doesn't need it, so it's not there. Maybe some of them were rash enough to also claim that *you* don't really need it, but that's sort of off topic in this thread---just ignore them and use the recipe you find most useful!<wink/>

Bruce Leban

3:06 p.m.

On Wed, May 16, 2012 at 1:46 AM, Mike Meyer <mwm@mired.org> wrote:

...

On Wed, 16 May 2012 01:08:09 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...
On Wed, May 16, 2012 at 12:40 AM, Mike Meyer <mwm@mired.org> wrote:

I didn't claim it was fast. I actually wrote that version instead of the in/return version for a very specific reason: it always returns the same element. (The for/in/return version might return the same element every time too but it's not guaranteed.)

I didn't ask for a get that would always return the same element.

You didn't ask for a get that *didn't* always return the same element. My deterministic version is totally compatible with your ask here and

...

Would you also complain that having int accept a string value in lieu of using eval on untrusted input is a case of "it's useful anytime you need it."?

Not at all. It's useful because it's very common to need to convert strings to numbers and I can show you lots of code that does just that. So we need a method that does that safely. Does it have to be int? No; it could be atoi or parse_int or scanf. But we do need it.

...

...
I don't think your test is very good if it uses the get I wrote above. Your test will only operate on one element of the set and it's easy to write functions which succeed for some elements of the set and fail for others. I'd like to see an actual test that you think needs this that would not be improved by iterating over the list.

Talk about tautologies! Of course you can write tests that will fail in some cases. You can also write tests that won't fail for your cases. Especially if you know something about the set beforehand.

Not what I said. It's easy to write a *function* that fails on some elements and your *test* won't test it. Example: a function that fails when operating on non-integer set elements or the largest element or .... Your test only tests one case.

...

For instance, I happen to know I have a set of ElementTree elements that all have the same tag. I want to check the tag.

Then maybe you should be using a different data structure than a set. Maybe set_with_same_tag that declares that constraint and can enforce the constraint if you want.

...

One of the test cases starts by checking to see if the set is a singleton. Do you really propose something like:

    if len(s) == 1:
        for i in s:
            res = process(i)

This is a legitimate use case. I don't think it's a big deal to have to add a one line function to your code. I might even use EAFP:

    res = process(set_singleton(s))

where

    def set_singleton(s):
        [result] = s
        return result

--- Bruce
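A sketch of how the EAFP version above might be used. Unpacking into a one-element target raises ValueError both for an empty set and for a set with more than one element, so the length check disappears. The `process` stand-in and the sample sets are assumptions for illustration.

```python
def set_singleton(s):
    """Return the only element of s; raise ValueError unless len(s) == 1."""
    [result] = s
    return result


def process(item):
    # Hypothetical stand-in for the real per-element work.
    return item.upper()


for candidate in ({"tag"}, set(), {"a", "b"}):
    try:
        res = process(set_singleton(candidate))
    except ValueError:
        res = None       # not a singleton: nothing to process
    print(res)           # TAG, then None, then None
```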

Greg Ewing

11:51 p.m.

Bruce Leban wrote:

...

Another class of use cases is where you know that the set contains only one element, and you want to find out what that element is. I encountered one of these in my recent PyWeek game entry. I have a set of selected units, and commands that can be applied to them. Some commands can only be used on a single unit at a time, so there are places in the code where there can only be one element in the set. Using a separate SetWithOnlyOneElement type in that case would be tedious and unnecessary. I don't need to enforce the constraint; I know it's satisfied because I wouldn't have ended up at that point in the code if it wasn't. -- Greg

Steven D'Aprano

4:28 p.m.

Mike Meyer wrote:

...

I didn't ask for a get that would always return the same element.

It seems to me that you haven't exactly been clear about what you want this get() method to actually do. In an earlier email, you said:

[quote]
My requirements are "I need an element from the set". The behavior of repeated calls is immaterial.
[end quote]

So a get() which always returns the same element fits your requirements, *as stated*. If you have other requirements, you haven't been forthcoming with them.

I've never come across a set implementation which includes something like your get() method. Does anyone know any language whose set implementation has this functionality? Wikipedia currently suggests it isn't a natural method of sets:

    Unlike most other collection types, rather than retrieving a specific element from a set, one typically tests a value for membership in a set.

http://en.wikipedia.org/wiki/Set_%28computer_science%29

For all you say it is a common request, I don't think it's a well-thought-out request. It's one thing to ask "give me any element without modifying the set", but what does that mean exactly? Which element should it return? "Any element, so long as it isn't always the same element twice in a row" perhaps? Would flip-flopping between the first and second elements meet your requirements?

The example you give below:

...

For instance, I happen to know I have a set of ElementTree elements that all have the same tag. I want to check the tag.

One of the test cases starts by checking to see if the set is a singleton. Do you really propose something like:

is too much of a special case to really matter. A set with one item avoids all the hard questions, since there is only one item which could be picked. It's the sets with two or more items that are hard. A general case get/pick method has to deal with the hard cases, not just the easy one-element cases. [...]

...

All of these require creating an intermediate object for the sole purpose of getting an item out of the container without destroying the container. This leads the reader to wonder why it was created,

You've just explained why it was created -- to get an item out of the set without destroying it. Why is this a problem? We do something similar frequently, often abstracted away inside a helper function:

    first = list(iterable)[0]
    num_digits = len(str(some_integer))

etc.

-- Steven

Greg Ewing

1:51 a.m.

On 17/05/12 04:28, Steven D'Aprano wrote:

...

It might be useful to have a method specified as returning the same element that a subsequent pop() would return. Then it could be used as a look-ahead for an algorithm involving a pop-loop, or for anything with more liberal requirements. -- Greg

Nick Coghlan

8:08 a.m.

On Wed, May 16, 2012 at 5:40 PM, Mike Meyer <mwm@mired.org> wrote:

...

Why is this confusing? The operation you want to perform is "give me an object from this set, I don't care which one". That's not an operation that applies just to sets, you can do it with an iterable, therefore the spelling is one that works with any iterable:

    next(iter(s))

This entire thread is like asking for s.length() when len(s) already works.

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
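A minimal sketch of the point that the spelling works for any iterable, not just sets. The helper name `first` and its `default` parameter are assumptions for illustration; the two-argument form of next() is what supplies the fallback for empty input.

```python
def first(iterable, default=None):
    """Return the first item yielded by iterable, or default if it yields nothing."""
    return next(iter(iterable), default)


print(first({"only"}))        # only
print(first([10, 20, 30]))    # 10
print(first({"k": "v"}))      # k  (iterating a dict yields its keys)
print(first(set()))           # None

gen = (n * n for n in range(3))
print(first(gen), list(gen))  # 0 [1, 4]
```

The last line shows the caveat raised earlier in the thread: when the argument is itself an iterator, the returned item is used up rather than merely inspected.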

Mike Graham

2:44 p.m.

On Wed, May 16, 2012 at 4:08 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

It sounds like you're re-implementing the venerable ,= operator. ,= is one of my favorite operators in Python. You know,

...

;^), Mike

Masklinn

2:51 p.m.

On 2012-05-16, at 16:44 , Mike Graham wrote:

...

With the difference that ,= also asserts there is only one item in the iterable, where `next . iter` only does `head`. (but the formatting as a single operator is genius, I usually write it as `item, = s` and the lack of clarity bothers me, thanks for that visual trick)

Mike Graham

2:57 p.m.

On Wed, May 16, 2012 at 10:51 AM, Masklinn <masklinn@masklinn.net> wrote:

...

1. I should have quoted Mike Meyer's code "if len(s) == 1: res = process(next(iter(s)))", which is what I had in mind. (In that case, it's a feature. :) ) 2. I grouped it this way as a joke. If you do this, everyone will think you're crazy. I've been known to write it (item,) = s, which makes it a little easier to see the comma. If you want to be unambiguous AND confuse everyone, go with

...

Mike

Masklinn

4:43 p.m.

On 2012-05-16, at 16:57 , Mike Graham wrote:

...

2. I grouped it this way as a joke. If you do this, everyone will think you're crazy.

I don't mind, it expresses the intent clearly and looks *weird* at first glance which is fine by me: colleagues & readers are unlikely to miss it if they don't know the idiom. I genuinely like it.

Yury Selivanov

8:10 p.m.

On 2012-05-16, at 10:51 AM, Masklinn wrote:

...

With the difference that ,= also asserts there is only one item in the iterable, where `next . iter` only does `head`.

For that, the ,*_= operator exists ;)

...

I hope I'll never encounter this, though. - Yury

Nick Coghlan

7:11 a.m.

On Wed, May 16, 2012 at 4:32 PM, Mike Meyer <mwm@mired.org> wrote:

...

The two primary use cases handled by the current interface are:

1. Do something for all items in the set (iteration)
2. Do something for an arbitrary item in the set, and keep track of which items remain (set.pop)

Now, at the iterator level, it is possible to turn "do something for all items in an iterable" to "do something for the *first* item in the iterable" via "next(iter(obj))".

Since this use case is already covered by the iterator protocol, the question then becomes: Is there a specific reason a dedicated set-specific solution is needed rather than better educating people that "the first item" is an acceptable answer when the request is for "an arbitrary item" (this is particularly true in a world where set ordering is randomised by default)?

Cheers,
Nick.

-- 
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Mike Meyer

8:09 a.m.

On Wed, 16 May 2012 17:11:28 +1000
Nick Coghlan <ncoghlan@gmail.com> wrote:

> On Wed, May 16, 2012 at 4:32 PM, Mike Meyer <mwm@mired.org> wrote:
> > Is there some reason that there isn't a straightforward way to get an
> > element from a set without removing it? Everything I find either
> > requires multiple statements or converting the set to another data
> > type.
> > It seems that some kind of get method would be useful. The argument
> > that "getting an arbitrary element from a set isn't useful" is refuted
> > by 1) the existence of the pop method, which does just that, and 2)
> > the fact that I (and a number of other people) have run into such a
> > need.
> > My search for such a reason kept finding people asking how
> > to get an element instead. Of course, my key words (set and get) are
> > heavily overloaded.
> The two primary use cases handled by the current interface are:
> 1. Do something for all items in the set (iteration)
> 2. Do something for an arbitrary item in the set, and keep track of
> which items remain (set.pop)

Neither of which fits my use case.

> Since this use case is already covered by the iterator protocol, the
> question then becomes: Is there a specific reason a dedicated
> set-specific solution is needed rather than better educating people
> that "the first item" is an acceptable answer when the request is for
> "an arbitrary item" (this is particularly true in a world where set
> ordering is randomised by default)?

Because next(iter(s)) makes the reader wonder "Why is this iterator
being created?" It's a less expensive form of writing list(s)[0]. It's
also sufficiently non-obvious that the closest I found on google for a
discussion of the issue was the "for x in s: break" variant. Which
makes me think that at the very least, this idiom ought to be
mentioned in the documentation. Or if it's already there, then a
pointer added to the set documentation.

But my question was actually whether or not there was a reason for it
not existing. Has there been a previous discussion of this?

   <mike

Paul Moore

8:34 a.m.

On 16 May 2012 09:09, Mike Meyer <mwm@mired.org> wrote:

...

I guess a doc patch adding a comment in the documentation of set.pop that if you want an arbitrary element of a set *without* removing it, then next(iter(s)) will give it to you, would be reasonable. Maybe you could write one? But I don't think it's particularly difficult to understand. It's very Python-specific, sure, but it feels idiomatic to me. Paul.

Steven D'Aprano

4:20 p.m.

Mike Meyer wrote:

...

But my question was actually whether or not there was a reason for it not existing. Has there been a previous discussion of this?

Aye yai yai, have there ever.

http://mail.python.org/pipermail/python-bugs-list/2005-August/030069.html

If you have an hour or two spare, read this thread:

http://mail.python.org/pipermail/python-dev/2009-October/093227.html

By the way, I suggest that a better name than "get" is pick(), which once was (but no longer is) suggested by Wikipedia as a fundamental set operation.

http://en.wikipedia.org/w/index.php?title=Set_%28abstract_data_type%29&oldid=461872038#Static_sets

It seems to me that it has been removed because:

- the actual semantics of what it means to get/pick a value from a set are unclear; and
- few, if any, set implementations actually provide this method.

I still think your best bet is a helper function:

    def pick(s):
        return next(iter(s))

-- Steven

Chris Rebert

6:19 p.m.

On Wed, May 16, 2012 at 9:20 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Objective-C's NSSet calls it "anyObject" and doesn't specify much about it (in particular, its behavior when called repeatedly), mainly just that "the selection is not guaranteed to be random". I haven't poked around to see how it actually behaves in practice. C#'s ISet has First() and Last(), but merely as extension methods. Java, Ruby, and Haskell don't seem to include any such operation in their generic set interfaces. Cheers, Chris -- http://rebertia.com

Masklinn

7:21 p.m.

On 2012-05-16, at 20:19 , Chris Rebert wrote:

...

It takes the first object it finds which is pretty much solely a property of how it stores its items, its behavior translated into Python code is precisely: next(iter(set)) or list(set)[0] I just tested creating a few thousand sets and filling them with random integer values (using arc4random(3)) and never once did [set anyObject] differ from [[set allObjects] objectAtIndex:0] or from [[set objectEnumerator] nextValue].
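The same experiment is easy to repeat on the Python side. The sketch below assumes that iterating an unmodified set twice in one run gives the same order, which is how CPython behaves. The loop counts and value ranges are arbitrary choices.

```python
import random

for _ in range(1000):
    # Build a non-empty set of random integers.
    s = {random.randrange(10**6) for _ in range(random.randrange(1, 50))}
    first = next(iter(s))
    # list(s) copies every element; next(iter(s)) stops after the first one.
    assert first == list(s)[0]
    assert first in s

print("next(iter(s)) agreed with list(s)[0] in every trial")
```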

Mathias Panzenböck

8:16 p.m.

On 05/16/2012 08:19 PM, Chris Rebert wrote:

...

Ruby's Set has first() but not last(). That said, I'm -1 on a get/pick method for a set.

...

Ethan Furman

11:08 p.m.

Steven D'Aprano wrote:

...

Don't forget the doc string! "returns an arbitrary element from set s" ~Ethan~

Greg Ewing

11:56 p.m.

Steven D'Aprano wrote:

...

By the way, I suggest that a better name than "get" is pick()

I was going to suggest peek(), which is more suggestive of a non-modifying function. -- Greg

Terry Reedy

10:41 p.m.

On 5/16/2012 4:09 AM, Mike Meyer wrote:

...

Because next(iter(s)) makes the reader wonder "Why is this iterator being created?"

If s is a non-iterator iterable, to get at the contents of s non-destructively.

...

makes me think that at the very least, this idiom ought to be mentioned in the documentation.

http://bugs.python.org/issue14836

...

But my question was actually whether or not there was a reason for it not existing.

Because not all simple compositions need to be added to the stdlib and builtins. -- Terry Jan Reedy

Ben Finney

7:39 a.m.

Mike Meyer <mwm@mired.org> writes:

...

With a mapping, you use a key to get an item. With a sequence, you have an index and get an item. Sets are unordered collections of items without indices or keys. What does it mean to you to “get” an item from that?

If you mean “get the items one by one”, a set is an iterable::

    for item in foo_set:
        do_something_with(item)

If you mean “test whether an item is in the set”, the ‘in’ operator works::

    if item in foo_set:
        do_something()

If you mean “get a specific item from a set”, the only way to do that is to *already have* the specific item and test whether it's in the set.

...

If by “get” you mean to get an *arbitrary* item, not a specific item, then what's the problem? You already have ‘set.pop’, as you point out. What need do you have that isn't being fulfilled by the existing methods and operators? Can you show some actual code that would be improved by a ‘get’ operation on sets? -- \ “It's a terrible paradox that most charities are driven by | `\ religious belief.… if you think altruism without Jesus is not | _o__) altruism, then you're a dick.” —Tim Minchin, 2010-11-28 | Ben Finney

Mike Meyer

8:12 a.m.

On Wed, 16 May 2012 17:39:10 +1000 Ben Finney <ben+python@benfinney.id.au> wrote:

...

If by “get” you mean to get an *arbitrary* item, not a specific item, then what's the problem? You already have ‘set.pop’, as you point out.

And, as I also pointed out, it's not useful in the case where you need to preserve the set for future use. <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Ben Finney

3:43 p.m.

Mike Meyer <mwm@mired.org> writes:

...

Then, if ‘item = next(iter(foo_set))’ doesn't suit you, perhaps you'd like ‘item = set(foo_set).pop()’. Regardless, I think you have your answer: Like most things that can already be done by composing the existing pieces, this corner case hasn't met the deliberately-high bar for making a special method just to do it. I still haven't seen you describe the use case where the existing ways of doing this aren't good enough. -- \ “Men never do evil so completely and cheerfully as when they do | `\ it from religious conviction.” —Blaise Pascal (1623–1662), | _o__) Pensées, #894. | Ben Finney
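For comparison, here is a small sketch of the two idioms mentioned above, side by side. The variable names are illustrative. The point is only that both leave the original set intact, while the copy-and-pop form builds a throwaway set first.

```python
foo_set = {"a", "b", "c"}
before = set(foo_set)

# Idiom 1: create an iterator and take its first element (no copy made).
item_a = next(iter(foo_set))

# Idiom 2: pop from a temporary copy; the copy costs O(len(foo_set)).
item_b = set(foo_set).pop()

assert item_a in foo_set and item_b in foo_set
assert foo_set == before  # neither idiom modified the original

print(item_a, item_b, sorted(foo_set))
```

Nothing guarantees that the two idioms return the same element.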

Mike Meyer

8 p.m.

On Thu, 17 May 2012 01:43:06 +1000 Ben Finney <ben+python@benfinney.id.au> wrote:

...

Mike Meyer <mwm@mired.org> writes:

...

And, as I also pointed out, it's not useful in the case where you need to preserve the set for future use.

Then, if ‘item = next(iter(foo_set))’ doesn't suit you, perhaps you'd like ‘item = set(foo_set).pop()’.

This is precisely what bugs me about this case. There's not one obvious way to do it. There's a collection of ways that are all in some ways/cases problematical. They all involve creating a scratch object from the set and using its API to (possibly destructively) get the one value that's wanted. In a way, it reminds me of the discussions that eventually led to the if else expression being added.

...

Regardless, I think you have your answer: Like most things that can already be done by composing the existing pieces, this corner case hasn't met the deliberately-high bar for making a special method just to do it.

And it's already been discussed to death. *That's* what I was trying to find out. If it hadn't been, I'd have put together a serious proposal.

...

I still haven't seen you describe the use case where the existing ways of doing this aren't good enough.

That, of course, is a subjective judgment. <mike -- Mike Meyer <mwm@mired.org> http://www.mired.org/ Independent Software developer/SCM consultant, email for more information. O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

Steven D'Aprano

May 2012

1:58 a.m.

On Wed, May 16, 2012 at 02:32:15AM -0400, Mike Meyer wrote:

...

What's your use-case? -- Steven

Dirkjan Ochtman

2:01 a.m.

On Wed, May 16, 2012 at 8:58 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

I understood Mike's message to be proposing an argument-less .get() method. Cheers, Dirkjan

Mike Meyer

2:10 a.m.

On Wed, 16 May 2012 16:58:43 +1000 Steven D'Aprano <steve@pearwood.info> wrote:

...

Steven D'Aprano

2:26 a.m.

On Wed, May 16, 2012 at 03:10:35AM -0400, Mike Meyer wrote:

...

I guess I should have been explicit about what I was asking about.

...

I'm not asking for set.get(x) that returns "this element", I'm asking for set.get() that returns an arbitrary element, like set.pop(), but without removing it. It doesn't even need to be the same element that set.pop() would return.

...

The name is probably a poor choice, but I'm not sure what else it should be. pop_without_remove seems a bit verbose, and implies that it might return the element a pop would.

Are you suggesting that get() and pop() should not return the same element? -- Steven

Devin Jeanpierre

2:38 a.m.

On Wed, May 16, 2012 at 3:26 AM, Steven D'Aprano <steve@pearwood.info> wrote:

...

Are you suggesting that get() and pop() should not return the same element?

He is suggesting that "It doesn't even need to be the same element that set.pop() would return." -- Devin

Mike Meyer

2:52 a.m.

On Wed, 16 May 2012 17:26:45 +1000 Steven D'Aprano <steve@pearwood.info> wrote:

...

On Wed, May 16, 2012 at 03:10:35AM -0400, Mike Meyer wrote:

...
I guess I should have been explicit about what I was asking about.

:)

...
I'm not asking for set.get(x) that returns "this element", I'm asking for set.get() that returns an arbitrary element, like set.pop(), but without removing it. It doesn't even need to be the same element that set.pop() would return.

Could this helper function not do the job?

    def get(s):
        x = s.pop()
        s.add(x)
        return x

Sure, if you don't mind munging the set unnecessarily. That's more readable, but slower and longer than:

    def get(s):
        for x in s:
            return x

...

Of course, this does not guarantee that repeated calls to get() won't return the same result over and over again. If that's unacceptable, you'll need to specify what behaviour is acceptable -- i.e. what your functional requirements are. E.g. "I need the element to be selected at random." "I don't need randomness, returning the elements in some arbitrary but deterministic order will do, with no repeats or cycles." "I don't care whether or not there are repeats, so long as the same element is not returned twice in a row." "Once I've seen every element, I expect get() to raise an exception." etc.

My requirements are "I need an element from the set". The behavior of repeated calls is immaterial.

...

And I guarantee that whatever your requirements are, other people will want something different.

...

Once you have your requirements, you can start thinking about implementation (e.g. how does the set remember which elements have already been get'ed?).

Doesn't need to, because it doesn't matter.

...

...
The name is probably a poor choice, but I'm not sure what else it should be. pop_without_remove seems a bit verbose, and implies that it might return the element a pop would. Are you suggesting that get() and pop() should not return the same element?

Stephen J. Turnbull

May 2012

8:42 a.m.

Mike Meyer writes:

...

On Wed, 16 May 2012 17:26:45 +1000

...

Cameron Simpson

12:56 a.m.

Paul Du Bois

2:19 a.m.

...

On Wed, May 16, 2012 at 5:56 PM, Cameron Simpson <cs@zip.com.au> wrote:

...

Normally I'm content to lurk, but this thread has been going on for a long time without anyone pointing out that the "for" loop idiom needs an "else: raise KeyError" in order to act pythonically. p

Greg Ewing

3:03 a.m.

On 17/05/12 14:19, Paul Du Bois wrote:

...

That depends on what result you want in the empty set case. If returning None is okay, or you know the set can never be empty, then it's fine as written. -- Greg
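The two positions above can be sketched as two variants of the "for" loop idiom. The function names are hypothetical. KeyError is chosen to match what set.pop() raises on an empty set.

```python
def get_or_none(s):
    """Return an arbitrary element of s, or None if s is empty."""
    for x in s:
        return x
    # Falling off the end of the function returns None implicitly.


def get_or_raise(s):
    """Return an arbitrary element of s; raise KeyError if s is empty."""
    for x in s:
        return x
    else:
        # Reached only when the loop body never ran, i.e. s was empty.
        raise KeyError("get from an empty set")


print(get_or_none(set()))
print(get_or_raise({42}))
```

The first variant is ambiguous when None can itself be a member of the set. The second avoids that at the cost of an exception the caller must handle.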

Mike Meyer

3:49 a.m.

On Thu, 17 May 2012 15:03:11 +1200 Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:

...

Bruce Leban

7:02 a.m.

On Tue, May 15, 2012 at 11:32 PM, Mike Meyer <mwm@mired.org> wrote:

...

Mike Meyer

May 2012

7:40 a.m.

On Wed, 16 May 2012 00:02:31 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...

On Tue, May 15, 2012 at 11:32 PM, Mike Meyer <mwm@mired.org> wrote:

...
Is there some reason that there isn't a straightforward way to get an element from a set without removing it? Everything I find either requires multiple statements or converting the set to another data type.

It seems that some kind of get method would be useful. The argument that "getting an arbitrary element from a set isn't useful" is refuted by 1) the existence of the pop method, which does just that, and 2) the fact that I (and a number of other people) have run into such a need.

Your request needs clarification. What does set.get do? What is the actual use case? I understand what pop does: it removes and returns an arbitrary member of the set. Therefore, if I call pop repeatedly, I eventually get all the members. That's useful.

So is just getting a single member:

...

Here's one definition of get:

    def get_from_set1(s):
        """Return an arbitrary member of a set."""
        return min(s, key=hash)

From poking around, at least at one time the fastest implementation was the very confusing:

    def get_from_set(s):
        for x in s:
            return x

...

How is this useful?

Bruce Leban

8:08 a.m.

On Wed, May 16, 2012 at 12:40 AM, Mike Meyer <mwm@mired.org> wrote:

...

On Wed, 16 May 2012 00:02:31 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...

...
Here's one definition of get:

    def get_from_set1(s):
        """Return an arbitrary member of a set."""
        return min(s, key=hash)

...
From poking around, at least at one time the fastest implementation was the very confusing:

    def get_from_set(s):
        for x in s:
            return x

...

How is this useful?

Basically, anytime you want to examine an arbitrary element of a set, and would use pop, except you need to preserve the set for future use. In my case, I'm running a series of tests on the set, and some tests need an element.

That's bordering on tautological. It's useful anytime you need it. I don't

...

Again, looking for a reason for this not existing turned up other cases where people were wondering how to do this.

Mike Meyer

8:46 a.m.

On Wed, 16 May 2012 01:08:09 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...

I didn't ask for a get that would always return the same element.

...

Stephen J. Turnbull

9:23 a.m.

Mike Meyer writes:

...

Bruce Leban

3:06 p.m.

On Wed, May 16, 2012 at 1:46 AM, Mike Meyer <mwm@mired.org> wrote:

...

On Wed, 16 May 2012 01:08:09 -0700 Bruce Leban <bruce@leapyear.org> wrote:

...
On Wed, May 16, 2012 at 12:40 AM, Mike Meyer <mwm@mired.org> wrote:

I didn't claim it was fast. I actually wrote that version instead of the for/in/return version for a very specific reason: it always returns the same element. (The for/in/return version might return the same element every time too but it's not guaranteed.)

I didn't ask for a get that would always return the same element.

You didn't ask for a get that *didn't* always return the same element. My deterministic version is totally compatible with your ask here and

...

Would you also complain that having int accept a string value in lieu of using eval on untrusted input is a case of "it's useful anytime you need it."?

...

...
I don't think your test is very good if it uses the get I wrote above. Your test will only operate on one element of the set and it's easy to write functions which succeed for some elements of the set and fail for others. I'd like to see an actual test that you think needs this that would not be improved by iterating over the list.

Talk about tautologies! Of course you can write tests that will fail in some cases. You can also write tests that won't fail for your cases. Especially if you know something about the set beforehand.

...

For instance, I happen to know I have a set of ElementTree elements that all have the same tag. I want to check the tag.

Then maybe you should be using a different data structure than a set. Maybe set_with_same_tag that declares that constraint and can enforce the constraint if you want.

...

One of the test cases starts by checking to see if the set is a singleton. Do you really propose something like:

    if len(s) == 1:
        for i in s:
            res = process(i)

This is a legitimate use case. I don't think it's a big deal to have to add a one line function to your code. I might even use EAFP:

    res = process(set_singleton(s))

where

    def set_singleton(s):
        [result] = s
        return result

--- Bruce
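A runnable sketch of the EAFP approach above. set_singleton() is taken from the message. The sample inputs are made up for illustration. Unpacking into a one-element target raises ValueError unless the set holds exactly one item, so the len(s) == 1 check becomes unnecessary.

```python
def set_singleton(s):
    """Return the only element of s; raise ValueError otherwise."""
    [result] = s  # unpacking fails unless s has exactly one element
    return result


print(set_singleton({"only"}))

# Both the empty set and a larger set are rejected with ValueError.
for bad in (set(), {1, 2}):
    try:
        set_singleton(bad)
    except ValueError as exc:
        print("rejected:", exc)
```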

Greg Ewing

11:51 p.m.

Bruce Leban wrote:

...

Steven D'Aprano

May 2012

4:28 p.m.

Mike Meyer wrote:

...

I didn't ask for a get that would always return the same element.

...

For instance, I happen to know I have a set of ElementTree elements that all have the same tag. I want to check the tag.

One of the test cases starts by checking to see if the set is a singleton. Do you really propose something like:

...

All of these require creating an intermediate object for the sole purpose of getting an item out of the container without destroying the container. This leads the reader to wonder why it was created,

Greg Ewing

1:51 a.m.

On 17/05/12 04:28, Steven D'Aprano wrote:

...

Nick Coghlan

8:08 a.m.

On Wed, May 16, 2012 at 5:40 PM, Mike Meyer <mwm@mired.org> wrote:

...

Mike Graham

2:44 p.m.

On Wed, May 16, 2012 at 4:08 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:

...

It sounds like you're re-implementing the venerable ,= operator. ,= is one of my favorite operators in Python. You know,

...

;^), Mike

Masklinn

2:51 p.m.

On 2012-05-16, at 16:44 , Mike Graham wrote:

...

Mike Graham

2:57 p.m.

On Wed, May 16, 2012 at 10:51 AM, Masklinn <masklinn@masklinn.net> wrote:

...

Mike

Masklinn

May 2012

4:43 p.m.

On 2012-05-16, at 16:57 , Mike Graham wrote:

...

2. I grouped it this way as a joke. If you do this, everyone will think you're crazy.

I don't mind, it expresses the intent clearly and looks *weird* at first glance which is fine by me: colleagues & readers are unlikely to miss it if they don't know the idiom. I genuinely like it.

Yury Selivanov

8:10 p.m.

On 2012-05-16, at 10:51 AM, Masklinn wrote:

...

With the difference that ,= also asserts there is only one item in the iterable, where `next . iter` only does `head`.

For that, the ,*_= operator exists ;)

...

I hope I'll never encounter this, though. - Yury

Nick Coghlan

7:11 a.m.

On Wed, May 16, 2012 at 4:32 PM, Mike Meyer <mwm@mired.org> wrote:

...

Mike Meyer

8:09 a.m.

On Wed, 16 May 2012 17:11:28 +1000
Nick Coghlan <ncoghlan@gmail.com> wrote:

> On Wed, May 16, 2012 at 4:32 PM, Mike Meyer <mwm@mired.org> wrote:
> > Is there some reason that there isn't a straightforward way to get an
> > element from a set without removing it? Everything I find either
> > requires multiple statements or converting the set to another data
> > type.
> > It seems that some kind of get method would be useful. The argument
> > that "getting an arbitrary element from a set isn't useful" is refuted
> > by 1) the existence of the pop method, which does just that, and 2)
> > the fact that I (and a number of other people) have run into such a
> > need.
> > My search for such a reason kept finding people asking how
> > to get an element instead. Of course, my key words (set and get) are
> > heavily overloaded.
> The two primary use cases handled by the current interface are:
> 1. Do something for all items in the set (iteration)
> 2. Do something for an arbitrary item in the set, and keep track of
> which items remain (set.pop)

Neither of which fits my use case.

> Since this use case is already covered by the iterator protocol, the
> question then becomes: Is there a specific reason a dedicated
> set-specific solution is needed rather than better educating people
> that "the first item" is an acceptable answer when the request is for
> "an arbitrary item" (this is particularly true in a world where set
> ordering is randomised by default)?

Because next(iter(s)) makes the reader wonder "Why is this iterator
being created?" It's a less expensive form of writing list(s)[0]. It's
also sufficiently non-obvious that the closest I found on google for a
discussion of the issue was the "for x in s: break" variant. Which
makes me think that at the very least, this idiom ought to be
mentioned in the documentation. Or if it's already there, then a
pointer added to the set documentation.

But my question was actually whether or not there was a reason for it
not existing. Has there been a previous discussion of this?

   <mike
-- 
Mike Meyer <mwm@mired.org>		http://www.mired.org/
Independent Software developer/SCM consultant, email for more information.

O< ascii ribbon campaign - stop html mail - www.asciiribbon.org



40 comments

19 participants

participants (19)

Ben Finney
Bruce Leban
Cameron Simpson
Chris Rebert
Devin Jeanpierre
Dirkjan Ochtman
Ethan Furman
Greg Ewing
Masklinn
Mathias Panzenböck
Mike Graham
Mike Meyer
Nick Coghlan
Paul Du Bois
Paul Moore
Stephen J. Turnbull
Steven D'Aprano
Terry Reedy
Yury Selivanov
