data structures should have an .any() method

Hi,

I just had a discussion with a co-worker, and we noticed that there are use cases where you just want the only element in a data structure, or just any of the elements in a data structure because you know that they all contain the same information (with respect to what you are looking for, at least).

If you want all items, you can iterate, but if you just want any item or the only item, it's inefficient (and not very explicit code) to create an iterator and take the element out. It's easy to do with ordered data structures such as lists or tuples ("container[0]"), but it's not so obvious for sets (or dicts), which means that you have to know what kind of container you receive to handle it correctly. I know there's .pop() on sets, but that modifies the data structure.

It would therefore be nice to have a common ".any()" method on data structures that would just read an arbitrary item from a container.

Regarding the special (and probably minor use) case of dicts, I assume it would return any key, so that you could get the value from the dict in a second step if you want. Only returning the value would not easily get you the key itself.

Stefan

Matteo Dell'Amico wrote:
> You can do next(iter(container)) (or iter(container).next() with python <= 2.5). This works fine with any iterable.

This and other responses miss part of Stefan's complaint: that creating an iterator (which isn't always cheap) only to throw it away almost immediately may be a somewhat wasteful operation. The shorthand expression above also suffers from the obscurity that Stefan was complaining about - there is very little to hint that "next(iter(obj))" means "get an arbitrary object out of a container". The StopIteration exception this approach will throw for an empty container is also rather unhelpful.

That said, I'm -0 on the idea overall. If someone actually needs it, it isn't particularly hard for them to write their own getany() function.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

Nick Coghlan wrote:
It's not particularly wasteful for the built-in data structures, though. It seems to me that the cases where the performance of next(iter(obj)) would be an actual issue are quite rare.

Why? next(iter(obj)) means, pretty explicitly to me, "iterate on obj and give me one element".

matteo

Matteo Dell'Amico wrote:
Because it overspecifies the semantics of what you're trying to do. It just happens that when the requirement is "get me any object in this container" the design of Python means that the easiest implementation is "get me the first object in this container". The expression form then reflects the implementation rather than the algorithmic intent.

That said, this concise way of implementing the desired feature is certainly one of the reasons I am -0 on the idea of adding it to the standard library.

Cheers,
Nick.

I think dict.popitem() does something close to what the original post wanted.
http://docs.python.org/3.1/library/stdtypes.html#dict.popitem

On Fri, Sep 4, 2009 at 4:17 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:

Nick Coghlan <ncoghlan@...> writes:
I don't agree. Since iteration is such a frequent operation, any container which doesn't provide cheap iteration could be considered badly designed and/or badly implemented. Therefore it makes sense to rely on iteration when implementing other primitives.

People worrying that it expresses implementation rather than intent can write the trivial abstraction by themselves:

    def any_item(x):
        return next(iter(x))

Regards

Antoine.

Antoine Pitrou wrote:
or

    any_item = compose(next, iter)

;)

Georg

--
Thus spake the Lord: Thou shalt indent with four spaces. No more, no less. Four shall be the number of spaces thou shalt indent, and the number of thy indenting shall be four. Eight shalt thou not indent, nor either indent thou two, excepting that thou then proceed to four. Tabs are right out.

Gerald Britton wrote:
> compose? Where'd you find that?

That was another recent discussion here.

Georg

On 5 Sep 2009, at 15:47, Gerald Britton wrote:
> Ah -- so not a real function then (yet)? Though something we could

Yeah, but you could leverage Python's *args to get a compositor of more than two functions, e.g.

    from functools import reduce  # a builtin in Python 2, moved in Python 3

    def compose(*funcs):
        return reduce(lambda f1, f2: lambda *args, **kwargs: f1(f2(*args, **kwargs)), funcs)

I can't stand map/reduce/lambda...

*instant readable makeover*

    def minicomp(f1, f2):
        def comped(*args, **kwargs):
            return f1(f2(*args, **kwargs))
        return comped

    def compose(*funcs):
        total = funcs[0]
        for f in funcs[1:]:
            total = minicomp(total, f)
        return total

On Sat, Sep 5, 2009 at 5:15 PM, Masklinn <masklinn@masklinn.net> wrote:
-- Yuv hzk.co.il

Yuvgoog Greenle wrote:
Please, if this must be discussed again, do it in a new thread.

cheers,
Georg

On Fri, 4 Sep 2009 10:01:28 pm Matteo Dell'Amico wrote:
> Why? next(iter(obj)) means, pretty explicitly to me, "iterate on obj and give me one element".

To me, it says "give me the first element", not "give me any (an arbitrary) element" or "give me a random element".

Does anyone have a use-case for retrieving a single arbitrary element of an arbitrary sequence, without caring about any other elements? Is this really such a common operation that we need to consider it part of the interface for all collections? I doubt it.

--
Steven D'Aprano

2009/9/5 Steven D'Aprano <steve@pearwood.info>:
The original use as described was for picking an "arbitrary" element, because all of the elements were effectively the same - "any of the elements in a data structure because you know that they all contain the same information". For a random element, use random.choice; for the first, use next(iter()). Either option satisfies the original requirement - but the first is bound to be faster, so it seems appropriate.
Agreed. Paul.

Nick Coghlan wrote:
> That said, I'm -0 on the idea overall. If someone actually needs it, it isn't particularly hard for them to write their own getany() function.

There's a situation where the need to do this kind of thing actually arises fairly frequently -- retrieving things from a relational database. Often you're expecting exactly one result from a query, but the API always gives you a sequence, which you then have to get the first item from. Doing that over and over again gets rather tedious.

--
Greg

Greg Ewing wrote:
But if it's a sequence, you can simply do s[0], can't you?

Georg

On Sun, 6 Sep 2009 03:51:43 pm Greg Ewing wrote:
If you're expecting "exactly one result", then surely it should be an error to receive more than one result? Rather than ask for "any" result and ignore any unexpected extra items, I think it would be better to have a helper function that verifies you have got exactly one result.

--
Steven D'Aprano
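A helper of the kind described here might look like the following minimal sketch. The name `one` and the choice of `ValueError` are assumptions made for illustration; this is not an existing standard library function.

```python
def one(iterable):
    """Return the only item of iterable; raise ValueError otherwise.

    Hypothetical helper: the name and the error type are illustrative.
    """
    it = iter(iterable)
    try:
        item = next(it)
    except StopIteration:
        raise ValueError("expected exactly one item, got none") from None
    # Probe for a second item. A private sentinel distinguishes
    # "iterator exhausted" from a legitimate None value.
    sentinel = object()
    if next(it, sentinel) is not sentinel:
        raise ValueError("expected exactly one item, got more than one")
    return item
```

Used on a query result, `one(rows)` would either hand back the single row or fail loudly, instead of silently dropping extra rows.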

Antoine Pitrou wrote:
Amen!

Georg

Greg Ewing wrote:
I think "one" fits two of the proposed three use cases pretty nicely. The only remaining use case is where you actually have more than one item but only want any one out of them. But I think in that case you can actually roll your own anyway, as there may be other constraints on exactly how 'equal' all the items are.

Stefan

Steven D'Aprano wrote:
Well, some queries return results without duplicate elimination, even though they are defined to return sets. If you really want to limit things in database queries, the "LIMIT 1" clause is your friend, as the query optimizer knows it can stop as soon as it's found something. Of course I don't know which query optimizers around now _use_ that knowledge to pick a query plan, but that leaves the info there if the next rev becomes limit-capable.

Often when I just want to pick a single value from a column I use MIN or MAX (and fairly often when I need two distinct values I use both MIN and MAX). One trick for checking that a column is exactly a singleton is:

    SELECT MIN(something) FROM ... HAVING MIN(something) = MAX(something)

--Scott David Daniels
Scott.Daniels@Acm.Org
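Both query-side approaches can be tried out with Python's `sqlite3` module. This is a sketch only: the table `t`, the column `v`, and the helper name `singleton_value` are hypothetical, and the singleton check compares MIN and MAX on the Python side rather than inside the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (v INTEGER)")
conn.executemany("INSERT INTO t (v) VALUES (?)", [(7,), (7,), (7,)])

# LIMIT 1: ask the database for a single arbitrary row.
any_row = conn.execute("SELECT v FROM t LIMIT 1").fetchone()


def singleton_value(conn):
    """Return the column's value if all rows agree on it, else None."""
    lo, hi = conn.execute("SELECT MIN(v), MAX(v) FROM t").fetchone()
    # MIN and MAX are both NULL (None) for an empty table.
    return lo if lo is not None and lo == hi else None
```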

Stefan Behnel wrote:
> It would therefore be nice to have a common ".any()" method on data structures that would just read an arbitrary item from a container.

I'd advise against bare name "any" for this, since we already have the any() builtin with a completely different meaning. "getany" would probably be OK though.

I'd also advise against using a method for this, since there is a reasonable default implementation that can be employed:

    import collections.abc  # spelled collections.Sequence before Python 3.3

    def getany(container):
        if container:
            if isinstance(container, collections.abc.Sequence):
                return container[0]
            else:
                for x in container:
                    return x
        raise ValueError("No items in container")

Finally, I'd suggest that any such function would belong in the collections module rather than being made a builtin.

Cheers,
Nick.

Jan Kaliszewski wrote:
> or:
>
>     def getany(container):
>         for x in container:
>             return x
>         raise ValueError("No items in container")

I actually like that, although I find this more readable:

    def getany(container):
        for x in container:
            return x
        else:
            raise ValueError("No items in container")

Stefan

On Fri, 4 Sep 2009 08:03:21 pm Nick Coghlan wrote:
Given the above implementation, repeated calls to getany(sequence) will return the same item each time. Is that what people will expect from a function that claims to return "any" element of a collection? I suspect that users will be evenly divided into those who say Yes, those who say No, and those who say "It depends on what I'm trying to do".

Should it return an arbitrary item, or a random item? Is "the first item" arbitrary enough? It should be for dicts, which are unordered, but may not be for lists.

I think the answers to these questions are too application-specific for any solution or solutions to go into the standard library. It probably belongs in the cookbook as a handful of related recipes.

--
Steven D'Aprano
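The distinction between "arbitrary" and "random" can be made concrete with two small recipes. This is a sketch under assumptions: the names `first_item` and `random_item` are hypothetical, and copying the container into a list is one simple way to satisfy `random.choice`, which needs an indexable sequence and so does not accept a set directly.

```python
import random


def first_item(container):
    """Arbitrary but repeatable: whatever iteration yields first."""
    for x in container:
        return x
    raise ValueError("No items in container")


def random_item(container):
    """Random: a different item may come back on each call."""
    items = list(container)  # O(n) copy so that sets and dicts work too
    if not items:
        raise ValueError("No items in container")
    return random.choice(items)
```

Repeated calls to `first_item` on an unmodified container keep returning the same element, which is the behaviour the question above is about.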

On Fri, Sep 4, 2009 at 2:35 AM, Stefan Behnel<stefan_ml@behnel.de> wrote:
I assure you it's not slow.

next(iter(x)) is probably as good as it gets -- I don't think we need another way to say that in fewer words.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)

Guido van Rossum wrote:
Not in absolute numbers, but certainly slower than necessary:

    $ python2.6 -m timeit -s 'l=[1]' 'l[0]'
    10000000 loops, best of 3: 0.0977 usec per loop

    $ python2.6 -m timeit -s 'l=[1]' 'next(iter(l))'
    1000000 loops, best of 3: 0.523 usec per loop

> next(iter(x)) is probably as good as it gets -- I don't think we need another way to say that in fewer words.

I'm fine with such a decision, given that it's trivial to wrap this into your own function. That doesn't make it much faster:

    $ python2.6 -m timeit -s 'l=[1]' -s 'def getany(l):' -s '    return l[0]' 'getany(l)'
    1000000 loops, best of 3: 0.34 usec per loop

    $ python2.6 -m timeit -s 'l=[1]' -s 'def getany(l):' \
          -s '    for x in l:' \
          -s '        return x' \
          'getany(l)'
    1000000 loops, best of 3: 0.454 usec per loop

    $ python2.6 -m timeit -s 'l=[1]' -s 'def getany(l):' \
          -s '    return next(iter(l))' \
          'getany(l)'
    1000000 loops, best of 3: 0.743 usec per loop

but, admittedly, that's still not slow in absolute numbers.

Stefan
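A comparison like this can also be run from a script with the `timeit` module. This is a sketch only: the helper name `bench` and the loop count are arbitrary choices, and the absolute numbers will differ by machine and Python version.

```python
import timeit


def bench(stmt, setup="l = [1]", number=100_000):
    """Return the best per-loop time in microseconds over three runs."""
    best = min(timeit.repeat(stmt, setup=setup, repeat=3, number=number))
    return best / number * 1e6


results = {stmt: bench(stmt) for stmt in ("l[0]", "next(iter(l))")}
for stmt, usec in results.items():
    print("%-15s %.4f usec per loop" % (stmt, usec))
```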

More likely your cost is the two dynamic lookups of builtin functions.

On Fri, Sep 4, 2009 at 12:56 PM, Gerald Britton <gerald.britton@gmail.com> wrote:

Stefan Behnel wrote:
> It would therefore be nice to have a common ".any()" method on data structures that would just read an arbitrary item from a container.

Rather than add a method to every container implementation, it would be easier to provide a function:

    def first(obj):
        return next(iter(obj))

possibly with some embellishments to handle StopIteration, allow for a default value, etc.

--
Greg
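One possible shape for those embellishments is sketched below. The sentinel name `_MISSING`, the `default` parameter, and the choice of `ValueError` are assumptions for illustration, not something settled in this thread.

```python
_MISSING = object()  # private sentinel so that None can be a real default


def first(iterable, default=_MISSING):
    """Return the first item of iterable, or default if it is empty.

    Without a default, an empty iterable raises ValueError instead of
    leaking StopIteration to the caller.
    """
    for item in iterable:
        return item
    if default is _MISSING:
        raise ValueError("first() called on an empty iterable")
    return default
```

The builtin two-argument form `next(iter(obj), default)` covers the default-value case on its own; the wrapper mainly adds a clearer name and a clearer error.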

Lie Ryan wrote:
Not really - every iterator in Python is guaranteed to either have a first value or throw an exception when you try to retrieve it via next(). There's no such guarantee that every iterator will terminate and hence have a "last" value (cf. itertools.count).

Cheers,
Nick.
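The asymmetry is easy to see with `itertools.count`, which yields a first value immediately but never terminates. A short illustration, with the start value 10 chosen arbitrarily:

```python
import itertools

counter = itertools.count(10)  # 10, 11, 12, ... without end
first_value = next(counter)    # a first value is always available here

# A "last" value would require exhausting the iterator, which never
# happens for count(), so only a bounded slice can be inspected.
bounded = list(itertools.islice(itertools.count(10), 3))
```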

participants (16)
- Antoine Pitrou
- Georg Brandl
- Gerald Britton
- Greg Ewing
- Guido van Rossum
- ilya
- Jan Kaliszewski
- Lie Ryan
- Masklinn
- Matteo Dell'Amico
- Nick Coghlan
- Paul Moore
- Scott David Daniels
- Stefan Behnel
- Steven D'Aprano
- Yuvgoog Greenle