"Exposing" `__min__` and `__max__`
I've only recently looked for these special methods, so that in and of itself may be the reason these methods aren't exposed, but I can think of objects that may wish to implement `__min__` and `__max__` themselves, for efficiency. For example:

```python
# A "self-sorted" list object
class AlwaysSortedList:
    def __min__(self):
        return self.lst[0]

    def __max__(self):
        return self.lst[-1]


# An object that maintains indices of extrema (e.g. for complex comparisons)
class KeepsTrackOfExtrema:
    def __init__(self):
        self.backer = []
        self.min_index = None
        self.max_index = None

    def append(self, obj):
        new_index = len(self.backer)
        self.backer.append(obj)
        if (self.max_index is None) or (obj > self.backer[self.max_index]):
            self.max_index = new_index
        if (self.min_index is None) or (obj < self.backer[self.min_index]):
            self.min_index = new_index

    def __min__(self):
        return self.backer[self.min_index]

    def __max__(self):
        return self.backer[self.max_index]
```

These methods would be called via the single-argument calls `max(obj)` and `min(obj)`. If it's not clear, it'd be similar to the way `__len__` is called (when defined) via `len(obj)`.

My solution was to implement a `.min()` method, but that caused some ugly special casing when the object could also be a regular list (where I'd want to iterate over all of the items).

I searched the list, but has this been discussed before? Is there any merit in it?
Do you mind sharing an example usage in a realistic context? There might be
a good solution that doesn't require adding magic methods.
On Tue, Jun 19, 2018, 12:24 PM James Edwards
Well, numpy implements ndarray.min(). It would be very nice if
min(np.array) worked as expected.
Pål
On 19 Jun 2018 21:33, "Michael Selik"
On Tue, Jun 19, 2018 at 12:33:15PM -0700, Michael Selik wrote:
Do you mind sharing an example usage in a realistic context? There might be a good solution that doesn't require adding magic methods.
You have some sort of binary search tree that is iterated over in some arbitrary order. Calling min(tree) iterates over the entire tree, even if the tree knows how to find the minimum much more efficiently.

Iterating over the entire tree is O(N), where N = number of nodes. A more efficient min is typically O(D), where D = depth, which is typically about log_2 N if the tree is balanced.

I know that for many purposes, we use dicts as they are more convenient and easier to use than trees, but there are still plenty of realistic use-cases for trees.

-- Steve
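To make the tree argument concrete, here is a minimal sketch (the `Node`/`BST` names and the `__min__` hook are illustrative, not from the thread) of a binary search tree where iteration is O(N) but the minimum is reachable in O(depth) by following left links:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


class BST:
    def __init__(self, root=None):
        self.root = root

    def __iter__(self):
        # In-order traversal: O(N), which is what min(tree) must do today.
        def walk(node):
            if node is not None:
                yield from walk(node.left)
                yield node.value
                yield from walk(node.right)
        return walk(self.root)

    def __min__(self):
        # Follow left children only: O(depth).
        node = self.root
        while node.left is not None:
            node = node.left
        return node.value


tree = BST(Node(5, Node(2, Node(1)), Node(8)))
assert min(tree) == 1          # today: full O(N) iteration
assert BST.__min__(tree) == 1  # proposed: O(depth) shortcut
```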
19.06.18 22:18, James Edwards wrote:
I've only recently looked for these special methods, so that in and of itself may be the reason these methods aren't exposed, but I could think of objects that may wish to implement __min__ and __max__ themselves, for efficiency.
There are two questions.

1. What to do with the additional min() and max() arguments: key and default.

2. Is the need for this feature large enough? Will the benefit for special cases exceed the drawback of increasing implementation complexity and slowing down common cases?

If support for the key and default arguments is not needed for you, you can implement your own functions and use them instead of min() and max().
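Serhiy's suggested workaround can be sketched today without any language change. The helper name `minimum` and the `__min__` hook are hypothetical; note how key/default handling is simply not forwarded to the hook, which is exactly the open question:

```python
def minimum(obj, *args, **kwargs):
    # Prefer a hypothetical __min__ hook, looked up on the type
    # (the way len() looks up __len__); otherwise fall back to min().
    dunder = getattr(type(obj), '__min__', None)
    if dunder is not None:
        # Note: key/default are deliberately NOT forwarded here --
        # how they would interact with the hook is the open question.
        return dunder(obj)
    return min(obj, *args, **kwargs)


class SortedList:
    def __init__(self, items):
        self.lst = sorted(items)

    def __iter__(self):
        return iter(self.lst)

    def __min__(self):
        return self.lst[0]  # O(1) instead of O(N)


assert minimum(SortedList([3, 1, 2])) == 1  # uses __min__
assert minimum([3, 1, 2]) == 1              # plain list falls back to min()
```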
On Wed, Jun 20, 2018 at 07:05:19AM +0300, Serhiy Storchaka wrote:
There are two questions.
1. What to do with additional min() and max() arguments: key and default.
Since there are no reflected versions of min/max, there is no trouble with extra arguments. Just pass them through to the dunder:

    min(obj, key=x, default=y) => type(obj).__min__(key=x, default=y)
2. Is the need of this feature large enough? Will the benefit for special cases exceed the drawback of increasing implementation complexity and slowing down common cases?
Reasonable questions, but I don't think that the cost of testing:

    if hasattr(type(obj), '__min__')  # or equivalent

is going to be very large. Amortized over O(N) comparisons, that's practically free :-)

More important, I think, is the increase in API complexity. That's two more dunders to learn about.

The first part is critical: is this useful enough to justify two more dunders? I think the answer is a definite Maybe. Or perhaps Maybe Not.

I think that without at least one use-case in the standard library, perhaps we should hold off on this. Unless numpy arrays are important enough to justify this on their own?

Are there any builtins or std library classes that offer their own min()/max() methods? If so, that would be good evidence that making this a dunder-based protocol has stdlib use-cases.

-- Steve
Are there any builtins or std library classes that offer their own min()/max() methods?
My first instinct was heapq[1], since the way to access the min value is
simply heap[0] (and I thought it could benefit from __min__) -- it's almost
the perfect motivating example. But as it stands, the module uses
functions to operate directly on a standard list, so even if __min__ were
exposed, min(heap) would still iterate over the entire list.
That being said, a heap *class* could take advantage of this, and provide a
semantically consistent optimization.
I'm not sure how many examples will be found in stdlib, as I expect this
optimization to be restricted to specialized container types like heaps,
but I'll keep searching.
[1] https://docs.python.org/3.6/library/heapq.html
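For what it's worth, the heap *class* idea can be sketched on top of the existing heapq functions. The `Heap` class and its `__min__` are hypothetical; only `heapq.heapify`, `heappush` and `heappop` are real stdlib APIs:

```python
import heapq


class Heap:
    """Minimal min-heap wrapper around the heapq module functions."""

    def __init__(self, items=()):
        self._heap = list(items)
        heapq.heapify(self._heap)

    def push(self, item):
        heapq.heappush(self._heap, item)

    def pop(self):
        return heapq.heappop(self._heap)

    def __iter__(self):
        return iter(self._heap)  # arbitrary (heap) order

    def __min__(self):
        # The heap invariant puts the minimum at index 0: O(1).
        return self._heap[0]


h = Heap([5, 1, 4])
assert min(h) == 1           # today: O(N) scan via __iter__
assert Heap.__min__(h) == 1  # with the proposed protocol: O(1)
```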
On Wed, Jun 20, 2018 at 3:00 AM, Steven D'Aprano
Finding more realistic use cases is key -- the actual spec is pretty
obvious and doesn't worry me in terms of added language or implementation
complexity.
I think just finding a data structure that should implement its own min/max
functionality (or maybe one of these, like heapq) is not enough motivation.
You have to find code where such a data structure (let's say a Tree) is
passed to some function that also accepts, say, a list. Then that function
would benefit from being able to call just `min(x)` rather than `x.min() if
isinstance(x, Tree) else min(x)`. If whenever you have a Tree you know that
you have a Tree (because it has other unique methods) then there's no
burden for the user to call x.min().
--Guido
On Wed, Jun 20, 2018 at 5:24 AM James Edwards
I do think there is some merit in being able to as easily as possible
transition from something being, say list-backed, to a domain specific data
structure that's more efficient. If one wanted to go from a list to a
minheap, it'd be nice just to change `x = []` to `x = Heap()` and have
everything else "just work", without having to change all `min(x)`s to
`x.min()`s, etc. But trying to find places in production code where there
is the special casing you described is (now) on the top of my priorities.
Outside of invariant-maintaining data structures though, I think the most
compelling case I've come up with so far is that it could be implemented on
certain types of generators, without having to generate the whole
sequence. (IMHO, they're neat side effects, but not very compelling by
themselves):
- __min__ and __max__ could be implemented on the range object, as
calculating these is straightforward, and you wouldn't need to generate
the entire sequence. Passing an "un-listified" range to a function that
accepts lists is somewhat common, I think. The inefficiency is compounded
if you generate the sequence multiple times, e.g.:
```python
def get_err(iterable):
    return max(iterable) - min(iterable)

x = range(10000)
get_err(x)

# vs

x = list(range(10000))
get_err(x)
```
Here, implementing __min__ and __max__ would make passing a range not
just as fast as passing a "listified" range, but significantly faster. But
I think this is just a nice coincidence of exposing the special methods and
by no means motivating by itself.
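The range arithmetic is indeed straightforward. A sketch of how a hypothetical `__min__`/`__max__` for a non-empty range could be computed in O(1) from start, stop and step, much as `len` already is (the helper names are mine):

```python
def range_min(r):
    # Minimum of a non-empty range: first element if ascending,
    # last element if descending.
    if not r:
        raise ValueError("empty range")
    return r[0] if r.step > 0 else r[-1]


def range_max(r):
    # Mirror image of range_min.
    if not r:
        raise ValueError("empty range")
    return r[-1] if r.step > 0 else r[0]


r = range(0, 10**5, 3)
assert range_min(r) == min(r) == 0      # O(1) vs O(N) scan
assert range_max(r) == max(r) == 99999  # last element: 3 * 33333
```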
- There are also a few infinite generators in itertools where calling min()
or max() on the generator will run forever, despite at least one of these
being clearly defined:
```python
from itertools import count, cycle

gen = count(start=0, step=1)  # min=0, no max
# or
gen = cycle([1, 2, 3])        # min=1, max=3

print(min(gen))  # Will never terminate
```
I'm even less sure about how often this would actually help than I am
about the range example. I don't envision many places where people are
passing infinite generators to things expecting standard lists --
especially in light of the fact that calling min() or max() on them will
prevent further execution.
Thanks for all the feedback, the search will continue :)
On Wed, Jun 20, 2018 at 11:24 AM, Guido van Rossum
20.06.18 10:00, Steven D'Aprano wrote:
On Wed, Jun 20, 2018 at 07:05:19AM +0300, Serhiy Storchaka wrote:
1. What to do with additional min() and max() arguments: key and default.
Since there are no reflected versions of min/max, there is no trouble with extra arguments. Just pass them through to the dunder:
min(obj, key=x, default=y) => type(obj).__min__(key=x, default=y)
The devil is in the details. And you will see this when you try to implement min() and __min__().

1) There is no default value for default. This makes handling it in Python code hard.

2) The two original examples don't work with a key function. You would need to add complex caches for supporting different key functions, and this would add new problems.

In future we may add new parameters for min() and max(). This is not a closed protocol as for len() or `+`.
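Serhiy's second point can be seen directly with the original example: a container that caches its extremum under the *default* ordering gives the wrong answer once a key function is involved, so the cache cannot simply be reused (the `TracksMin` class is a simplified stand-in for the `KeepsTrackOfExtrema` example upthread):

```python
class TracksMin:
    # Tracks its minimum under the default ordering only.
    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __min__(self):
        return min(self.items)  # stands in for a cached extremum


obj = TracksMin([-5, 1, 3])
assert TracksMin.__min__(obj) == -5

# With key=abs the true minimum is 1, not -5: the cached
# default-order extremum is useless for an arbitrary key.
assert min(obj, key=abs) == 1
```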
2. Is the need of this feature large enough? Will the benefit for special cases exceed the drawback of increasing implementation complexity and slowing down common cases?
Reasonable questions, but I don't think that the cost of testing:
if hasattr(type(obj), '__min__') # or equivalent
is going to be very large. Amortized over O(N) comparisons, that's practically free :-)
N may be small. And I suppose that for most calls it may be <10 or even <5. Note that the cost will be much larger than for __len__ or __add__, because new dunder methods will not have slots.
On Wed, Jun 20, 2018, 00:05 Serhiy Storchaka
There are two questions.
1. What to do with additional min() and max() arguments: key and default.
Neither should be passed to a dunder.

It is not possible to handle `key` without figuring out if a function is monotonic (a Turing-complete problem in general) or anti-monotonic (if that is a real term), so you MUST fall back on full iteration if a key is provided.

`default` is only used in case of an empty collection. The only question is, who has responsibility for detecting an empty collection, and how?

Caller detects: The caller checks length before calling the dunder. If there is no dunder, it doesn't check. Are there real-world cases where length is not defined on an iterable collection?

Dunder detects: Right now, `max` detects empty by watching for StopIteration, which can no longer be a false positive. StopIterations from a deeper scope are wrapped. If the dunder throws an error to signal emptiness, it should not be thrown otherwise. I think that's impossible to guarantee.
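The current behaviour described here can be checked directly: max() on an empty iterable raises ValueError (triggered internally by StopIteration from the exhausted iterator) unless `default` is given, and since PEP 479 a StopIteration leaking out of a generator body becomes RuntimeError rather than silently ending iteration, which is why it "can no longer be a false positive":

```python
# Empty iterable: ValueError unless a default is supplied.
try:
    max([])
except ValueError:
    pass  # "max() arg is an empty sequence"

assert max([], default=None) is None
assert max(iter([]), default=0) == 0  # works for bare iterators too


# PEP 479: a StopIteration raised inside a generator no longer
# masquerades as "end of data" -- it surfaces as RuntimeError.
def bad_gen():
    raise StopIteration
    yield  # makes this a generator function

try:
    max(bad_gen())
except RuntimeError:
    pass
```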
On Tue, Jun 26, 2018 at 11:34 AM, Franklin? Lee
There's an argument that you DO want to pass to the dunder: `last=True`. It's not currently part of `min` and `max`. Currently, if there are multiple items that are maximum, `max` will return the first one. In the future, a `last:bool` param could be added, and a dunder for `max` will want to handle it.
On 6/26/18 11:34 AM, Franklin? Lee wrote:
It is not possible to handle `key` without figuring out if a function is monotonic (a Turing-complete problem in general) or anti-monotonic (if that is a real term), so you MUST fall back on full iteration if a key is provided.
Monotonic (in this sense) just means never changing direction: it can be increasing and never decreasing, or decreasing and never increasing, so we don't need the 'anti-' version.

-- Richard Damon
2018-06-26 17:34 GMT+02:00 Franklin? Lee
Caller detects: The caller checks length before calling the dunder. If there is no dunder, it doesn't check. Are there real-world cases where length is not defined on an iterable collection?
Generators don't have a __len__ method. And they might have a min/max that can be calculated without iterating over the entire thing. The builtin range() is an example (but also an exception, since it does have a __len__ attribute; this is specifically part of range and not generators in general, though).

However, range() is an example where the dunders could be valuable: max(range(10**7)) already takes noticeable time here, while it's rather easy to figure it out from start, stop and step, just like len now does for it.
On Tue, Jun 26, 2018, 11:43 PM Jacco van Dorp
Have you ever written ``max(range(x))`` in production code?
2018-06-27 9:36 GMT+02:00 Michael Selik
Have you ever written ``max(range(x))`` in production code?
Have you ever written len(range(x)) in production code?
On Wed, Jun 27, 2018 at 12:36:12AM -0700, Michael Selik wrote:
On Tue, Jun 26, 2018, 11:43 PM Jacco van Dorp wrote:
Generators dont have a __len__ method. And they might have min/max that can be calculated without iterating over the entire thing. The builtin range() is an example. (but also an exception, since it does have a __len__ attribute. This is specifically part of range and not generators in general, though.).
range is not a generator.
However, range() is an example where the dunders could be valuable - max(range(1e7)) already takes noticable time here, while it's rather easy to figure it out from start stop and step, just like len now does for it.
Have you ever written ``max(range(x))`` in production code?
I have never written that. But I have written ``max(iterable)`` dozens of times, where iterable could be a range object. -- Steve
On Wed, Jun 27, 2018, 3:06 AM Steven D'Aprano
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.

If you control your own input, then this doesn't sound too interesting. You know when you will get a list, and you can call max() on it, and you know when you are passing yourself a tree, and you can give your own tree a max() method and call that.

But if you don't control your input, then this is a good way to delegate back to the unknown object of an unknown class which knows itself. Currently max() and min() have no choice but to walk the entire data structure, even if the object already knows its own maximum and minimum values. That's wasteful.

An analogy: in Ruby, the equivalent of the len() built-in falls back on iteration as a last resort for any object which doesn't define a len method:

    size = 0
    for x in obj:
        size += 1
    return size

By this analogy, max() and min() currently are like that last-resort version of len(), except they do it for *every* object that can be iterated over, even when there's no need.

Imagine that Python's len() always walked the entire iterable, from start to end, to count the length. Now suppose that you proposed adding a __len__ protocol so that objects that know their own length can report it quickly, and in response I argued that len(range(x)) was unrealistic and that there is no need for a __len__ method because we could just say range(x).stop instead. I don't think you would find that argument very persuasive, would you?

-- Steve
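Steven's analogy can be sketched as a pure-Python len() that prefers __len__ and only falls back to counting. The counting fallback is the hypothetical part: the real len() raises TypeError instead of iterating:

```python
def length(obj):
    # Fast path: ask the type, the way the real len() does.
    size = getattr(type(obj), '__len__', None)
    if size is not None:
        return size(obj)
    # Ruby-style last resort: count by iterating, O(N).
    n = 0
    for _ in obj:
        n += 1
    return n


assert length([10, 20, 30]) == 3         # via list.__len__
assert length(x for x in range(4)) == 4  # via the counting fallback
```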
On Wed, Jun 27, 2018 at 7:30 AM Steven D'Aprano
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
This is not a complete, realistic example. You're describing what an example might be, but not providing a concrete one with context. Quoting Guido from earlier in the thread: "I think just finding a data structure that should implement its own min/max functionality (or maybe one of these, like heapq) is not enough motivation. You have to find code where such a data structure (let's say a Tree) is passed to some function that also accepts, say, a list."

Imagine that Python's len() always walked the entire iterable, from start to end, to count the length. Now suppose that you proposed adding a __len__ protocol so that objects that know their own length can report it quickly, and in response I argued that len(range(x)) was unrealistic and that there is no need for a __len__ method because we could just say range(x).stop instead. I don't think you would find that argument very persuasive, would you?
I would, actually. The range object is particularly unpersuasive as a motivation for magic methods, because of its unusual usage. The hypothetical Tree object discussed earlier was much more interesting.
On Wed, Jun 27, 2018, 10:31 Steven D'Aprano
On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
Have you ever written ``max(range(x))`` in production code?
I have never written that.
But I have written ``max(iterable)`` dozens of times, where iterable could be a range object.
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
If you control your own input, then this doesn't sound too interesting. You know when you will get a list, and you can call max() on it, and you know when you are passing yourself a tree, and you can give your own tree a max() method and call that.
But if you don't control your input, then this is a good way to delegate back to the unknown object of an unknown class which knows itself.
Currently max() and min() have no choice but to walk the entire data structure, even if the object already knows its own maximum and minimum values. That's wasteful.
An analogy: in Ruby, the equivalent of the len() built-in falls back on iteration as a last resort for any object which doesn't define a len method:
    size = 0
    for x in obj:
        size += 1
    return size
By this analogy, max() and min() currently are like that last-resort version of len(), except they do it for *every* object that can be iterated over even when there's no need.
Imagine that Python's len() always walked the entire iterable, from start to end, to count the length. Now suppose that you proposed adding a __len__ protocol so that objects that know their own length can report it quickly, and in response I argued that
len(range(x))
was unrealistic and that there is no need for a __len__ method because we could just say
range(x).stop
instead. I don't think you would find that argument very persuasive, would you?
Let's just assume Michael wants to know, and isn't making an argument against the proposal.

`range` is more a sequence (a collection) than a generator. It's not Python 2's xrange. You can iterate through one multiple times. If `range` did not have a length, we would eventually propose that it should.

I can't think of a real iterable which has no length but knows what its max is. An iterator or generator can't generally know its max without consuming everything. I have some vague thoughts about collections which have known bounds but not known sizes, but nothing concrete.

It's bikeshedding, anyway. How isn't as important as whether it's useful.

By the way, range(1e7) fails with a type error.
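Franklin's later point about heaps knowing their min can be made concrete with the standard library's `heapq` (a sketch, not part of any proposal): a min-heap keeps its smallest element at index 0, and its largest element must be one of the leaves, so only the back half of the list needs scanning.

```python
import heapq

# heapq builds a binary min-heap in a plain list: heap[0] is always the
# minimum, and because every parent is <= its children, the maximum must
# sit at a leaf.  Leaves occupy indices n//2 .. n-1 of the backing list.
heap = [9, 4, 7, 1, 8, 2, 5, 3, 6]
heapq.heapify(heap)

minimum = heap[0]                 # O(1): the root of the min-heap
n = len(heap)
maximum = max(heap[n // 2:])      # only the leaves need to be checked
```

This is exactly the kind of structure that knows one extremum for free and can find the other without a full scan of internal nodes.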
On Wed, Jun 27, 2018 at 8:16 AM Franklin? Lee
On Wed, Jun 27, 2018, 10:31 Steven D'Aprano
wrote: On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
Let's just assume Michael wants to know, and isn't making an argument against the proposal.
I do want to know, but it's also an argument against the proposal -- that no one has contributed in-context usage to demonstrate the value. I'd want to see code that currently uses ``if isinstance`` to switch between ``max(x)`` and ``x.max()``. Chris Barker explained the issue well.
On Wed, Jun 27, 2018 at 11:30 AM, Michael Selik
On Wed, Jun 27, 2018 at 8:16 AM Franklin? Lee
wrote: On Wed, Jun 27, 2018, 10:31 Steven D'Aprano
wrote: On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
Let's just assume Michael wants to know, and isn't making an argument against the proposal.
I do want to know, but it's also an argument against the proposal -- that no one has contributed in-context usage to demonstrate the value. I'd want to see code that currently uses ``if isinstance`` to switch between ``max(x)`` and ``x.max()``. Chris Barker explained the issue well.
Then maybe you shouldn't have picked on the verbatim case of `len(range(...))`.

You won't find many examples deciding between `max(x)` and `x.max()`. `np.nanmax(x)` will optimize the case where you care about efficiency. (By the way, Numpy checks for `x.max` if `x` isn't an `ndarray`, so it already kind of implements the proposal.)

I honestly can't imagine a case where you need to know the max of some possibly-unsorted collection, but don't eventually iterate through the whole thing anyway (no difference in asymptotic time). You might have some savings sometimes, but the overall gain is unpredictable, and shouldn't be depended on.

A case more realistic to me is when you know it's definitely a collection that knows its max/min, but you don't know what kind of collection it is. But a sorted collection is pretty much a sequence, and you can usually get the first or last element using c[0] and c[-1], unless it's a SortedDict.

A heap or a Young tableau knows its min, and only has to search its leaves/boundary for its max (or vice versa). For a sorted collection, you may want to check against min and max before inserting, deleting, or searching for an element. However, these checks can be implemented by the functions doing the insert/delete/search.

I expect existing uses for dunder max will appear in generic algorithm libraries, rather than in concrete uses. If you're switching to a sorted collection, it suggests that you want efficiency, and I imagine you'll modify your code to fit the new data structure.
I'd want to see code that currently uses ``if isinstance`` to switch between ``max(x)`` and ``x.max()``.
I understand that, and I've spent a few hours searching GitHub, with less-than-stellar results due to GitHub's search syntax ignoring '.' and '(' (https://help.github.com/articles/searching-code/#considerations-for-code-sea....).

(That being said, there are a number of projects on GitHub (https://github.com/search?l=Python&q=__min__&type=Code) that seem to expect __min__ to work (example: https://github.com/windowssocket/algorithm/blob/d731dca2917b6d6073a49d53cf43...), though it's true that some are using the dunder for their own purposes.)

However, there are many modules that implement objects that could make good use of exposing __min__, e.g. bintrees (https://pypi.org/project/bintrees/) and sortedcontainers (http://www.grantjenks.com/docs/sortedcontainers/). bintrees provides `min_item()` and `max_item()` accessors. sortedcontainers doesn't seem to explicitly provide similar methods, but it maintains the sortedness of its objects, so min and max can be accessed via index.

And I can't stress enough the value in being able to switch to one of these classes from a standard iterable and have the rest of your code (e.g. its `min()` calls) "just work".

Also, the desire for custom implementations is there (https://stackoverflow.com/questions/45308212/is-there-a-way-to-return-a-cust...), but the proposed solutions rely on (IMO) ugly hacks that would break other things.

The search for the perfect `x.min() if isinstance(...) else min(x)` still continues, however.
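The special-casing lamented above looks roughly like this (a sketch using a hypothetical `SortedThing` class and `smallest` helper, invented for illustration):

```python
class SortedThing:
    # Toy self-sorted container, standing in for bintrees/sortedcontainers.
    def __init__(self, items):
        self._items = sorted(items)

    def __iter__(self):
        return iter(self._items)

    def min(self):
        # O(1): first element of the sorted backing list.
        return self._items[0]


def smallest(obj):
    # The ugly type-based dispatch that a __min__ protocol would remove:
    # generic code must know about every "smart" container up front.
    if isinstance(obj, SortedThing):
        return obj.min()
    return min(obj)  # falls back to a full O(n) scan
```

Every new smart container means another `isinstance` branch, which is exactly why delegating via a dunder is attractive.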
On Wed, Jun 27, 2018 at 11:30 AM, Michael Selik
I don't think anyone would argue that there wouldn't be use cases for __max__ and __min__ special methods. However, there is substantial overhead to adding new magic methods, so the question is not whether they would be useful in some special cases, but whether they would be useful enough in common enough cases to be worth the overhead.

For example, we had a discussion on this list about adding a __sort_key__ magic method, so that classes could make themselves efficiently sortable. That was not deemed generally useful enough to be worth it.

In this case, I think numpy arrays are a good example to think about (as opposed to range objects :-) ). Numpy arrays can certainly find their max and min more efficiently than the generic functions, and are "proper" sequences that can be used in generic code. But they also have a particular signature for max and min (an axis parameter), so they really don't map well to a dunder. They also have their own ways of doing all sorts of things that are different from a "typical" iterable. I expect that is the case for other special data structures like trees, etc.

So I don't think this rises to the level of generally universal enough for a magic method.

-CHB
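The point about the axis parameter can be illustrated with a toy pure-Python class (a hypothetical `Grid` standing in for `numpy.ndarray`): a zero-argument `__max__` dunder could only express the global maximum, not the axis-wise variants.

```python
class Grid:
    # Toy 2-D array whose max() takes an axis argument, like numpy's.
    def __init__(self, rows):
        self.rows = rows

    def max(self, axis=None):
        if axis is None:
            # Global maximum over every element.
            return max(max(row) for row in self.rows)
        if axis == 0:
            # Column-wise maxima: zip(*rows) transposes the grid.
            return [max(col) for col in zip(*self.rows)]
        # Row-wise maxima.
        return [max(row) for row in self.rows]


g = Grid([[1, 5], [4, 2]])
```

Only `g.max()` maps onto a hypothetical `__max__()`; `g.max(axis=0)` and `g.max(axis=1)` return lists and have no place in a one-value protocol.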
participants (10)
- Chris Barker - NOAA Federal
- Franklin? Lee
- Guido van Rossum
- Jacco van Dorp
- James Edwards
- Michael Selik
- Pål Grønås Drange
- Richard Damon
- Serhiy Storchaka
- Steven D'Aprano