"Exposing" `__min__` and `__max__`
I've only recently looked for these special methods, so that in and of itself may be the reason these methods aren't exposed, but I can think of objects that may wish to implement `__min__` and `__max__` themselves, for efficiency. For example:

```python
# A "self-sorted" list object
class AlwaysSortedList:
    def __min__(self):
        return self.lst[0]

    def __max__(self):
        return self.lst[-1]


# An object that maintains indices of extrema (e.g. for complex comparisons)
class KeepsTrackOfExtrema:
    def __init__(self):
        self.backer = []
        self.min_index = None
        self.max_index = None

    def append(self, obj):
        new_index = len(self.backer)
        self.backer.append(obj)
        if (self.max_index is None) or (obj > self.backer[self.max_index]):
            self.max_index = new_index
        if (self.min_index is None) or (obj < self.backer[self.min_index]):
            self.min_index = new_index

    def __min__(self):
        return self.backer[self.min_index]

    def __max__(self):
        return self.backer[self.max_index]
```

These methods would be called via the single-argument calls `max(obj)` and `min(obj)`. If it's not clear, it'd be similar to the way `__len__` is called (when defined) via `len(obj)`.

My solution was to implement a `.min()` method, but that caused some ugly special casing when the object could also be a regular list (where I'd want to iterate over all of the items).

I searched the list, but has this been discussed before? Is there any merit in it?
Do you mind sharing an example usage in a realistic context? There might be
a good solution that doesn't require adding magic methods.
On Tue, Jun 19, 2018, 12:24 PM James Edwards
Well, numpy implements ndarray.min(). It would be very nice if
min(np.array) worked as expected.
Pål
On 19 Jun 2018 21:33, "Michael Selik"
On Tue, Jun 19, 2018 at 12:33:15PM -0700, Michael Selik wrote:
Do you mind sharing an example usage in a realistic context? There might be a good solution that doesn't require adding magic methods.
You have some sort of binary search tree that is iterated over in some arbitrary order. Calling min(tree) iterates over the entire tree, even if the tree knows how to find the minimum much more efficiently.

Iterating over the entire tree is O(N), where N = number of nodes. A more efficient min is typically O(D), where D = depth, which is typically about log_2 N if the tree is balanced.

I know that for many purposes, we use dicts as they are more convenient and easier to use than trees, but there are still plenty of realistic use-cases for trees.

-- Steve
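To make the tree argument concrete, here is a minimal sketch (the `Node`/`BST` names and the `__min__` hook are illustrative, not from the thread) of a binary search tree where iteration is O(N) but the minimum is reachable in O(depth) by following left links:

```python
class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left
        self.right = right


class BST:
    def __init__(self, root=None):
        self.root = root

    def __iter__(self):
        # In-order traversal: O(N), which is what min(tree) must do today.
        def walk(node):
            if node is not None:
                yield from walk(node.left)
                yield node.value
                yield from walk(node.right)
        return walk(self.root)

    def __min__(self):
        # Follow left children only: O(depth).
        node = self.root
        while node.left is not None:
            node = node.left
        return node.value


tree = BST(Node(5, Node(2, Node(1)), Node(8)))
assert min(tree) == 1          # today: full O(N) iteration
assert BST.__min__(tree) == 1  # proposed: O(depth) shortcut
```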
19.06.18 22:18, James Edwards wrote:
I've only recently looked for these special methods, so that in and of itself may be the reason these methods aren't exposed, but I could think of objects that may wish to implement __min__ and __max__ themselves, for efficiency.
There are two questions.

1. What to do with the additional min() and max() arguments: key and default.

2. Is the need for this feature large enough? Will the benefit for special cases exceed the drawback of increasing implementation complexity and slowing down common cases?

If support for the key and default arguments is not needed for you, you can implement your own functions and use them instead of min() and max().
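Serhiy's suggested workaround can be sketched today without any language change. The helper name `minimum` and the `__min__` hook are hypothetical; note how key/default handling is simply not forwarded to the hook, which is exactly the open question:

```python
def minimum(obj, *args, **kwargs):
    # Prefer a hypothetical __min__ hook, looked up on the type
    # (the way len() looks up __len__); otherwise fall back to min().
    dunder = getattr(type(obj), '__min__', None)
    if dunder is not None:
        # Note: key/default are deliberately NOT forwarded here --
        # how they would interact with the hook is the open question.
        return dunder(obj)
    return min(obj, *args, **kwargs)


class SortedList:
    def __init__(self, items):
        self.lst = sorted(items)

    def __iter__(self):
        return iter(self.lst)

    def __min__(self):
        return self.lst[0]  # O(1) instead of O(N)


assert minimum(SortedList([3, 1, 2])) == 1  # uses __min__
assert minimum([3, 1, 2]) == 1              # plain list falls back to min()
```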
On Wed, Jun 20, 2018 at 07:05:19AM +0300, Serhiy Storchaka wrote:
There are two questions.
1. What to do with additional min() and max() arguments: key and default.
Since there are no reflected versions of min/max, there is no trouble with extra arguments. Just pass them through to the dunder:

    min(obj, key=x, default=y) => type(obj).__min__(key=x, default=y)
2. Is the need of this feature large enough? Will the benefit for special cases exceed the drawback of increasing implementation complexity and slowing down common cases?
Reasonable questions, but I don't think that the cost of testing:

    if hasattr(type(obj), '__min__')  # or equivalent

is going to be very large. Amortized over O(N) comparisons, that's practically free :-)

More important, I think, is the increase in API complexity. That's two more dunders to learn about.

The first part is critical: is this useful enough to justify two more dunders? I think the answer is a definite Maybe. Or perhaps Maybe Not.

I think that without at least one use-case in the standard library, perhaps we should hold off on this. Unless numpy arrays are important enough to justify this on their own?

Are there any builtins or std library classes that offer their own min()/max() methods? If so, that would be good evidence that making this a dunder-based protocol has stdlib use-cases.

-- Steve
Are there any builtins or std library classes that offer their own min()/max() methods?
My first instinct was heapq[1], since the way to access the min value is
simply heap[0] (and I thought it could benefit from __min__) -- it's almost
the perfect motivating example. But as it stands, the module uses
functions to operate directly on a standard list, so even if __min__ were
exposed, min(heap) would still iterate over the entire list.
That being said, a heap *class* could take advantage of this, and provide a
semantically consistent optimization.
I'm not sure how many examples will be found in stdlib, as I expect this
optimization to be restricted to specialized container types like heaps,
but I'll keep searching.
[1] https://docs.python.org/3.6/library/heapq.html
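For what it's worth, the heap *class* idea can be sketched on top of the existing heapq functions. The `Heap` class and its `__min__` are hypothetical; only `heapq.heapify`, `heappush` and `heappop` are real stdlib APIs:

```python
import heapq


class Heap:
    """Minimal min-heap wrapper around the heapq module functions."""

    def __init__(self, items=()):
        self._heap = list(items)
        heapq.heapify(self._heap)

    def push(self, item):
        heapq.heappush(self._heap, item)

    def pop(self):
        return heapq.heappop(self._heap)

    def __iter__(self):
        return iter(self._heap)  # arbitrary (heap) order

    def __min__(self):
        # The heap invariant puts the minimum at index 0: O(1).
        return self._heap[0]


h = Heap([5, 1, 4])
assert min(h) == 1           # today: O(N) scan via __iter__
assert Heap.__min__(h) == 1  # with the proposed protocol: O(1)
```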
On Wed, Jun 20, 2018 at 3:00 AM, Steven D'Aprano
Finding more realistic use cases is key -- the actual spec is pretty
obvious and doesn't worry me in terms of added language or implementation
complexity.
I think just finding a data structure that should implement its own min/max
functionality (or maybe one of these, like heapq) is not enough motivation.
You have to find code where such a data structure (let's say a Tree) is
passed to some function that also accepts, say, a list. Then that function
would benefit from being able to call just `min(x)` rather than `x.min() if
isinstance(x, Tree) else min(x)`. If whenever you have a Tree you know that
you have a Tree (because it has other unique methods) then there's no
burden for the user to call x.min().
--Guido
On Wed, Jun 20, 2018 at 5:24 AM James Edwards
I do think there is some merit in being able to as easily as possible
transition from something being, say list-backed, to a domain specific data
structure that's more efficient. If one wanted to go from a list to a
minheap, it'd be nice just to change `x = []` to `x = Heap()` and have
everything else "just work", without having to change all `min(x)`s to
`x.min()`s, etc. But trying to find places in production code where there
is the special casing you described is (now) on the top of my priorities.
Outside of invariant-maintaining data structures though, I think the most
compelling case I've come up with so far is that it could be implemented on
certain types of generators, without having to generate the whole
sequence. (IMHO, they're neat side effects, but not very compelling by
themselves):
- __min__ and __max__ could be implemented on the range object, as
calculating these is straightforward, and you wouldn't need to generate
the entire sequence. Passing an "un-listified" range to a function that
accepts lists is somewhat common, I think. The inefficiency is compounded
if you generate the sequence multiple times, e.g.:
```python
def get_err(iterable):
    return max(iterable) - min(iterable)

x = range(10000)
get_err(x)

# vs

x = list(range(10000))
get_err(x)
```
Here, implementing __min__ and __max__ would make passing a range not
just as fast as passing a "listified" range, but significantly faster. But
I think this is just a nice coincidence of exposing the special methods and
by no means motivating by itself.
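The range arithmetic is indeed straightforward. A sketch of how a hypothetical `__min__`/`__max__` for a non-empty range could be computed in O(1) from start, stop and step, much as `len` already is (the helper names are mine):

```python
def range_min(r):
    # Minimum of a non-empty range: first element if ascending,
    # last element if descending.
    if not r:
        raise ValueError("empty range")
    return r[0] if r.step > 0 else r[-1]


def range_max(r):
    # Mirror image of range_min.
    if not r:
        raise ValueError("empty range")
    return r[-1] if r.step > 0 else r[0]


r = range(0, 10**5, 3)
assert range_min(r) == min(r) == 0      # O(1) vs O(N) scan
assert range_max(r) == max(r) == 99999  # last element: 3 * 33333
```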
- There are also a few infinite generators in itertools where calling min()
or max() on the generator will run forever, despite at least one of these
being clearly defined:
```python
from itertools import count, cycle

gen = count(start=0, step=1)  # min=0, no max
# or
gen = cycle([1, 2, 3])        # min=1, max=3

print(min(gen))  # Will never terminate
```
I'm even less sure about how often this would actually help than I am
about the range example. I don't envision many places where people are
passing infinite generators to things expecting standard lists --
especially in light of the fact that calling min() or max() on them will
prevent further execution.
Thanks for all the feedback, the search will continue :)
On Wed, Jun 20, 2018 at 11:24 AM, Guido van Rossum
20.06.18 10:00, Steven D'Aprano wrote:
On Wed, Jun 20, 2018 at 07:05:19AM +0300, Serhiy Storchaka wrote:
1. What to do with additional min() and max() arguments: key and default.
Since there are no reflected versions of min/max, there is no trouble with extra arguments. Just pass them through to the dunder:
min(obj, key=x, default=y) => type(obj).__min__(key=x, default=y)
The devil is in the details. And you will see this when you try to implement min() and __min__().

1) There is no default value for default. This makes handling it in Python code hard.

2) The two original examples don't work with a key function. You would need to add complex caches for supporting different key functions, and this would add new problems.

In future we may add new parameters for min() and max(). This is not a closed protocol as for len() or `+`.
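Serhiy's second point can be seen directly with the original example: a container that caches its extremum under the *default* ordering gives the wrong answer once a key function is involved, so the cache cannot simply be reused (the `TracksMin` class is a simplified stand-in for the `KeepsTrackOfExtrema` example upthread):

```python
class TracksMin:
    # Tracks its minimum under the default ordering only.
    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)

    def __min__(self):
        return min(self.items)  # stands in for a cached extremum


obj = TracksMin([-5, 1, 3])
assert TracksMin.__min__(obj) == -5

# With key=abs the true minimum is 1, not -5: the cached
# default-order extremum is useless for an arbitrary key.
assert min(obj, key=abs) == 1
```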
2. Is the need of this feature large enough? Will the benefit for special cases exceed the drawback of increasing implementation complexity and slowing down common cases?
Reasonable questions, but I don't think that the cost of testing:
if hasattr(type(obj), '__min__') # or equivalent
is going to be very large. Amortized over O(N) comparisons, that's practically free :-)
N may be small. And I suppose that for most calls it may be <10 or even <5. Note that the cost will be much larger than for __len__ or __add__, because new dunder methods will not have slots.
On Wed, Jun 20, 2018, 00:05 Serhiy Storchaka
There are two questions.
1. What to do with additional min() and max() arguments: key and default.
Neither should be passed to a dunder.

It is not possible to handle `key` without figuring out if a function is monotonic (a Turing-complete problem in general) or anti-monotonic (if that is a real term), so you MUST fall back on full iteration if a key is provided.

`default` is only used in case of an empty collection. The only question is, who has responsibility for detecting an empty collection, and how?

Caller detects: The caller checks length before calling the dunder. If there is no dunder, it doesn't check. Are there real-world cases where length is not defined on an iterable collection?

Dunder detects: Right now, `max` detects empty by watching for StopIteration, which can no longer be a false positive. StopIterations from a deeper scope are wrapped. If the dunder throws an error to signal emptiness, it should not be thrown otherwise. I think that's impossible to guarantee.
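The current behaviour described here can be checked directly: max() on an empty iterable raises ValueError (triggered internally by StopIteration from the exhausted iterator) unless `default` is given, and since PEP 479 a StopIteration leaking out of a generator body becomes RuntimeError rather than silently ending iteration, which is why it "can no longer be a false positive":

```python
# Empty iterable: ValueError unless a default is supplied.
try:
    max([])
except ValueError:
    pass  # "max() arg is an empty sequence"

assert max([], default=None) is None
assert max(iter([]), default=0) == 0  # works for bare iterators too


# PEP 479: a StopIteration raised inside a generator no longer
# masquerades as "end of data" -- it surfaces as RuntimeError.
def bad_gen():
    raise StopIteration
    yield  # makes this a generator function

try:
    max(bad_gen())
except RuntimeError:
    pass
```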
On Tue, Jun 26, 2018 at 11:34 AM, Franklin? Lee
There's an argument that you DO want to pass to the dunder: `last=True`. It's not currently part of `min` and `max`. Currently, if there are multiple items that are maximum, `max` will return the first one. In the future, a `last:bool` param could be added, and a dunder for `max` will want to handle it.
On 6/26/18 11:34 AM, Franklin? Lee wrote:
It is not possible to handle `key` without figuring out if a function is monotonic (a Turing-complete problem in general) or anti-monotonic (if that is a real term), so you MUST fall back on full iteration if a key is provided.
Monotonic (in this sense) just means never changing direction: it can be increasing and never decreasing, or decreasing and never increasing, so we don't need the 'anti-' version.

-- Richard Damon
2018-06-26 17:34 GMT+02:00 Franklin? Lee
Caller detects: The caller checks length before calling the dunder. If there is no dunder, it doesn't check. Are there real-world cases where length is not defined on an iterable collection?
Generators don't have a __len__ method. And they might have a min/max that can be calculated without iterating over the entire thing. The builtin range() is an example (but also an exception, since it does have a __len__ attribute; this is specifically part of range and not generators in general, though).

However, range() is an example where the dunders could be valuable: max(range(10**7)) already takes noticeable time here, while it's rather easy to figure it out from start, stop and step, just like len now does for it.
On Tue, Jun 26, 2018, 11:43 PM Jacco van Dorp
Have you ever written ``max(range(x))`` in production code?
2018-06-27 9:36 GMT+02:00 Michael Selik
Have you ever written ``max(range(x))`` in production code?
Have you ever written len(range(x)) in production code?
On Wed, Jun 27, 2018 at 12:36:12AM -0700, Michael Selik wrote:
On Tue, Jun 26, 2018, 11:43 PM Jacco van Dorp wrote:
Generators dont have a __len__ method. And they might have min/max that can be calculated without iterating over the entire thing. The builtin range() is an example. (but also an exception, since it does have a __len__ attribute. This is specifically part of range and not generators in general, though.).
range is not a generator.
However, range() is an example where the dunders could be valuable - max(range(1e7)) already takes noticable time here, while it's rather easy to figure it out from start stop and step, just like len now does for it.
Have you ever written ``max(range(x))`` in production code?
I have never written that. But I have written ``max(iterable)`` dozens of times, where iterable could be a range object. -- Steve
On Wed, Jun 27, 2018, 3:06 AM Steven D'Aprano
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.

If you control your own input, then this doesn't sound too interesting. You know when you will get a list, and you can call max() on it, and you know when you are passing yourself a tree, and you can give your own tree a max() method and call that.

But if you don't control your input, then this is a good way to delegate back to the unknown object of an unknown class which knows itself. Currently max() and min() have no choice but to walk the entire data structure, even if the object already knows its own maximum and minimum values. That's wasteful.

An analogy: in Ruby, the equivalent of the len() built-in falls back on iteration as a last resort for any object which doesn't define a len method:

    size = 0
    for x in obj:
        size += 1
    return size

By this analogy, max() and min() currently are like that last-resort version of len(), except they do it for *every* object that can be iterated over, even when there's no need.

Imagine that Python's len() always walked the entire iterable, from start to end, to count the length. Now suppose that you proposed adding a __len__ protocol so that objects that know their own length can report it quickly, and in response I argued that len(range(x)) was unrealistic and that there is no need for a __len__ method because we could just say range(x).stop instead. I don't think you would find that argument very persuasive, would you?

-- Steve
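Steven's analogy can be sketched as a pure-Python len() that prefers __len__ and only falls back to counting. The counting fallback is the hypothetical part: the real len() raises TypeError instead of iterating:

```python
def length(obj):
    # Fast path: ask the type, the way the real len() does.
    size = getattr(type(obj), '__len__', None)
    if size is not None:
        return size(obj)
    # Ruby-style last resort: count by iterating, O(N).
    n = 0
    for _ in obj:
        n += 1
    return n


assert length([10, 20, 30]) == 3         # via list.__len__
assert length(x for x in range(4)) == 4  # via the counting fallback
```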
On Wed, Jun 27, 2018 at 7:30 AM Steven D'Aprano
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
This is not a complete, realistic example. You're describing what an example might be, but not providing a concrete one with context. Quoting Guido from earlier in the thread: "I think just finding a data structure that should implement its own min/max functionality (or maybe one of these, like heapq) is not enough motivation. You have to find code where such a data structure (let's say a Tree) is passed to some function that also accepts, say, a list."

Imagine that Python's len() always walked the entire iterable, from start to end, to count the length. Now suppose that you proposed adding a __len__ protocol so that objects that know their own length can report it quickly, and in response I argued that len(range(x)) was unrealistic and that there is no need for a __len__ method because we could just say range(x).stop instead. I don't think you would find that argument very persuasive, would you?
I would, actually. The range object is particularly unpersuasive as a motivation for magic methods, because of its unusual usage. The hypothetical Tree object discussed earlier was much more interesting.
On Wed, Jun 27, 2018, 10:31 Steven D'Aprano
On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
Have you ever written ``max(range(x))`` in production code?
I have never written that.
But I have written ``max(iterable)`` dozens of times, where iterable could be a range object.
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
If you control your own input, then this doesn't sound too interesting. You know when you will get a list, and you can call max() on it, and you know when you are passing yourself a tree, and you can give your own tree a max() method and call that.
But if you don't control your input, then this is a good way to delegate back to the unknown object of an unknown class which knows itself.
Currently max() and min() have no choice but to walk the entire data structure, even if the object already knows its own maximum and minimum values. That's wasteful.
An analogy: in Ruby, the equivalent of the len() built-in falls back on iteration as a last resort for any object which doesn't define a len method:
    size = 0
    for x in obj:
        size += 1
    return size
By this analogy, max() and min() currently are like that last-resort version of len(), except they do it for *every* object that can be iterated over even when there's no need.
Imagine that Python's len() always walked the entire iterable, from start to end, to count the length. Now suppose that you proposed adding a __len__ protocol so that objects that know their own length can report it quickly, and in response I argued that
len(range(x))
was unrealistic and that there is no need for a __len__ method because we could just say
range(x).stop
instead. I don't think you would find that argument very persuasive, would you?
Let's just assume Michael wants to know, and isn't making an argument against the proposal.

`range` is more a sequence (a collection) than a generator. It's not Python 2's xrange. You can iterate through one multiple times. If `range` did not have a length, we would eventually propose that it should.

I can't think of a real iterable which has no length but knows what its max is. An iterator or generator can't generally know its max without consuming everything. I have some vague thoughts about collections which have known bounds but not known sizes, but nothing concrete.

It's bikeshedding, anyway. How isn't as important as whether it's useful.

By the way, range(1e7) fails with a type error.
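Franklin's later point about heaps knowing their min can be made concrete with the standard library's `heapq` (a sketch, not part of any proposal): a min-heap keeps its smallest element at index 0, and its largest element must be one of the leaves, so only the back half of the list needs scanning.

```python
import heapq

# heapq builds a binary min-heap in a plain list: heap[0] is always the
# minimum, and because every parent is <= its children, the maximum must
# sit at a leaf.  Leaves occupy indices n//2 .. n-1 of the backing list.
heap = [9, 4, 7, 1, 8, 2, 5, 3, 6]
heapq.heapify(heap)

minimum = heap[0]                 # O(1): the root of the min-heap
n = len(heap)
maximum = max(heap[n // 2:])      # only the leaves need to be checked
```

This is exactly the kind of structure that knows one extremum for free and can find the other without a full scan of internal nodes.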
On Wed, Jun 27, 2018 at 8:16 AM Franklin? Lee
On Wed, Jun 27, 2018, 10:31 Steven D'Aprano
wrote: On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
Let's just assume Michael wants to know, and isn't making an argument against the proposal.
I do want to know, but it's also an argument against the proposal -- that no one has contributed in-context usage to demonstrate the value. I'd want to see code that currently uses ``if isinstance`` to switch between ``max(x)`` and ``x.max()``. Chris Barker explained the issue well.
On Wed, Jun 27, 2018 at 11:30 AM, Michael Selik
On Wed, Jun 27, 2018 at 8:16 AM Franklin? Lee
wrote: On Wed, Jun 27, 2018, 10:31 Steven D'Aprano
wrote: On Wed, Jun 27, 2018 at 06:52:14AM -0700, Michael Selik wrote:
My intent was to ask where a range was in fact passed into max, not merely where it could be. It'd be enlightening to see a complete, realistic example.
A complete, realistic example is as I said: you call max() on some object which you don't control, the caller does. You could be passed a list, or a set, or a bitset, a binary search tree, a range object, whatever the caller happens to pass to you.
Let's just assume Michael wants to know, and isn't making an argument against the proposal.
I do want to know, but it's also an argument against the proposal -- that no one has contributed in-context usage to demonstrate the value. I'd want to see code that currently uses ``if isinstance`` to switch between ``max(x)`` and ``x.max()``. Chris Barker explained the issue well.
Then maybe you shouldn't have picked on the verbatim case of `len(range(...))`.

You won't find many examples deciding between `max(x)` and `x.max()`. `np.nanmax(x)` will optimize the case where you care about efficiency. (By the way, Numpy checks for `x.max` if `x` isn't an `ndarray`, so it already kind of implements the proposal.)

I honestly can't imagine a case where you need to know the max of some possibly-unsorted collection, but don't eventually iterate through the whole thing anyway (no difference in asymptotic time). You might have some savings sometimes, but the overall gain is unpredictable, and shouldn't be depended on.

A case more realistic to me is when you know it's definitely a collection that knows its max/min, but you don't know what kind of collection it is. But a sorted collection is pretty much a sequence, and you can usually get the first or last element using c[0] and c[-1], unless it's a SortedDict.

A heap or a Young tableau knows its min, and only has to search its leaves/boundary for its max (or vice versa). For a sorted collection, you may want to check against min and max before inserting, deleting, or searching for an element. However, these checks can be implemented by the functions doing the insert/delete/search.

I expect existing uses for dunder max will appear in generic algorithm libraries, rather than in concrete uses. If you're switching to a sorted collection, it suggests that you want efficiency, and I imagine you'll modify your code to fit the new data structure.
I'd want to see code that currently uses ``if isinstance`` to switch between ``max(x)`` and ``x.max()``.
I understand that, and I've spent a few hours searching GitHub, with less-than-stellar results due to GitHub's search syntax ignoring '.' and '(' (https://help.github.com/articles/searching-code/#considerations-for-code-sea....).

(That being said, there are a number of projects on GitHub (https://github.com/search?l=Python&q=__min__&type=Code) that seem to expect __min__ to work (example: https://github.com/windowssocket/algorithm/blob/d731dca2917b6d6073a49d53cf43...), though it's true that some are using the dunder for their own purposes.)

However, there are many modules that implement objects that could make good use of exposing __min__, e.g. bintrees (https://pypi.org/project/bintrees/) and sortedcontainers (http://www.grantjenks.com/docs/sortedcontainers/). bintrees provides `min_item()` and `max_item()` accessors. sortedcontainers doesn't seem to explicitly provide similar methods, but it maintains the sortedness of its objects, so min and max can be accessed via index.

And I can't stress enough the value in being able to switch to one of these classes from a standard iterable and have the rest of your code (e.g. its `min()` calls) "just work".

Also, the desire for custom implementations is there (https://stackoverflow.com/questions/45308212/is-there-a-way-to-return-a-cust...), but the proposed solutions rely on (IMO) ugly hacks that would break other things.

The search for the perfect `x.min() if isinstance(...) else min(x)` still continues, however.
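The special-casing lamented above looks roughly like this (a sketch using a hypothetical `SortedThing` class and `smallest` helper, invented for illustration):

```python
class SortedThing:
    # Toy self-sorted container, standing in for bintrees/sortedcontainers.
    def __init__(self, items):
        self._items = sorted(items)

    def __iter__(self):
        return iter(self._items)

    def min(self):
        # O(1): first element of the sorted backing list.
        return self._items[0]


def smallest(obj):
    # The ugly type-based dispatch that a __min__ protocol would remove:
    # generic code must know about every "smart" container up front.
    if isinstance(obj, SortedThing):
        return obj.min()
    return min(obj)  # falls back to a full O(n) scan
```

Every new smart container means another `isinstance` branch, which is exactly why delegating via a dunder is attractive.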
On Wed, Jun 27, 2018 at 11:30 AM, Michael Selik
I don't think anyone would argue that there wouldn't be use cases for __max__ and __min__ special methods. However, there is substantial overhead to adding new magic methods, so the question is not whether they would be useful in some special cases, but whether they would be useful enough in common enough cases to be worth the overhead.

For example, we had a discussion on this list about adding a __sort_key__ magic method, so that classes could make themselves efficiently sortable. That was not deemed generally useful enough to be worth it.

In this case, I think numpy arrays are a good example to think about (as opposed to range objects :-) ). Numpy arrays can certainly find their max and min more efficiently than the generic functions, and are "proper" sequences that can be used in generic code. But they also have a particular signature for max and min (an axis parameter), so they really don't map well to a dunder. They also have their own ways of doing all sorts of things that are different from a "typical" iterable. I expect that is the case for other special data structures like trees, etc.

So I don't think this rises to the level of generally universal enough for a magic method.

-CHB
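The point about the axis parameter can be illustrated with a toy pure-Python class (a hypothetical `Grid` standing in for `numpy.ndarray`): a zero-argument `__max__` dunder could only express the global maximum, not the axis-wise variants.

```python
class Grid:
    # Toy 2-D array whose max() takes an axis argument, like numpy's.
    def __init__(self, rows):
        self.rows = rows

    def max(self, axis=None):
        if axis is None:
            # Global maximum over every element.
            return max(max(row) for row in self.rows)
        if axis == 0:
            # Column-wise maxima: zip(*rows) transposes the grid.
            return [max(col) for col in zip(*self.rows)]
        # Row-wise maxima.
        return [max(row) for row in self.rows]


g = Grid([[1, 5], [4, 2]])
```

Only `g.max()` maps onto a hypothetical `__max__()`; `g.max(axis=0)` and `g.max(axis=1)` return lists and have no place in a one-value protocol.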
participants (10)
- Chris Barker - NOAA Federal
- Franklin? Lee
- Guido van Rossum
- Jacco van Dorp
- James Edwards
- Michael Selik
- Pål Grønås Drange
- Richard Damon
- Serhiy Storchaka
- Steven D'Aprano