Add builtin function for min(max())
Even after years of coding it sometimes takes me a moment to correctly parse expressions like `min(max(value, minimum), maximum)`, especially when the parentheses enclose some local computation instead of only references, and the fact that `min` actually needs the *maximum* value as an argument (and vice-versa) certainly doesn't help.

It's a little surprising to me how shorthand functions akin to CSS's `clamp()` aren't more popular among modern programming languages. Such a tool is, in my opinion, even more missed in Python, what with it being focussed on readability and all. There would likely also be some (probably minor) performance advantages with implementing it at a builtins level.

Example usage:

    >>> val = 100
    >>> clamp(10, val, 50)
    50
    >>> val = 3
    >>> clamp(10, val, 50)
    10
    >>> val = 25
    >>> clamp(10, val, 50)
    25

I'm undecided whether I would like `clamp`, `minmax` or something else as a name. I'm curious about other ideas.

As far as the signature is concerned, I like both `clamp(min, val, max)` for the logical position of the three arguments (which mirrors expressions such as `min < val < max`) and `clamp(val, min=x, max=y)`. I prefer the former, but declaring them as normal positional-and-keyword arguments would allow the programmer to use an alternative order if they so choose.
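For readers who want something concrete to poke at, here is a minimal pure-Python sketch of the proposal, assuming the `(min, val, max)` order preferred above. The function and its parameter names (`minimum`, `value`, `maximum`) are illustrative only; no such builtin exists, and this is just the nested idiom wrapped in a function.

```python
def clamp(minimum, value, maximum):
    """Limit value to the closed interval [minimum, maximum].

    Illustrative sketch only: this simply wraps the nested
    min(max(...)) idiom and does no validation of the bounds.
    """
    return min(max(value, minimum), maximum)


print(clamp(10, 100, 50))  # 50
print(clamp(10, 3, 50))    # 10
print(clamp(10, 25, 50))   # 25

# Ordinary positional-or-keyword parameters let the caller pick
# a different order at the call site.
print(clamp(value=100, minimum=10, maximum=50))  # 50
```

Because the parameters are ordinary positional-or-keyword ones, the last call shows how a caller could put the value first without a second signature.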
My vote: clamp(+1000, min=-1, max=+1)

I've had to do this operation many times in a variety of languages (Python is my playtoy for personal stuff, but work and school rarely give me a choice of language). I always love it when the language has a readily available `clamp()` function for this, and quietly grumble when I have to build it myself (usually as two lines to avoid the readability issue of nested function calls). It's one of those minor annoyances that you don't realize how annoying it is until you work in a language that hands you a simple `clamp` function for free (or at the cost of one import), and then you wonder "why doesn't <my favorite language> have that?"

It's simple enough IMO to go straight into the stdlib, possibly even builtins. The maintenance burden of such a simple function would be almost zero, and I doubt there are a lot of third-party uses of the name `clamp` that aren't implementing this operation as a utility function (and that would also care about having access to the new builtin; after all, existing uses would continue to work and just shadow the new builtin).

If we want to bikeshed now, I prefer the signature I used above for my vote. More specifically: clamp(val, /, min, max). Alternatively, clamp(val, min, max, /) is also fine by me.

On Fri, Jul 3, 2020 at 5:17 PM Federico Salerno <salernof11@gmail.com> wrote:
> Even after years of coding it sometimes takes me a moment to correctly parse expressions like `min(max(value, minimum), maximum)`, especially when the parentheses enclose some local computation instead of only references, and the fact that `min` actually needs the *maximum* value as an argument (and vice-versa) certainly doesn't help.
>
> It's a little surprising to me how shorthand functions akin to CSS's `clamp()` aren't more popular among modern programming languages. Such a tool is, in my opinion, even more missed in Python, what with it being focussed on readability and all. There would likely also be some (probably minor) performance advantages with implementing it at a builtins level.
>
> Example usage:
>
>     >>> val = 100
>     >>> clamp(10, val, 50)
>     50
>     >>> val = 3
>     >>> clamp(10, val, 50)
>     10
>     >>> val = 25
>     >>> clamp(10, val, 50)
>     25
>
> I'm undecided whether I would like `clamp`, `minmax` or something else as a name. I'm curious about other ideas.
>
> As far as the signature is concerned, I like both `clamp(min, val, max)` for the logical position of the three arguments (which mirrors expressions such as `min < val < max`) and `clamp(val, min=x, max=y)`. I prefer the former, but declaring them as normal positional-and-keyword arguments would allow the programmer to use an alternative order if they so choose.
> _______________________________________________
> Python-ideas mailing list -- python-ideas@python.org
> To unsubscribe send an email to python-ideas-leave@python.org
> https://mail.python.org/mailman3/lists/python-ideas.python.org/
> Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/KWAOQF...
> Code of Conduct: http://python.org/psf/codeofconduct/
I'd go for val[min:max] tbh.

benefits:

- currently not allowed!
- replaces min *and* max!
> I'd go for val[min:max] tbh. benefits:
>
> - currently not allowed!
> - replaces min and max!

Is this a serious suggestion? No offence intended, but this seems ill-thought-out. val[min:max] is perfectly legal syntax and it will only error if the variable val happens to not support indexing. This seems like it would break numpy arrays (which, IIRC, support both slicing and arithmetic operations), or, at the very least, be ambiguous on classes which choose to support both indexing and ordering (and thus clamping).
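A quick way to check the "perfectly legal syntax" point is to look at what the subscript actually receives. The `Probe` class below is a hypothetical helper written only for this illustration.

```python
class Probe:
    """Hypothetical class that returns whatever key the subscript gets."""

    def __getitem__(self, key):
        return key


# The expression already parses today and produces a slice object.
print(Probe()[10:50])       # slice(10, 50, None)

# On sequences it already has a well-established meaning.
print("hello world"[2:5])   # llo

# It only fails, at run time, on objects without __getitem__.
val = 100
try:
    val[10:50]
except TypeError as exc:
    print(type(exc).__name__)  # TypeError
```

So the syntax is not free for reuse: any type that already defines `__getitem__` would have two competing meanings for the same expression.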
On 2020-07-03 7:53 p.m., TCPhone93@gmail.com wrote:
>> I'd go for val[min:max] tbh. benefits:
>>
>> - currently not allowed!
>> - replaces min and max!
>
> Is this a serious suggestion? No offence intended, but this seems ill-thought-out. val[min:max] is perfectly legal syntax and it will only error if the variable val happens to not support indexing. This seems like it would break numpy arrays (which, IIRC, support both slicing and arithmetic operations), or, at the very least, be ambiguous on classes which choose to support both indexing and ordering (and thus clamping)

How do you plan to clamp a numpy array or a string?
On Fri, Jul 03, 2020 at 09:24:47PM -0300, Soni L. wrote:
> how do you plan to clamp a numpy array or a string?

I'm not saying it is meaningful, but it certainly works to clamp strings:

    py> s = "hello"
    py> min(max(s, "a"), "z")
    'hello'

Likewise other indexable types can be compared with min and max:

    py> min(['a', 1], ['b', 2])
    ['a', 1]

The traditional meaning of slice notation is to take a slice of a sequence, i.e. to extract a sub-sequence. I don't see the conceptual connection between "take a sub-sequence" and "clamp to within some bounds", and if I saw something like this:

    (45)[2:18]

I would interpret it as some form of bit-masking, i.e. extracting bits 2 to 18 in some form or another. I'd certainly never guess in a million years that it was a clamping operation.

-- 
Steven
On Fri, Jul 3, 2020 at 5:25 PM <TCPhone93@gmail.com> wrote:
> I'd go for val[min:max] tbh.

another reason this is Not Good: in slicing syntax, a:b means >=a and < b -- this asymmetry is not what we would want here.

-CHB

-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython
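To make the asymmetry concrete, here is a small sketch: slice and range bounds are half-open, while the clamp discussed in this thread is meant to include both ends. The `clamp` helper is a stand-in written for this illustration, not an existing builtin.

```python
def clamp(value, lower, upper):
    """Stand-in for the proposed function (closed interval)."""
    return min(max(value, lower), upper)


digits = list(range(10))
print(digits[2:5])        # [2, 3, 4]  (index 5 is excluded)
print(5 in range(2, 5))   # False      (upper bound is excluded)
print(clamp(5, 2, 5))     # 5          (upper bound is a valid result)
print(clamp(9, 2, 5))     # 5
```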
FWIW, numpy calls it "clip":

    numpy.clip(a, a_min, a_max, out=None, **kwargs)

    Clip (limit) the values in an array.

    Given an interval, values outside the interval are clipped to the interval edges. For example, if an interval of [0, 1] is specified, values smaller than 0 become 0, and values larger than 1 become 1.

    Equivalent to but faster than np.minimum(a_max, np.maximum(a, a_min)). No check is performed to ensure a_min < a_max.

-CHB

On Fri, Jul 3, 2020 at 5:37 PM Christopher Barker <pythonchb@gmail.com> wrote:
> On Fri, Jul 3, 2020 at 5:25 PM <TCPhone93@gmail.com> wrote:
>> I'd go for val[min:max] tbh.
>
> another reason this is Not Good: in slicing syntax, a:b means >=a and < b -- this asymmetry is not what we would want here.
>
> -CHB
On 2020-07-03 9:37 p.m., Christopher Barker wrote:
> On Fri, Jul 3, 2020 at 5:25 PM <TCPhone93@gmail.com> wrote:
>> I'd go for val[min:max] tbh.
>
> another reason this is Not Good: in slicing syntax, a:b means >=a and < b -- this asymmetry is not what we would want here.

It doesn't make a difference tho, does it?
On 2020-07-03 22:56, Soni L. wrote:
> I'd go for val[min:max] tbh.
>
> benefits:
>
> - currently not allowed!
> - replaces min *and* max!

Disadvantage: it looks like subscripting, and you're assuming that you'll only ever be working with (unsubscriptable) scalars.
On Fri, Jul 3, 2020 at 5:57 PM Soni L. <fakedme+py@gmail.com> wrote:
> I'd go for val[min:max] tbh.
If I were to see this, even if I didn't confuse it with slicing syntax, I'd intuitively think the resulting value is `min <= value < max`, given the typical meaning of min/max in the slicing syntax.
Should it raise an exception if minimum > maximum? If it doesn't, then you'll get a different answer depending on whether it's `min(max(value, minimum), maximum)` or `max(min(value, maximum), minimum)`.
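A worked example of that order dependence, using made-up numbers with the bounds deliberately inverted:

```python
value, minimum, maximum = 5, 10, 1  # minimum > maximum on purpose

max_then_min = min(max(value, minimum), maximum)
min_then_max = max(min(value, maximum), minimum)

print(max_then_min)  # 1   (max gives 10, then min(10, 1))
print(min_then_max)  # 10  (min gives 1, then max(1, 10))
```

With valid bounds the two spellings agree, so the choice between them only becomes visible when the caller gets the bounds wrong.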
On 04/07/2020 02:01, MRAB wrote:
> Should it raise an exception if minimum > maximum? If it doesn't, then you'll get a different answer depending on whether it's `min(max(value, minimum), maximum)` or `max(min(value, maximum), minimum)`.

Yes, I'd expect ValueError if min > max or max < min.

On 04/07/2020 02:03, Steven D'Aprano wrote:
> Bottom line is that passing a NAN as the lower or upper bound should treat it as equivalent to "unbounded", that is, equivalent to ±∞. The beauty of that is that it can be implemented without explicitly testing for NANs, which involves unnecessary conversions to float, and may even raise an exception. Here is the version I use:
>
> ...
>
> Features:
>
> * uses None as a convenient alias for unbounded;
> * treats NANs according to the standard;
> * requires no explicit conversion to float or testing for NANs;
> * so this will work with Fractions and Decimals.

I'm not opposed to this but wouldn't the programmer expect it to behave much like a shorthand of the existing min() + max()? Should these two then be modified to exhibit the same behaviour? I'd find it inconsistent if clamp() did but min() and max() didn't.

On 04/07/2020 02:50, Christopher Barker wrote:
> FWIW, numpy calls it "clip"

I feel clip fits best with the idea of a collection to... clip. `clamp()` would work with scalars, for which the word clip might not be clear at a glance. While I have no strong feelings in favour of clamp, I do think it would be better than clip (but maybe it's just my experience with CSS speaking). As far as other options go, I agree with Mr D'Aprano's objection to `minmax`, and I'd like to toss a possible `coerce` (similarly used in Kotlin) into the hat. Maybe someone has better names in mind?
On Saturday, July 4, 2020, at 03:16 -0500, Federico Salerno wrote:
> On 04/07/2020 02:50, Christopher Barker wrote:
>> FWIW, numpy calls it "clip"
>
> I feel clip fits best with the idea of a collection to... clip. `clamp()` would work with scalars, for which the word clip might not be clear at a glance. While I have no strong feelings in favour of clamp, I do think it would be better than clip (but maybe it's just my experience with CSS speaking). As far as other options go, I agree with Mr D'Aprano's objection to `minmax`, and I'd like to toss a possible `coerce` (similarly used in Kotlin) into the hat. Maybe someone has better names in mind?

Coerce makes me think of types¹ rather than values. YMMV.

¹ https://en.wikipedia.org/wiki/Type_conversion

-- 
“Whoever undertakes to set himself up as a judge of Truth and Knowledge is shipwrecked by the laughter of the gods.” – Albert Einstein
Dan Sommers, http://www.tombstonezero.net/dan
On Sat, Jul 4, 2020 at 8:29 AM Federico Salerno <salernof11@gmail.com> wrote:
> On 04/07/2020 02:50, Christopher Barker wrote:
>> FWIW, numpy calls it "clip"
>
> I feel clip fits best with the idea of a collection to... clip. `clamp()` would work with scalars, for which the word clip might not be clear at a glance. While I have no strong feelings in favour of clamp, I do think it would be better than clip (but maybe it's just my experience with CSS speaking). As far as other options go, I agree with Mr D'Aprano's objection to `minmax`, and I'd like to toss a possible `coerce` (similarly used in Kotlin) into the hat. Maybe someone has better names in mind?

I'm going to back `clamp()` as that's what it's commonly referred to in game dev circles. It's one of those functions for which everyone in game dev either memorizes the max(min()) pattern, writes their own, or is blessed with a framework that provides it.

I agree with Dan that if I saw `coerce` I'd assume type coercion.

One benefit to `minmax` is that for those who have a good autocomplete, it'll come up with either version of the `min(max())` pattern and suggest a better usage if that's what they're after.

Largely, I think discoverability is a design constraint worth considering, and clamp and minmax both have benefits in that regard. (clamp being a common name for the scalar operation, and minmax for search similarity to the existing functions.)

Piper Thunstrom
My public key is available at https://keybase.io/pathunstrom
Public key fingerprint: 8FF9 3F4E C447 55EC 4658 BDCC A57E A7A4 86D2 644F
Federico Salerno writes:
> I feel clip fits best with the idea of a collection to... clip.

True, but you can (for this purpose) think of a scalar as a singleton. Still, I think "clamp" is by far the best of the bunch (though I don't see a need for this function in the stdlib, and definitely not a builtin).

The problem with making this a builtin is that I don't think that "clamp" is an unlikely identifier. In particular, I guess clamping the volume of a track in audio processing is a common operation. "clip" is much worse, as it's used in all of audio, image, and video processing, and I can imagine it as a place for keeping deleted or copied objects.

math.clamp wouldn't be totally objectionable.

Which suggests the question: Is there a commonly used equivalent for complex numbers?

> As far as other options go, I agree with Mr D'Aprano's objection to `minmax`,

Definitely out.

> and I'd like to toss a possible `coerce`

Here my issue is that for me the *target* of a coercion should be a "single thing", which could be a type, but might also be a scalar. It is true that type theorists consider x in Reals and y in [0,1] to be different types, so "y = coerce(x) # to unit interval" could match that concept, but somehow that doesn't work for me. That may just be me, the majority of native speakers may disagree.
2QdxY4RzWzUUiLuE@potatochowder.com writes:
> On 2020-07-05 at 12:18:54 +0900, "Stephen J. Turnbull" <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
>> Which suggests the question: Is there a commonly used equivalent for complex numbers?
>
> How would that work? Complex numbers are unordered, but I suspect that you know that.

Oh, that's not a problem. Impose one, and done. If you insist on two complex-parameter bounds, there's at least one interesting way to specify a total order with a connected "region" "between" any two complex numbers: lexicographic in (magnitude, argument).

But my question was more "what's the use case?" So I'm not persuaded by thinking of confining the mouse pointer to a window whose points are represented by complex numbers. I was more wondering if, for example, it would be useful in working with electromagnetic waveforms, or such applications where for some reason the complex plane is more useful than the real one.

But let's think bigger, much bigger. Really, clamp is a topological concept with a bounding *set*. What's not to love about clamp(z, Mandelbrot)? :-)
On Sun, Jul 05, 2020 at 12:18:54PM +0900, Stephen J. Turnbull wrote:
> The problem with making this a builtin is that I don't think that "clamp" is an unlikely identifier.

Doesn't matter -- we can shadow builtins, it's just a name, not a keyword. Lots of code in the wild uses str, len, id, chr, etc as variable names.

I think more importantly, clamp is not important enough to be a builtin. But I wouldn't object to it being builtin if others think it is (and you can convince the Steering Council).

> Which suggests the question: Is there a commonly used equivalent for complex numbers?

Naturally :-)

Complex numbers represent points on a plane; it is very common in graphical toolkits to need to clamp an object to within some window or other region of the plane, so that you don't (e.g.) drag your object outside of the document, or position it somewhere off screen where it is impossible for the user to click on.

(There was, or is, an annoying bug in OpenOffice that would occasionally reposition the coordinates of objects to some ludicrous position way off screen where they couldn't be reached.)

Now admittedly screen and window coordinates are generally integer values, but the concept applies to floating point coordinates too: if you have a need to clamp a point to within a rectangular region, applying clamp() to the real and imaginary components will do the job.
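The component-wise idea in the last paragraph could look roughly like the sketch below. Both `clamp` and `clamp_point` are hypothetical helpers, and describing the rectangle by two opposite corners given as complex numbers is an assumption made for this example, not something proposed in the thread.

```python
def clamp(value, lower, upper):
    """Stand-in scalar clamp (closed interval, no bounds checking)."""
    return min(max(value, lower), upper)


def clamp_point(z, bottom_left, top_right):
    """Hypothetical helper: clamp complex z into an axis-aligned rectangle.

    The rectangle is described by two opposite corners, also given
    as complex numbers; real and imaginary parts are clamped separately.
    """
    return complex(
        clamp(z.real, bottom_left.real, top_right.real),
        clamp(z.imag, bottom_left.imag, top_right.imag),
    )


window = (complex(0, 0), complex(100, 50))
print(clamp_point(complex(120, -5), *window))  # (100+0j)
print(clamp_point(complex(30, 20), *window))   # (30+20j)
```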
>> and I'd like to toss a possible `coerce`
>
> Here my issue is that for me the *target* of a coercion should be a "single thing", which could be a type, but might also be a scalar. It is true that type theorists consider x in Reals and y in [0,1] to be different types, so "y = coerce(x) # to unit interval" could match that concept, but somehow that doesn't work for me. That may just be me, the majority of native speakers may disagree.

No, I agree. In computing, coerce nearly always means to coerce to a type, not to coerce to some range of values.

There are two standard terms for this function: clamp and clip, depending on whether you view the operation as squashing the value into a range or cutting off the bits that don't fit. I prefer clamp but could live with clip.

-- 
Steven
On 5/07/20 4:39 pm, Steven D'Aprano wrote:
> Complex numbers represent points on a plane; it is very common in graphical toolkits to need to clamp an object to within some window or other region of the plane,

But graphical toolkits don't treat points as complex numbers. The question is whether there is a conventional generalisation of clamp() used in complex analysis. There isn't one that I know of.

> so that you don't (e.g.) drag your object outside of the document, or position it somewhere off screen where it is impossible for the user to click on.

I wouldn't call this operation "clipping", though -- see below.

> There are two standard terms for this function: clamp and clip, depending on whether you view the operation as squashing the value into a range or cutting off the bits that don't fit.

The notion of clipping in computer graphics is not really the same thing. If you're drawing a point clipped to a window, and it's outside the window, then you skip drawing it altogether, you don't move it to the boundary of the window. And if you're drawing a line that's partly inside and partly outside the window, you need to cut off the part that's outside -- but you can't do that by clamping the outside point to the window, you need to do something more complicated.

Another data point: In OpenGL documentation, the operation of limiting colour values or texture coordinates to be within a certain range is called "clamping".

All in all, I think "clamp" is the best term for this.

-- 
Greg
On 05/07/2020 05:39, Steven D'Aprano wrote:
> On Sun, Jul 05, 2020 at 12:18:54PM +0900, Stephen J. Turnbull wrote:
>>> and I'd like to toss a possible `coerce`
>>
>> Here my issue is that for me the *target* of a coercion should be a "single thing", which could be a type, but might also be a scalar. ...
>
> No, I agree. In computing, coerce nearly always means to coerce to a type, not to coerce to some range of values.

"bound", or probably "bounded" (for the same reason we have "sorted"). "clamp" and "clip" sound to me like things you do to a waveform (with Schottky diodes!), so it works for me but I'm not sure it travels well. Elsewhere in the thread (tree) we're already calling arg[1:2] "bounds", so reading this as "the value of x bound[ed] by the range -1 to +5" seems natural. Or "limited" possibly?

I'm +0 on the idea FWIW. I also find it difficult to read, but I tend to in-line it as ifs, in part for clarity.

Jeff Allen
On Sat, Jul 04, 2020 at 10:16:45AM +0200, Federico Salerno wrote:
> Yes, I'd expect ValueError if min > max or max < min.

Is there a difference between those two conditions? *wink*

> On 04/07/2020 02:03, Steven D'Aprano wrote:
>> Bottom line is that passing a NAN as the lower or upper bound should treat it as equivalent to "unbounded", that is, equivalent to ±∞. The beauty of that is that it can be implemented without explicitly testing for NANs, which involves unnecessary conversions to float, and may even raise an exception. Here is the version I use:
>>
>> ...
>>
>> Features:
>>
>> * uses None as a convenient alias for unbounded;
>> * treats NANs according to the standard;
>> * requires no explicit conversion to float or testing for NANs;
>> * so this will work with Fractions and Decimals.
>
> I'm not opposed to this but wouldn't the programmer expect it to behave much like a shorthand of the existing min() + max()?

For regular numbers, it does.

The only differences I can see are that my implementation of clamp() supports None as a short-hand for infinity; and that it treats NANs according to the standard, unlike the builtin min and max, which manage to provide the worst of both possible worlds: they treat NANs according to the order of the arguments, thus satisfying nobody and annoying everybody.

The first part is, I think, important because with the min+max idiom, if one side is unbounded, you can just leave it out:

    min(x, 1000)  # like clamp(x, -float('inf'), 1000)

but with clamp you have to supply *something* to mean "unbounded", and using float("inf") is not very convenient. So it's just a tiny bit of sugar to make the function more useful.

I've been using it for about four years, and it's nice to have. Having a short-cut for clamp is a good usability feature that costs very little (a couple of extra pointer comparisons to test for `is None`, which is as cheap as it comes in Python).
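The order dependence of the builtins is easy to reproduce. The snippet below only uses the existing `min` and `max`; in CPython they keep the first argument unless a later one compares strictly smaller (or larger), and every comparison against a NaN is false.

```python
import math

nan = float("nan")

# The first argument is kept unless a later one compares strictly
# smaller (for min) or larger (for max); comparisons with NaN are False.
print(min(nan, 1.0))  # nan
print(min(1.0, nan))  # 1.0
print(max(nan, 1.0))  # nan
print(max(1.0, nan))  # 1.0

# One-sided clamping with the plain idiom needs no "unbounded" marker.
x = 5000
print(min(x, 1000))   # 1000
```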
> Should these two then be modified to exhibit the same behaviour? I'd find it inconsistent if clamp() did but min() and max() didn't.

Perhaps you should reconsider your expectations there. They do different things because they are different functions with different signatures and different purposes. It isn't even necessary to use min and max in the implementation of clamp, in fact it is better not to.

Any "consistency" arguments for clamp versus min/max are weak at best.

-- 
Steven
On 2020-07-05 15:15, Steven D'Aprano wrote:
[snip]
> The only differences I can see are that my implementation of clamp() supports None as a short-hand for infinity; and that it treats NANs according to the standard, unlike the builtin min and max, which manage to provide the worst of both possible worlds: they treat NANs according to the order of the arguments, thus satisfying nobody and annoying everybody.
>
> The first part is, I think, important because with the min+max idiom, if one side is unbounded, you can just leave it out:
>
>     min(x, 1000)  # like clamp(x, -float('inf'), 1000)
>
> but with clamp you have to supply *something* to mean "unbounded", and using float("inf") is not very convenient. So it's just a tiny bit of sugar to make the function more useful.

+1 to using None.

[snip]
On 05/07/2020 16:15, Steven D'Aprano wrote:
> Perhaps you should reconsider your expectations there. They do different things because they are different functions with different signatures and different purposes. It isn't even necessary to use min and max in the implementation of clamp, in fact it is better not to.
>
> Any "consistency" arguments for clamp versus min/max are weak at best.

Point taken.

I don't have a strong opinion on this but I'm curious if others would find it useful to have iterables as acceptable types for bounds. Or maybe even allowing a range instance to be supplied as bounds (I can't think of a use case where I couldn't just pass the bounds of the range instead, but maybe someone else can).
On Sat, Jul 04, 2020 at 01:01:15AM +0100, MRAB wrote:
> Should it raise an exception if minimum > maximum?

I think there are only two reasonable answers to this:

- raise an exception if the lower bound is greater than the upper bound ("errors should never pass silently");

- or Do What I Mean by swapping them if they are in the wrong order:

      if lower > upper:
          lower, upper = upper, lower

I'm +1 on raising and about +0.00001 on DWIM.

People who have read my posts on this mailing list in the past may remember that I am usually very suspicious of, if not hostile to, DWIM functions, but in this case I think it's harmless.

This is what numpy does if you get the order wrong:

    py> import numpy as np
    py> np.clip(5, 1, 10)  # This is correct.
    5
    py> np.clip(5, 10, 1)  # WOT?
    10

Silently returning garbage is not, in my opinion, acceptable here.

-- 
Steven
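The two policies can be put side by side in a short sketch. The names `clamp_strict` and `clamp_dwim` are made up for this comparison, and both bodies lean on the plain min/max idiom rather than on any implementation posted in the thread.

```python
def clamp_strict(value, lower, upper):
    """Hypothetical variant: refuse bounds given in the wrong order."""
    if lower > upper:
        raise ValueError("lower bound must not exceed upper bound")
    return min(max(value, lower), upper)


def clamp_dwim(value, lower, upper):
    """Hypothetical variant: silently swap bounds given in the wrong order."""
    if lower > upper:
        lower, upper = upper, lower
    return min(max(value, lower), upper)


print(clamp_dwim(5, 10, 1))  # 5

try:
    clamp_strict(5, 10, 1)
except ValueError as exc:
    print("ValueError:", exc)
```

Either way the caller gets a predictable outcome for inverted bounds, which is the property the numpy transcript above lacks.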
This was proposed about four years ago. Here is a link to the first post in the thread:

https://mail.python.org/pipermail/python-ideas/2016-July/041262.html

Discussion spilled over into the following month, here's the first post following:

https://mail.python.org/pipermail/python-ideas/2016-August/041276.html

As I recall, there was some positive support but it ran out of steam because nobody could agree on how to handle NANs even though the IEEE-754 standard tells us how to handle them *wink*

See my responses at the time re NANs here:

https://mail.python.org/pipermail/python-ideas/2016-August/041439.html

https://mail.python.org/pipermail/python-ideas/2016-August/041400.html

https://mail.python.org/pipermail/python-ideas/2016-August/041396.html

Bottom line is that passing a NAN as the lower or upper bound should treat it as equivalent to "unbounded", that is, equivalent to ±∞.

The beauty of that is that it can be implemented without explicitly testing for NANs, which involves unnecessary conversions to float, and may even raise an exception. Here is the version I use:

    def clamp(value, lower, upper):
        """Clamp value to the closed interval lower...upper.

        The limits lower and upper can be set to None to
        mean -∞ and +∞ respectively.
        """
        if not (lower is None or upper is None):
            if lower > upper:
                raise ValueError('lower must be <= to upper')
        if lower == upper is not None:
            return lower
        if lower is not None and value < lower:
            value = lower
        elif upper is not None and value > upper:
            value = upper
        return value

Features:

* uses None as a convenient alias for unbounded;
* treats NANs according to the standard;
* requires no explicit conversion to float or testing for NANs;
* so this will work with Fractions and Decimals.

By the way, using "minmax" as the name would be inappropriate as that typically has other meanings, either to return the minimum and maximum in a single call, or in the sense of minimizing the maximum value of some function or process.

-- 
Steven
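A few calls exercising the features listed above may help. The function body is copied from the message so that the block runs on its own; its indentation is reconstructed here and should be treated as an assumption, and the sample values are made up for illustration.

```python
from decimal import Decimal
from fractions import Fraction


def clamp(value, lower, upper):
    """Clamp value to the closed interval lower...upper.

    Copied from the message above (indentation reconstructed).
    None for lower or upper means unbounded on that side.
    """
    if not (lower is None or upper is None):
        if lower > upper:
            raise ValueError('lower must be <= to upper')
    if lower == upper is not None:
        return lower
    if lower is not None and value < lower:
        value = lower
    elif upper is not None and value > upper:
        value = upper
    return value


nan = float("nan")

print(clamp(5, None, 3))              # 3     (no lower bound)
print(clamp(5, 1, None))              # 5     (no upper bound)
print(clamp(-7, nan, 3))              # -7    (NaN lower bound never triggers)
print(clamp(99, 0, nan))              # 99    (NaN upper bound never triggers)
print(clamp(Fraction(7, 2), 0, 3))    # 3
print(clamp(Decimal("2.5"), 0, 3))    # 2.5
```

The NaN cases work because every ordering comparison against a NaN is false, so a NaN bound simply never clamps anything.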
On Fri, Jul 3, 2020 at 5:07 PM Steven D'Aprano <steve@pearwood.info> wrote:
As I recall, there was some positive support but it ran out of steam because nobody could agree on how to handle NANs even though the IEEE-754 standard tells us how to handle them *wink*
See my responses at the time re NANs here:
https://mail.python.org/pipermail/python-ideas/2016-August/041439.html
https://mail.python.org/pipermail/python-ideas/2016-August/041400.html
https://mail.python.org/pipermail/python-ideas/2016-August/041396.html
Bottom line is that passing a NAN as the lower or upper bound should treat it as equivalent to "unbounded", that is, equivalent to ±∞.
That's not what the standard says. It's sorta connected to a personal opinion of Kahan's expressed in some work-in-progress lecture notes that you linked in the last message: https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF What he says there (on page 9) is
Some familiar functions have yet to be defined for NaN . For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN . There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.
It's clear that he's not referring to standard behavior here and I'm not convinced that even he believes very strongly that min and max should behave that way.

NaN means "there may be a correct answer but I don't know what it is." For example, evaluating (x**2+3*x+1)/(x+2) at x = -2 yields NaN. The correct answer to the problem that yielded this formula is probably -1, but because of the way floating point hardware works, it has no way of figuring that out. Likewise, the final result of a computation involving the square root of a negative real may be well defined, and may even be real, but the hardware can't compute it, so it "computes" NaN instead.

It's definitely true that if plugging in any finite or infinite number whatsoever in place of a NaN will yield the same result, then that should be the result when you plug in a NaN. For example, clamp(x, NaN, x) should be x for every x (even NaN), and clamp(y, NaN, x) where y > x should be a ValueError (or however invalid bounds are treated).

But, e.g., clamp(m, x, M) where m < x could yield any value between m and x, or a ValueError, depending on the value of M. So, if M is NaN, there is no way to know what the correct answer should be. Therefore (in my opinion) it should return NaN.

There's a case for making clamp(m, x, NaN) where m >= x return m rather than NaN since there's no other *value* it could be (it could be an exception though).
Hmm.

Since NaN is neither greater than nor less than anything, it seems the only correct answer to any min, max, or clamp involving a NaN is NaN.

-CHB

On Sat, Jul 4, 2020 at 9:15 AM Ben Rudiak-Gould <benrudiak@gmail.com> wrote:
On Fri, Jul 3, 2020 at 5:07 PM Steven D'Aprano <steve@pearwood.info> wrote:
As I recall, there was some positive support but it ran out of steam because nobody could agree on how to handle NANs even though the IEEE-754 standard tells us how to handle them *wink*
See my responses at the time re NANs here:
https://mail.python.org/pipermail/python-ideas/2016-August/041439.html
https://mail.python.org/pipermail/python-ideas/2016-August/041400.html
https://mail.python.org/pipermail/python-ideas/2016-August/041396.html
Bottom line is that passing a NAN as the lower or upper bound should treat it as equivalent to "unbounded", that is, equivalent to ±∞.
That's not what the standard says. It's sorta connected to a personal opinion of Kahan's expressed in some work-in-progress lecture notes that you linked in the last message:
https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
What he says there (on page 9) is
Some familiar functions have yet to be defined for NaN . For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN . There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.
It's clear that he's not referring to standard behavior here and I'm not convinced that even he believes very strongly that min and max should behave that way.
NaN means "there may be a correct answer but I don't know what it is." For example, evaluating (x**2+3*x+1)/(x+2) at x = -2 yields NaN. The correct answer to the problem that yielded this formula is probably -1, but because of the way floating point hardware works, it has no way of figuring that out. Likewise, the final result of a computation involving the square root of a negative real may be well defined, and may even be real, but the hardware can't compute it, so it "computes" NaN instead.
It's definitely true that if plugging in any finite or infinite number whatsoever in place of a NaN will yield the same result, then that should be the result when you plug in a NaN. For example, clamp(x, NaN, x) should be x for every x (even NaN), and clamp(y, NaN, x) where y > x should be a ValueError (or however invalid bounds are treated).
But, e.g., clamp(m, x, M) where m < x could yield any value between m and x, or a ValueError, depending on the value of M. So, if M is NaN, there is no way to know what the correct answer should be. Therefore (in my opinion) it should return NaN.
There's a case for making clamp(m, x, NaN) where m >= x return m rather than NaN since there's no other *value* it could be (it could be an exception though). _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/PHH7EY... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Sat, 4 Jul 2020 at 19:58, Christopher Barker <pythonchb@gmail.com> wrote:
Hmm.
Since NaN is neither greater than nor less than anything, it seems the only correct answer to any min, max, or clamp involving a NaN is NaN.
Simplifying the signature, in Python we have:

    def min(*iterable):
        iterator = iter(iterable)
        minimum = next(iterator)
        for item in iterator:
            if item < minimum:
                minimum = item
        return minimum

Due to this, min(0, float('nan')) == 0 and same for max. I would hence expect clamp to behave similarly.

As a side note, the documentation actually underspecifies this: in these kinds of situations the iterable overall does not have a well-defined "minimum" (in a mathematical sense it simply does not have one), and two things change the behaviour:

- how you look at the list (forward or backwards)
- how you approach the comparison (item < minimum or not(minimum <= item))

For example:

- max({0}, {0, 2}, {0, 1}, {1}) = {0, 2} vs. {0, 1} when you reverse the iterable
- min(0, float('nan')) = 0 vs. float('nan') when you change the comparison

Should this be clarified/specified in the docs?
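Both effects mentioned above (iteration order and the choice of comparison) can be reproduced with a short script. The helper names `min_lt` and `min_not_le` are invented for this sketch; only `min_lt` mirrors the comparison the simplified builtin uses.

```python
NAN = float("nan")


def min_lt(*items):
    """Replace the running minimum only if the new item is strictly smaller."""
    iterator = iter(items)
    minimum = next(iterator)
    for item in iterator:
        if item < minimum:
            minimum = item
    return minimum


def min_not_le(*items):
    """Replace the running minimum unless it is <= the new item."""
    iterator = iter(items)
    minimum = next(iterator)
    for item in iterator:
        if not (minimum <= item):
            minimum = item
    return minimum


# Sets are only partially ordered, so the "maximum" depends on direction.
sets = [{0}, {0, 2}, {0, 1}, {1}]
print(max(*sets), max(*reversed(sets)))

# With a NaN, the two comparison styles disagree on the same input.
print(min_lt(0, NAN), min_not_le(0, NAN))
```

Neither helper is wrong as such; the inputs simply have no total order for them to agree on.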
On 2020-07-04 at 20:33:36 +0100, Regarding "[Python-ideas] Re: Add builtin function for min(max())," Henk-Jaap Wagenaar <wagenaarhenkjaap@gmail.com> wrote:
On Sat, 4 Jul 2020 at 19:58, Christopher Barker <pythonchb@gmail.com> wrote:
Hmm.
Since NaN is neither greater than nor less than anything, it seems the only correct answer to any min, max, or clamp involving a NaN is NaN.
Simplifying the signature, in Python we have:
    def min(*iterable):
        iterator = iter(iterable)
        minimum = next(iterator)
        for item in iterator:
            if item < minimum:
                minimum = item
        return minimum
Due to this, min(0, float('nan')) == 0 and same for max. I would hence expect clamp to behave similarly.
Yuck: We also have min(float('nan'), 0) == float('nan'). I'm not sure what I'd expect a hypothetical clamp function to do. Someone with actual use cases will have more insight.
On Sat, Jul 4, 2020, at 15:57, 2QdxY4RzWzUUiLuE@potatochowder.com wrote:
Simplifying the signature, in Python we have:
    def min(*iterable):
        iterator = iter(iterable)
        minimum = next(iterator)
        for item in iterator:
            if item < minimum:
                minimum = item
        return minimum
Due to this, min(0, float('nan')) == 0 and same for max. I would hence expect clamp to behave similarly.
Yuck: We also have min(float('nan'), 0) == float('nan').
I'm not sure what I'd expect a hypothetical clamp function to do. Someone with actual use cases will have more insight.
IEEE 754-2019 defines minimum and maximum functions that return NaN whenever an operand is NaN, and apply a strict ordering to signed zero... however, there is also a minimumNumber and maximumNumber which return the number if the other operand is NaN [the asymmetric behavior depending on the order of the operands isn't allowed by either].

It might be worthwhile to define a "min2" etc. that applies the rules for one of these functions when both arguments are floats [and possibly when one argument is a float and the other is any numeric type], and then define min(*iterable) as:

    def min(*iterable):
        iterator = iter(iterable)
        minimum = next(iterator)
        for item in iterator:
            minimum = min2(minimum, item)
        return minimum

I can't find anything about a clamp-like function in IEEE.

It may be worth surveying what other implementations do... to start with:

- Rust has a clamp function that returns NaN if the given number is NaN, and "panics" if either boundary is NaN.
- Numpy's clip function effectively accepts NaN or None boundaries as "don't care"
  - This appears to be implemented as min(max(x, a), b), with min and max themselves having the asymmetric behavior.
- C++'s clamp function seems to be undefined if any of the operands are NaN
  - T must meet the requirements of LessThanComparable in order to use overloads (1).
  - However, if NaNs are avoided, T can be a floating-point type.
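A rough sketch of the "min2" idea for floats follows. The function names are chosen here to echo the IEEE 754-2019 operation names, but this is an approximation written for illustration, not a conforming implementation, and `min_with` is a hypothetical helper standing in for the redefined min.

```python
import math
from functools import reduce


def minimum(a, b):
    """NaN-propagating two-argument minimum (in the spirit of `minimum`)."""
    if math.isnan(a) or math.isnan(b):
        return math.nan
    if a == b:
        # Also covers -0.0 vs +0.0: prefer the negative zero.
        return a if math.copysign(1.0, a) < 0 else b
    return a if a < b else b


def minimum_number(a, b):
    """NaN-ignoring two-argument minimum (in the spirit of `minimumNumber`)."""
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return minimum(a, b)


def min_with(min2, *items):
    """Fold a two-argument rule over the items, as the message suggests."""
    return reduce(min2, items)


# Both rules are symmetric: swapping the operands does not change the result.
print(min_with(minimum, 0.0, math.nan), min_with(minimum, math.nan, 0.0))
print(min_with(minimum_number, 0.0, math.nan), min_with(minimum_number, math.nan, 0.0))
```

Whichever rule is picked, the order-dependent behaviour of the current builtin goes away, which is the property both IEEE variants share.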
On 5/07/20 7:33 am, Henk-Jaap Wagenaar wrote:
min(0, float('nan')) == 0 and same for max. I would hence expect clamp to behave similarly.
But min(float('nan'), 0) == nan. I don't think you can conclude anything from either of these about how clamp() *should* behave in the presence of nans, since they're accidents of implementation. -- Greg
On Sat, Jul 04, 2020 at 11:54:47AM -0700, Christopher Barker wrote:
Hmm.
Since NaN is neither greater than nor less than anything, it seems the only correct answer to any min, max, or clamp involving a NaN is NaN.
If you want NAN-poisoning behaviour it is easy for you to add it yourself as a wrapper function, without it costing any more than it would cost to have that behaviour baked into the function. You get to choose if you want to handle floats only, or Decimals, or both, what to do with signalling NANs, or whether to bother at all. If you know your bounds are not NANs, why test for them at all?

But if you bake special NAN poisoning behaviour in to the function, nobody can escape it. Everyone has to test for NANs, whether they need to or not, and worse, if they want non-poisoned behaviour, they have to test for NANs *twice*, once to over-ride the builtin behaviour, and then a second time when they call the function.

When we have a choice of behaviours and no absolutely clear right or wrong choice, we should choose the behaviour that inconveniences people the least.

The beauty of the implementation I give is that the behaviour with NANs follows automatically, without needing to test for them. NAN poisoning requires testing for NANs, and that is not so easy to get right:

    py> math.isnan(2**10000)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: int too large to convert to float
    py> math.isnan(Decimal('sNAN'))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: cannot convert signaling NaN to float

You want everyone to pay the cost of testing for NANs, even if they don't want or need to. I say, let only those who need to test for NANs actually test for NANs.

Why is this a debate we need to have?

In practice, the most common use-case for clamp will be to call it multiple times against different values but with the same bounds:

    # Not likely to be this
    for value in values:
        lower = something_different_each_time()
        upper = something_different_each_time()
        do_something(clamp(value, lower, upper))

    # Most commonly this:
    for value in values:
        do_something(clamp(value, lower, upper))

If you care about the remote possibility of the bounds being NANs, and want to return NAN instead, hoist the test outside of the call:

    # TODO: Need to catch OverflowError, maybe ValueError
    # maybe even TypeError?
    if math.isnan(lower) or math.isnan(upper):
        for value in values:
            do_something(NAN)
    else:
        for value in values:
            do_something(clamp(value, lower, upper))

and you only pay the cost once. Don't make everyone pay it over and over and over again when the bounds are known to not be NANs.

-- 
Steven
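One way the hoisted test might look once the TODO about exceptions is addressed is sketched below. The helpers `safe_isnan` and `clamp_all_poisoned` are hypothetical names, and the `clamp` here is a bare comparison-only stand-in rather than the proposed builtin.

```python
import math
from decimal import Decimal


def safe_isnan(x):
    """Best-effort NaN test that avoids the two tracebacks shown above."""
    is_nan = getattr(x, "is_nan", None)
    if callable(is_nan):      # Decimal.is_nan() also handles signalling NaNs
        return is_nan()
    try:
        return math.isnan(x)
    except OverflowError:     # an int too large for float is not a NaN
        return False


def clamp(value, lower, upper):
    """Comparison-only stand-in: performs no NaN checks at all."""
    if value < lower:
        return lower
    if value > upper:
        return upper
    return value


def clamp_all_poisoned(values, lower, upper):
    """Opt-in NaN poisoning: the bounds are tested once, outside the loop."""
    if safe_isnan(lower) or safe_isnan(upper):
        return [math.nan for _ in values]
    return [clamp(v, lower, upper) for v in values]


print(safe_isnan(2**10000), safe_isnan(Decimal("sNaN")))
print(clamp_all_poisoned([1, 5, 20], 2, 10))
print(clamp_all_poisoned([1, 5, 20], 2, math.nan))
```

The per-value work inside the loop stays free of NaN tests, which is the cost argument being made.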
On Sat, Jul 04, 2020 at 09:11:34AM -0700, Ben Rudiak-Gould wrote: Quoting William Kahan, one of the people who designed IEEE-754 and NANs:
What he says there (on page 9) is
Some familiar functions have yet to be defined for NaN . For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN . There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.
It's clear that he's not referring to standard behavior here and I'm not convinced that even he believes very strongly that min and max should behave that way.
Are you suggesting that Kahan *doesn't* believe that min() and max() should be symmetric? This is what Python does now:

    py> max(float('nan'), 1)
    nan
    py> max(1, float('nan'))
    1

That's the sort of thing Kahan is describing, and it's clear to me that he thinks that's a bad thing.

I will accept that treating NANs as missing values (as opposed to NAN-poisoning behaviour that returns a NAN if one of the arguments is a NAN) is open to debate. Personally, I don't think that there are many, or any, good use-cases for NAN-poisoning in this function. When we had this debate four years ago, I recall there was one person who suggested a use for it, but without going into details that I can recall.
NaN means "there may be a correct answer but I don't know what it is."
That's one interpretation, but not the only one.

Python makes it quite hard to get a NAN from the builtins, but other languages do not. Here's Julia:

    julia> 0/0
    NaN

So there's at least one NAN which means *there is no correct answer*.

In my younger days I was a NAN bigot who insisted that there was only one possible interpretation for NANs, but as I've gotten older I've accepted that treating them as *missing values* is acceptable. (Besides, like it or not, that's what a ton of software does.)

With that interpretation, a NAN passed as the lower or upper bounds can be seen as another way of saying "no lower bounds" (i.e. negative infinity) or "no upper bounds" (i.e. positive infinity), not "some unknown bounds".
For example, evaluating (x**2+3*x+1)/(x+2) at x = -2 yields NaN.
*cough* Did you try it? In Python it raises an exception; in Julia it returns -Inf. Substituting -2 gives -1/0 which under the rules of IEEE-754 should give -Inf.
The correct answer to the problem that yielded this formula is probably -1,
How do you get that conclusion? For (x**2+3*x+1)/(x+2) to equal -1, you would have to substitute either x=-3 or x=-1, not -2.

    py> x = -1; (x**2+3*x+1)/(x+2)
    -1.0
    py> x = -3; (x**2+3*x+1)/(x+2)
    -1.0

[...]
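The arithmetic in this exchange is easy to check directly. The snippet below just wraps the expression in a function (named `f` here for convenience) and evaluates it at the three points discussed.

```python
def f(x):
    return (x**2 + 3*x + 1) / (x + 2)


# Both roots of f(x) == -1 give the same value.
print(f(-1), f(-3))

# At x = -2 the numerator is -1 and the denominator is 0.
print((-2)**2 + 3*(-2) + 1)

# Python raises instead of returning -inf or NaN for the division by zero.
try:
    f(-2)
except ZeroDivisionError as exc:
    print("f(-2) raises ZeroDivisionError:", exc)
```

So at x = -2 the expression is -1/0, which is not the indeterminate 0/0 form that would give a NaN under IEEE-754 rules.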
It's definitely true that if plugging in any finite or infinite number whatsoever in place of a NaN will yield the same result, then that should be the result when you plug in a NaN. For example, clamp(x, NaN, x) should be x for every x (even NaN), and clamp(y, NaN, x) where y > x should be a ValueError (or however invalid bounds are treated).
I think you are using the signature clamp(lower, value, upper) here. Is that right? I dislike that signature but for the sake of the argument I will use it in the following examples.

I agree with you that `clamp(lower=x, value=NAN, upper=x)` should return x. I agree that we should raise if the bounds are in reverse order, e.g.

    clamp(lower=2, value=x, upper=1)

I trust we agree that if the value is a NAN, and the bounds are not equal, we should return a NAN:

    clamp(1, NAN, 2)  # return a NAN

So I think we agree on all these cases.

So I think there is only one point of contention: what to do if the bounds are NANs? There are two obvious, simple and reasonable behaviours:

Option 1: Treat a NAN bound as *missing data*, which effectively means "there is no limit", i.e. as if you had passed the infinity of the appropriate sign for the bound.

Option 2: Treat a NAN bound as invalid, or unknown, in which case you want to return a NAN (or an exception). This is called "NAN poisoning".

I will happily accept that people might reasonably want either behaviour. But unless we provide two implementations, we have to pick one or the other. Which should we pick?

In the absence of any clear winner, my position is that NAN poisoning should be opt-in. We should pick the option which inconveniences people who want the other the least.

Let's say the stdlib uses Option 1. The function doesn't need to do any explicit checks for NANs, so there's no problem with large integers overflowing, or Decimals raising ValueError, or any need to do a conversion to float.

People who want NAN poisoning can opt-in by doing a check for NANs themselves, either in a wrapper function, or by testing the bounds *once* ahead of time and then just calling the stdlib `clamp` once they know they aren't NANs.

If they use a wrapper function, they end up testing the bounds for NANs on every call, but that's what Option 2 would do so they are no worse off.

So if we choose Option 1, the inconvenience to people who want Option 2 is very small.

Now consider if we pick Option 2. That means that every single call to clamp checks the bounds to see if they are NANs, even though they have probably been checked a thousand times before:

    for x in range(10000):
        clamp(value=x, lower=50, upper=100)
        # every single time, clamp will check that
        # neither 50 nor 100 is a NAN.

The implementation is more complex: it has to be prepared for overflow errors at the very least:

    >>> math.isnan(10**600)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    OverflowError: int too large to convert to float

so that's even more expense that everyone has to pay, whether they need it or not.

Those who want to avoid NAN poisoning can write a wrapper function:

    def myclamp(value, lower, upper):
        try:
            if math.isnan(lower):
                lower = float("-inf")
        except OverflowError:
            lower = float("-inf")
        # And similar for upper
        return clamp(value, lower, upper)

but now they are paying the cost *twice*, not avoiding it. The standard clamp still does the same NAN testing. They can't opt-out of testing for NANs, instead they end up doing the tests twice, once in their wrapper function and once in the standard function.

Option 1 respects those who want to opt-out of NAN testing, and those who might choose to opt-in to it.

Option 2 forces NAN testing on everyone whether they need it or not, and punishes those who try to opt-out by making them do twice as many NAN tests when they actually want to do none at all.

-- 
Steven
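The "paying twice" point can be made concrete with a counter. In this sketch, `clamp_poisoning` is a hypothetical Option 2 implementation (no such builtin exists), `_is_nan` and `calls` exist only to count NaN tests, and `myclamp` is a completed variant of the wrapper above that, as an assumption of this example, treats an overflowing int as simply "not a NaN".

```python
import math

calls = {"isnan": 0}   # counts NaN tests, only to make the cost visible


def _is_nan(x):
    calls["isnan"] += 1
    try:
        return math.isnan(x)
    except OverflowError:   # huge ints cannot be NaN
        return False


def clamp_poisoning(value, lower, upper):
    """Hypothetical Option 2: a NaN bound poisons the result."""
    if _is_nan(lower) or _is_nan(upper):
        return math.nan
    if lower > upper:
        raise ValueError("lower must be <= upper")
    return lower if value < lower else upper if value > upper else value


def myclamp(value, lower, upper):
    """Opt-out wrapper: replace NaN bounds with infinities first."""
    if _is_nan(lower):
        lower = -math.inf
    if _is_nan(upper):
        upper = math.inf
    return clamp_poisoning(value, lower, upper)


print(clamp_poisoning(5, math.nan, 10))   # poisoned
print(myclamp(5, math.nan, 10))           # NaN bound treated as unbounded

calls["isnan"] = 0
myclamp(5, 0, 10)
print(calls["isnan"])                     # NaN tests for one ordinary call
```

With ordinary bounds the wrapper runs two NaN tests and the wrapped function runs two more, so opting out doubles the checks instead of removing them.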
On Mon, Jul 6, 2020 at 1:58 AM Steven D'Aprano <steve@pearwood.info> wrote:
Python makes it quite hard to get a NAN from the builtins, but other languages do not. Here's Julia:
julia> 0/0 NaN
So there's at least one NAN which means *there is no correct answer*.
>>> 1e1000-1e1000
nan
ChrisA
This is a digression, but does anyone have a nice example IN PYTHON of arriving at a NaN without going through infinity. I think Julia is right and Python is wrong about '0/0', but as things are, that's not an example. On Sun, Jul 5, 2020, 12:05 PM Chris Angelico <rosuav@gmail.com> wrote:
On Mon, Jul 6, 2020 at 1:58 AM Steven D'Aprano <steve@pearwood.info> wrote:
Python makes it quite hard to get a NAN from the builtins, but other languages do not. Here's Julia:
julia> 0/0 NaN
So there's at least one NAN which means *there is no correct answer*.
1e1000-1e1000 nan
ChrisA
On Mon, Jul 6, 2020 at 2:15 AM David Mertz <mertz@gnosis.cx> wrote:
This is a digression, but does anyone have a nice example IN PYTHON of arriving at a NaN without going through infinity. I think Julia is right and Python is wrong about '0/0', but as things are, that's not an example.
Not sure why "without going through infinity" is relevant, but you can always just use float("nan") to get one, and I'm sure there are other calculations that result in nan. It's just that 0/0 (like any other operation that involves division by zero, including 0**-1) immediately raises, rather than silently returning a nan. ChrisA
On Sun, Jul 5, 2020, 3:51 PM Chris Angelico <rosuav@gmail.com> wrote:
On Mon, Jul 6, 2020 at 2:15 AM David Mertz <mertz@gnosis.cx> wrote:
This is a digression, but does anyone have a nice example IN PYTHON of
arriving at a NaN without going through infinity. I think Julia is right and Python is wrong about '0/0', but as things are, that's not an example.
Not sure why "without going through infinity" is relevant, but you can always just use float("nan") to get one, and I'm sure there are other calculations that result in nan. It's just that 0/0 (like any other operation that involves division by zero, including 0**-1) immediately raises, rather than silently returning a nan.
Like I said, digression. I teach ieee-754 pretty often, or at least touch on it. I want to demonstrate to students that they might have NaN values to consider. Constructing one purely artificially with 'float("nan")' doesn't make the point well. Some operation that ends up with overflow infinities that are divided or subtracted is OK. But it would be nice to show a clean example where NANs arise without infinities arising first.
On Sun, Jul 05, 2020 at 12:15:27PM -0400, David Mertz wrote:
This is a digression, but does anyone have a nice example IN PYTHON of arriving at a NaN without going through infinity. I think Julia is right and Python is wrong about '0/0', but as things are, that's not an example.
I wouldn't expect one in Python, I think there is an unofficial policy of ensuring that Python builtins and the math library will not return NANs unless passed a NAN, or at least an INF, rather they will raise. Chris' example:
>>> 1e1000-1e1000
nan
is just a funny way of writing INF - INF :-) -- Steven
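For anyone wanting to reproduce this, the following short script shows which routes to a NaN are open in plain Python and which raise instead. It uses only builtins and the `math` module.

```python
import math

inf = 1e1000                  # the literal overflows to inf when parsed
print(inf, inf - inf)         # prints: inf nan
print(math.inf * 0.0)         # another inf-based route to a NaN

# Operations on purely finite values raise rather than return a NaN.
attempts = [
    ("0.0 / 0.0", lambda: 0.0 / 0.0),
    ("math.sqrt(-1.0)", lambda: math.sqrt(-1.0)),
    ("math.fmod(1.0, 0.0)", lambda: math.fmod(1.0, 0.0)),
]
for label, thunk in attempts:
    try:
        print(label, "->", thunk())
    except (ZeroDivisionError, ValueError) as exc:
        print(label, "raises", type(exc).__name__)
```

All three finite-input attempts end in an exception, consistent with the unofficial policy described above.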
On Sun, Jul 5, 2020 at 8:57 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Jul 05, 2020 at 12:15:27PM -0400, David Mertz wrote:
This is a digression, but does anyone have a nice example IN PYTHON of arriving at a NaN without going through infinity. I think Julia is right and Python is wrong about '0/0', but as things are, that's not an example.
I wouldn't expect one in Python, I think there is an unofficial policy of ensuring that Python builtins and the math library will not return NANs unless passed a NAN, or at least an INF, rather they will raise.
1e1000-1e1000 nan is just a funny way of writing INF - INF :-)
The standard library *does* seem to have taken pains to avoid "finite nans." It kinda weakens your case about worrying about doing clamp() right in the face of NaNs :-).

I recognize there are funny ways of writing infinity. But Python really doesn't quite follow IEEE-754 on 0/0, or math.fmod(x, 0.), or a few other places where a NaN might arise in "natural" operations (i.e. it's easy not to notice that your 'y' has become zero).

It also looks like the trig functions are pruned to those that don't have undefined values for numbers I can type in. I can *type* `math.tan(math.pi/2)`, of course. But math.pi is a little bit smaller than the actual pi, so I just get a big number for an answer. But I cannot try the hypothetical:

>>> math.cot(0)
nan

For what we actually have:

>>> math.tan(math.pi/2)
1.633123935319537e+16

One ULP more:

>>> math.tan(np.nextafter(math.pi/2, np.inf))
-6218431163823738.0

-- 
The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
On Sun, Jul 05, 2020 at 09:42:07PM -0400, David Mertz wrote:
The standard library *does* seem to have taken pains to avoid "finite nans."
I don't know what you mean by "finite nans". By definition, any NAN is not considered finite. py> math.isfinite(math.nan) False Do you mean, the stdlib has taken pains to avoid returning NANs ex novo, i.e. from purely finite arguments? Then I agree.
It kinda weakens your case about worrying about doing clamp() right in the face of NaNs :-).
Are you suggesting that we should do clamp() wrong instead? *wink*

What I intended is that the stdlib tends to raise rather than return a NAN from some calculation not already including NANs or INFs. But if you pass NANs into the calculation, then the stdlib honours them.

    py> math.atan(math.nan)
    nan

So if you pass a NAN to clamp(), it should do the right thing, which may be returning a NAN:

    clamp(NAN, -1, 1)  # Should certainly return a NAN.

or may not:

    clamp(NAN, 1, 1)  # Should certainly return 1.

Only the behaviour when one or the other of the bounds are NANs is controversial. I acknowledge that there are two good behaviours, and it is reasonable for people to want one or the other. I have argued why one is better and less inconvenient than the other, but I won't rehash that argument here.

min() and max() are notable, and unfortunate, exceptions in that their treatment of NANs depends on the order of argument. I would call that a bug except that the statistics module (which I wrote) has the same flaw, and I've argued in the past that this is not a bug :-)

But in both cases, statistics and min/max, it is clear that the order-dependent behaviour satisfies nobody and is undesirable.
It also looks like the trig functions are pruned to those that don't have undefined values for numbers I can type in. I can *type* `math.tan(math.pi/2)`, of course. But math.pi is a little bit smaller than the actual pi, so I just get a big number for an answer.
That's not the trig functions' fault, it's the fact that we cannot represent pi/2 exactly. I'm not sure what you mean by pruning them, it is pretty much standard that tan(pi/2) doesn't fail:

https://stackoverflow.com/questions/20287765/math-tan-near-pi-2-wrong-in-net...

and that comes out of the floating point representation, not tan.
But I cannot try the hypothetical:
math.cot(0) nan
No, but you could try this instead:

    py> math.log(0.0)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ValueError: math domain error

-- 
Steven
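The tan and log behaviour discussed here can be checked with the standard library alone. This sketch assumes Python 3.9 or later, since it uses `math.nextafter` in place of the numpy call quoted earlier in the thread.

```python
import math

half_pi = math.pi / 2   # slightly below the true pi/2

# Just below the pole: a very large but finite positive value.
print(math.tan(half_pi))

# One ULP further along, the float is past the pole and the sign flips.
print(math.tan(math.nextafter(half_pi, math.inf)))

# log(0.0) raises rather than returning -inf or a NaN.
try:
    math.log(0.0)
except ValueError as exc:
    print("math.log(0.0) raises ValueError:", exc)
```

No float lands exactly on pi/2, so tan never has to report an undefined value; log at zero is a case where an exact problematic input does exist and Python raises.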
On Sun, Jul 5, 2020, 10:25 PM Steven D'Aprano <steve@pearwood.info> wrote:
The standard library *does* seem to have taken pains to avoid "finite nans."
I don't know what you mean by "finite nans". By definition, any NAN is not considered finite.
The scare quotes are because I know that's not a real thing. Maybe "genetically finite" ... I.e. do some operations on regular finite values that wind up with NaN. I know, I know... That's not really right either, since finite values can overflow to infinities on their own.

Do you mean, the stdlib has taken pains to avoid returning NANs ex novo, i.e. from purely finite arguments? Then I agree.
Yes. That's it. :-)
min() and max() are notable, and unfortunate, exceptions in that their treatment of NANs depends on the order of argument. I would call that a bug except that the statistics module (which I wrote) has the same flaw, and I've argued in the past that this is not a bug :-)
Personal growth is healthy!

That's not the trig functions' fault, it's the fact that we cannot represent pi/2 exactly. I'm not sure what you mean by pruning them, it is pretty much standard that tan(pi/2) doesn't fail:
Of course. math.pi is a number that isn't really pi, as any finite representation must be. tan() is doing the right thing. I just meant that if math.cot() were added to the standard library, I could pass an exact zero as argument. None of the trig or hyperbolic functions that are undefined at ZERO are included. But yes, math.log(0) is another one where a NaN is avoided in favor of an exception. A few different log bases, but same story for all.
On Sun, Jul 5, 2020 at 8:57 AM Steven D'Aprano <steve@pearwood.info> wrote:
In the absence of any clear winner, my position is that NAN poisoning should be opt-in. We should pick the option which inconveniences people who want the other the least.
Let's say the stdlib uses Option 1. The function doesn't need to do any explicit checks for NANs, so there's no problem with large integers overflowing, or Decimals raising ValueError, or any need to do a conversion to float.
People who want NAN poisoning can opt-in by doing a check for NANs themselves, either in a wrapper function, or by testing the bounds *once* ahead of time and then just calling the stdlib `clamp` once they know they aren't NANs.
Imagine making the same statement about exceptions:

Exceptions being raised should be opt-in.

That sounds crazy but it's not. Before exceptions were commonplace, there were three possibilities when unreasonable operations were performed:

- return some arbitrary or random value (not a NaN because they hadn't been invented);
- and also set an "error flag" that could be checked by the caller to determine if something had gone wrong (but frequently was not);
- terminate the program.

Things are much better now. Few argue that it was better before. Think of NaN as the value equivalent of an exception. NaN poisoning is the equivalent of the fact that any function that doesn't catch an exception passes it through.

I don't usually write code that *uses* NaNs directly. If I want a distinguished non-numeric value, I use None or some other sentinel. If a NaN is produced by my code it indicates a bug. NaN poisoning increases the chance that a NaN generated somewhere won't be hidden by later code that manipulates that value. Why would I want to suppress that?

Just as an exception can be suppressed by explicit code, NaN poisoning can be suppressed by explicit checks. So let me rewrite your second and third paragraphs above (recall Option 1 which you favor is ignore NaNs, and Option 2 is NaN poisoning):

Let's say the stdlib uses Option 2. The function doesn't need to do any explicit checks for NANs, so there's no problem with large integers overflowing, or Decimals raising ValueError, causing errors that don't get noticed, or any need to do a conversion to float.

People who don't want NAN poisoning can opt-out by doing a check for NANs themselves, either in a wrapper function, or by testing the bounds *once* ahead of time and then not calling the stdlib `clamp` once they know they are NANs.

This is a better argument. People who use NaNs and expect them write code to handle them. The rest of us don't want to be surprised by suppressed errors.

--- Bruce
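The concern about a NaN being hidden by later code can be seen with the hand-written min(max()) idiom that started this thread. The snippet below uses only builtins; the variable names are arbitrary.

```python
import math

x = math.inf - math.inf     # a NaN left over from an earlier calculation

a = min(max(x, 0), 10)      # NaN is the first argument at each step
b = min(max(0, x), 10)      # NaN is the second argument to max

print(a, b)                 # prints: nan 0
```

The first spelling lets the NaN through, while the second silently replaces it with the lower bound, so whether the upstream bug stays visible depends on argument order alone.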
On Sun, Jul 05, 2020 at 11:58:58AM -0700, Bruce Leban wrote:
People who want NAN poisoning can opt-in by doing a check for NANs themselves, either in a wrapper function, or by testing the bounds *once* ahead of time and then just calling the stdlib `clamp` once they know they aren't NANs.
Imagine making the same statement about exceptions:
Exceptions being raised should be opt-in.
That sounds crazy but it's not. Before exceptions were commonplace, there were three possibilities when unreasonable operations were performed:
But using a NAN is not an unreasonable operation. There is a perfectly sensible interpretation available for using NANs as bounds, and it is one which is supported by the IEEE-754 recommended treatment of minimum and maximum: missing values.

That same behaviour falls out naturally from a very simple, non-contrived and efficient implementation of clamp that relies only on the value supporting less than. (The bounds also have to support equality.)

It doesn't even have to be numeric! So long as the value supports less than, you can clamp it. Whether that is meaningful or not depends on the value, but the feature is there for those who can make use of it. Duck-typing for the win!
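As an illustration of the duck-typing point, here is a comparison-only clamp applied to strings, dates and tuples. It is a simplified sketch written for this example, not the exact function posted earlier in the thread.

```python
from datetime import date


def clamp(value, lower, upper):
    """Comparison-only sketch: needs nothing beyond < and > on the operands."""
    if lower > upper:
        raise ValueError("lower must be <= upper")
    if value < lower:
        return lower
    if value > upper:
        return upper
    return value


print(clamp("pear", "apple", "mango"))                               # strings
print(clamp(date(2020, 7, 5), date(2020, 1, 1), date(2020, 6, 30)))  # dates
print(clamp((3, 9), (1, 0), (2, 5)))                                 # tuples
```

Whether clamping a string is useful is up to the caller, but nothing in the function has to know what type it was given.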
- return some arbitrary or random value (not a NaN because they hadn't been invented);
IEEE-754 goes back to the 1980s, before exceptions were commonplace. So returning a NAN was certainly an option. On the Apple Mac, converting a string to a float would return a NAN if it was a non-numeric value like "abc". So did operations like arcsin(2).
- and also set an "error flag" that could be checked by the caller to determine if something had gone wrong (but frequently was not); - terminate the program.
Things are much better now. Few argue that it was better before.
There are many people who consider exceptions to be a terrible mistake, including the creators of Go. So your analogy is not as strong as you think.
Think of NaN as the value equivalent of an exception. NaN poisoning is the equivalent of the fact that any function that doesn't catch an exception passes it through.
Right. That's an excellent analogy. If exceptions were uncatchable, they would always be fatal and they would be just like the bad old days where any error would always terminate the program.
But they aren't uncatchable, and there are situations where NANs don't represent a fatal error that can only propagate through your calculation poisoning the results.
Here are some examples:
    py> 1 < NAN
    False
    py> min(max(1, NAN), NAN)
    1
    py> 1**NAN
    1.0
The first two are especially relevant for clamp.
I don't usually write code that *uses *NaNs directly. If I want a distinguished non-numeric value, I use None or some other sentinel. If a NaN is produced by my code it indicates a bug. NaN poisoning increases the chance that a NaN generated somewhere won't be hidden by later code that manipulates that value. Why would I want to suppress that?
Then don't. I'm happy for you to test for it. Write a wrapper function that checks the bounds, and you are no worse off than if it was builtin.
In many use-cases, you won't even need a wrapper function, because the bounds won't be changing, or will change only rarely. So you only need to test them once, not on every call to clamp. Win!
I respect your right to check the bounds for NANs. How about you respect my right not to, and don't force me to do it *twice* to get the behaviour I want?
Just as an exception can be suppressed by explicit code, Nan poisoning can be suppressed by explicit checks.
What you want is for people who want to test for NANs to have the clamp function do it for them, and people who don't want to check for NANs to have to do it twice, once in their wrapper function, and once in the clamp function. Your attitude here is literally: "Oh, you don't want to check for NANs? Then I'll make you do it twice as much as everyone else!"
So let me rewrite your second and third paragraphs above (recall Option 1 which you favor is ignore NaNs, and Option 2 is NaN poisoning):
Let's say the stdlib uses Option 2. The function doesn't need to do any explicit checks for NANs,
But that's not true, it does need to check for NANs explicitly, because order comparisons `x < NAN` don't return NANs. There may be some clever arrangement of arguments for min and max that will return a NAN, but that's depending on accidental behaviour. You can't rely on it. (Having the result of min and max depend on the order of arguments is not a feature, and relying on that accidental behaviour is not safe.) -- Steven
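A minimal sketch of the point about argument order, assuming plain floats. The name `clamp_propagating` is hypothetical and not anything proposed in this thread; it only illustrates that a NAN-poisoning clamp needs an explicit check, because the comparisons themselves never yield a NAN.

```python
import math

NAN = math.nan

# The builtin min() and max() keep the first argument unless a later one
# compares strictly smaller (or larger), so a NAN only survives if it comes first.
first = min(NAN, 1)   # NAN is kept, because 1 < NAN is False
second = min(1, NAN)  # 1 is kept, because NAN < 1 is False


def clamp_propagating(value, lower, upper):
    """Hypothetical NAN-poisoning clamp; the explicit check is the point."""
    for x in (value, lower, upper):
        if x != x:  # assumption: only NAN-like values are unequal to themselves
            return x
    return min(max(value, lower), upper)
```

Without the loop, the result would depend on where the NAN happens to sit among the arguments.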
On Sun, Jul 5, 2020 at 5:49 PM Steven D'Aprano <steve@pearwood.info> wrote:
On Sun, Jul 05, 2020 at 11:58:58AM -0700, Bruce Leban wrote:
But using a NAN is not an unreasonable operation.
I didn't say that it was. A NaN is the *result* of an operation that cannot produce a number. I didn't intend the word "unreasonable" to mean anything more than that, just as you don't have to be crazy to use irrational numbers.
There is a perfectly sensible interpretation available for using NANs as bounds, and it is one which is supported by the IEEE-754 recommended treatment of minimum and maximum: missing values.
Supported, yes. Recommended, no. IEEE-754 specifies that the *minimum* operation propagates NaNs while the *alternate* *minimumNumber* operation drops NaNs.
That same behaviour falls out naturally from a very simple, non- contrived and efficient implementation of clamp that relies only on the value supporting less than.
Convenience of implementation is not (to me) a compelling argument for what the semantics of an operation are.
Duck-typing for the win!
Always agree with this.
Think of NaN as the value equivalent of an exception. NaN poisoning is the equivalent of the fact that any function that doesn't catch an exception passes it through.
Right. That's an excellent analogy. If exceptions were uncatchable, they would always be fatal and they would be just like the bad old days where any error would always terminate the program.
But they aren't uncatchable, and there are situations where NANs don't represent a fatal error that can only propagate through your calculation poisoning the results.
Here are some examples:
    py> 1 < NAN
    False
    py> min(max(1, NAN), NAN)
    1
This is unfortunate and does not follow the IEEE-754 guidelines as noted above.
I don't usually write code that *uses* NaNs directly. If I want a distinguished non-numeric value, I use None or some other sentinel. If a NaN is produced by my code it indicates a bug. NaN poisoning increases the chance that a NaN generated somewhere won't be hidden by later code that manipulates that value. Why would I want to suppress that?
Then don't. I'm happy for you to test for it. Write a wrapper function that checks the bounds, and you are no worse off than if it was builtin.
What you're suggesting is that everyone who doesn't normally use NaNs should be forced to think about it and test for it, just in case, because you're going to hide the errors. I would suspect there is much more code out there that does not check for NaNs than does. And most people who use the clamp function (or min and max today) won't check for it. What I'm saying is that those who care about NaNs should be the ones to test for it.
I respect your right to check the bounds for NANs. How about you respect my right not to, and don't force me to do it *twice* to get the behaviour I want?
This has nothing to do with rights. That's a fallacious argument. Defining a stdlib function one way or another does not damage your rights. And I'm not forcing you to check twice. What happens inside the clamp function is not code you are writing. If this is a performance argument, then many would say you shouldn't be writing Python.
There may be some clever arrangement of arguments for min and max that will return a NAN, but that's depending on accidental behaviour. You can't rely on it.
(Having the result of min and max depend on the order of arguments is not a feature, and relying on that accidental behaviour is not safe.)
I agree with this. It is unfortunate that min and max have this behavior which I would consider a bug. I don't want clamp to have buggy behavior either. --- Bruce
On 7/5/20 11:07 PM, Bruce Leban wrote:
On Sun, Jul 5, 2020 at 5:49 PM Steven D'Aprano <steve@pearwood.info <mailto:steve@pearwood.info>> wrote:
On Sun, Jul 05, 2020 at 11:58:58AM -0700, Bruce Leban wrote:
But using a NAN is not an unreasonable operation.
I didn't say that it was. A NaN is the /result/ of an operation that cannot produce a number. I didn't intend the word "unreasonable" to mean anything more than that, just as you don't have to be crazy to use irrational numbers.
There is a perfectly sensible interpretation available for using NANs as bounds, and it is one which is supported by the IEEE-754 recommended treatment of minimum and maximum: missing values.
Supported, yes. Recommended, no. IEEE-754 specifies that the *minimum* operation propagates NaNs while the *alternate* *minimumNumber* operation drops NaNs.
It should be noted, this is a change in the Standard that is just 1 year old. The previous version did define minimum as similar to the minimumNumber version. The change was made because it handled sNaNs differently (it defined minimum(sNaN, x) to be qNaN) and that definition turned out to be non-associative (it mattered whether the sNaN was the last number or not), so they needed to change it. In making the change, they also apparently decided that the 'NaN-poisoning' version was to be preferred, and the NaN-ignoring one to be just an alternate, with the behavior for sNaN changed from the previous version.
--- Bruce
-- Richard Damon
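As a rough illustration of the two behaviours being contrasted, here is a simplified sketch for Python floats only. The function names are hypothetical Python spellings of the operations named above, and the sketch deliberately ignores signaling NaNs and the ordering of -0.0 and +0.0, both of which the standard addresses.

```python
import math


def minimum(x, y):
    """NaN-propagating flavour: any NaN operand yields NaN."""
    if math.isnan(x):
        return x
    if math.isnan(y):
        return y
    return x if x < y else y


def minimum_number(x, y):
    """NaN-dropping flavour: a NaN operand is treated as missing."""
    if math.isnan(x):
        return y  # still NaN when both operands are NaN
    if math.isnan(y):
        return x
    return x if x < y else y
```

Unlike the builtin min(), neither result depends on which position the NaN occupies.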
On Mon, Jul 6, 2020 at 4:27 AM Richard Damon <Richard@damon-family.org> wrote:
Supported, yes. Recommended, no. IEEE-754 specifies that the *minimum* operation propagates NaNs while the *alternate* *minimumNumber* operation drops NaNs.
It should be noted, this is a change in the Standard that is just 1 year old. The previous version did define minimum as similar to the minimumNumber version.
Thanks for this info. I really dislike "standards" being hidden behind paywalls. I believe that contributes to people disregarding them. And it's sloppy to not highlight changes in a new version of a standard. --- Bruce
While I agree that having "clamp" operate on all Python types is a "good thing"[*], there is a complication. If it were put in the math module, then it would pretty much have to be a floating point function. At least that was the case for math.isclose(). My first prototype of isclose() was written in Python, and carefully designed to work with any type (well, any numeric type -- i.e. Decimal and Fraction). But once it was decided to add it to the math module, that got removed, for (I think) two reasons.
1) The math module is written in C, and Guido at least (he was still BDFL then) rejected the idea of refactoring it to allow Python components (yes, that was proposed, and even started by, I think, Victor Stinner). Yes, you can write generic functions in C, but it's a lot more of a pain.
2) The rest of the math module is currently all about floats already, having begun as a wrapper around the C math library. So that continues. (There was a version added to cmath as well.)
If it doesn't live in the math module, then all this is irrelevant.
-CHB
[*] Honestly, while you *could* use clamp() with any comparable objects, I really don't see a use case for anything non-numerical, but why not?
On Mon, Jul 6, 2020 at 9:40 AM Bruce Leban <bruce@leban.us> wrote:
On Mon, Jul 6, 2020 at 4:27 AM Richard Damon <Richard@damon-family.org> wrote:
Supported, yes. Recommended, no. IEEE-754 specifies that the *minimum* operation propagates NaNs while the *alternate* *minimumNumber* operation drops NaNs.
It should be noted, this is a change in the Standard that is just 1 year old. The previous version did define minimum as similar to the minimumNumber version.
Thanks for this info. I really dislike "standards" being hidden behind paywalls. I believe that contributes to people disregarding them. And it's sloppy to not highlight changes in a new version of a standard.
--- Bruce _______________________________________________ Python-ideas mailing list -- python-ideas@python.org To unsubscribe send an email to python-ideas-leave@python.org https://mail.python.org/mailman3/lists/python-ideas.python.org/ Message archived at https://mail.python.org/archives/list/python-ideas@python.org/message/TOJMIC... Code of Conduct: http://python.org/psf/codeofconduct/
-- Christopher Barker, PhD Python Language Consulting - Teaching - Scientific Software Development - Desktop GUI and Web Development - wxPython, numpy, scipy, Cython
On Tue, Jul 7, 2020 at 3:49 AM Christopher Barker <pythonchb@gmail.com> wrote:
[*] Honestly, while you *could* use clamp() with any comparable objects, I really don't see a use case for anything non-numerical, but why not?
Depends how broadly you define "numerical". Is a datetime numerical? It's a reasonable thing to clamp, and in a sense, can be thought of as a number (eg Unix time). But it's certainly not part of the numeric tower. I'd say it's best to let clamp work with anything comparable, and let people figure for themselves what makes sense. ChrisA
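A small sketch of the datetime case, assuming a generic `clamp` written with the builtin min() and max(). The helper and the dates are made up for illustration; nothing here is a proposed stdlib signature.

```python
from datetime import datetime


def clamp(value, lower, upper):
    """Hypothetical generic clamp: needs only ordering comparisons."""
    return max(lower, min(value, upper))


start = datetime(2020, 7, 1)
end = datetime(2020, 7, 31)

early = clamp(datetime(2020, 6, 15), start, end)   # before the window
inside = clamp(datetime(2020, 7, 6), start, end)   # within the window
late = clamp(datetime(2020, 8, 2), start, end)     # after the window
```

Nothing in the helper refers to numbers, so any mutually comparable values work the same way.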
On Mon, Jul 6, 2020 at 11:28 AM Chris Angelico <rosuav@gmail.com> wrote:
On Tue, Jul 7, 2020 at 3:49 AM Christopher Barker <pythonchb@gmail.com> wrote:
[*] Honestly, while you *could* use clamp() with any comparable objects, I really don't see a use case for anything non-numerical, but why not?
Depends how broadly you define "numerical". Is a datetime numerical?
Good catch -- clearly a failure of my imagination :-) I think that's a great example of why it's useful for the function to be generic -- better example than strings, for sure :-)
-CHB
It's a reasonable thing to clamp, and in a sense, can be thought of as a number (eg Unix time). But it's certainly not part of the numeric tower. I'd say it's best to let clamp work with anything comparable, and let people figure for themselves what makes sense.
ChrisA
On Mon, Jul 06, 2020 at 10:46:03AM -0700, Christopher Barker wrote:
1) The math module is written in C, and Guido at least (he was still BDFL then) rejected the idea of refactoring it to allow Python components (yes, that was proposed, and even started by, I think, Victor Stinner). Yes, you can write generic functions in C, but it's a lot more of a pain.
I think that it is long past time that we give up the idea that the math module is a thin wrapper around the system's C maths library. We should just add a math.py file that looks like this:
    from _math import *
2) The rest of the math module is currently all about floats already,
That's not really the case, and hasn't been for a while. Some functions manage to work very well with non-floats:
    py> math.factorial(100)
    9332621544394415268169923885626670049071596826438162146859296389
    5217599993229915608941463976156518286253697920827223758251185210
    916864000000000000000000000000
    # New in Python 3.8 I think?
    py> math.prod([1, 2, 3, Fraction(1, 5)])
    Fraction(6, 5)
Even if they return a float, or int, they still work with non-floats without losing accuracy:
    py> math.log10(10**5000)
    5000.0
    py> math.ceil(Fraction(1, 10**5000))
    1
even when the argument would overflow or underflow.
-- Steven
Off topic: I saw this NaN article this morning. Title: Hungry? Please enjoy this delicious NaN, courtesy of British Gas and Sainsbury's URL: https://www.theregister.com/2020/07/09/bork/ I particularly liked the Tideford Butterscotch Rice Pudding, at NaNp per 100g. -- Jonathan
On 9/07/20 10:25 pm, Jonathan Fine wrote:
I particularly liked the Tideford Butterscotch Rice Pudding, at NaNp per 100g.
I'd be a bit worried that if I bought one of those, I'd end up with a balance of NaN in my bank account. I hate to think what sort of havoc that would cause... -- Greg
Yeah, but "not a number of pounds" sure beats seeing "£inf". On Thu, Jul 9, 2020 at 3:25 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Off topic: I saw this NaN article this morning.
Title: Hungry? Please enjoy this delicious NaN, courtesy of British Gas and Sainsbury's URL: https://www.theregister.com/2020/07/09/bork/
I particularly liked the Tideford Butterscotch Rice Pudding, at NaNp per 100g.
-- Jonathan
Hmm, if my bank account balance is NaN, then:
    if not (withdrawal_amount > balance):
        give_cash_to_customer(withdrawal_amount)
would be pretty nice :-)
-CHB
On Thu, Jul 9, 2020 at 7:47 AM Eric Fahlgren <ericfahlgren@gmail.com> wrote:
Yeah, but "not a number of pounds" sure beats seeing "£inf".
On Thu, Jul 9, 2020 at 3:25 AM Jonathan Fine <jfine2358@gmail.com> wrote:
Off topic: I saw this NaN article this morning.
Title: Hungry? Please enjoy this delicious NaN, courtesy of British Gas and Sainsbury's URL: https://www.theregister.com/2020/07/09/bork/
I particularly liked the Tideford Butterscotch Rice Pudding, at NaNp per 100g.
-- Jonathan
On 10/07/20 8:05 am, Christopher Barker wrote:
    if not (withdrawal_amount > balance):
        give_cash_to_customer(withdrawal_amount)
Unfortunately, with my luck they will have coded it as
    if withdrawal_amount <= balance:
        give_cash_to_customer(withdrawal_amount)
But my bigger concern is that the NaN in my bank account will spread to other people's accounts, and to other banks, and eventually lead to collapse of the entire worldwide financial system.
(Or maybe this has already happened? We were *told* the global financial crash was caused by dodgy mortgages, but...)
-- Greg
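For anyone wondering why the two spellings differ, a tiny sketch with made-up amounts: every ordered comparison involving a NaN is False, so a test and its apparent negation stop being interchangeable.

```python
import math

balance = math.nan          # illustrative values only
withdrawal_amount = 100.0

# For ordinary numbers these two tests agree; with a NaN balance they do not.
pays_out_negated = not (withdrawal_amount > balance)   # not False
pays_out_direct = withdrawal_amount <= balance         # False
```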
On 6/07/20 3:55 am, Steven D'Aprano wrote:
With that interpretation, a NAN passed as the lower or upper bounds can be seen as another way of saying "no lower bounds" (i.e. negative infinity) or "no upper bounds" (i.e. positive infinity), not "some unknown bounds".
Python already has a value for representing missing or unspecified data, i.e. None. So we don't need to use NaN for that, and can instead reserve it to mean "no correct answer".
I agree with you that `clamp(lower=x, value=NAN, upper= x)` should return x.
I don't think I agree with that, because it relies on assuming that the lower and upper bounds can meaningfully be compared for exact equality, which may not be true depending on the circumstances.
Treat a NAN bounds as *missing data*, which effectively means "there is no limit", i.e. as if you had passed the infinity of the appropriate sign for the bounds.
If one of the bounds is missing, you don't need clamp(), you can use min() or max(). -- Greg
On Mon, Jul 06, 2020 at 12:59:28PM +1200, Greg Ewing wrote:
I agree with you that `clamp(lower=x, value=NAN, upper= x)` should return x.
I don't think I agree with that
Sorry Greg, on this point at least the IEEE-754 standard is firm: if a function will return the same result for every non-NAN argument, then it must return the same result for NAN arguments too.
clamp(value, x, x)
will always return x for every finite and infinite value, so it must return x for NANs too.
Quoting one of the standard committee members:
    NaN must not be confused with “Undefined.” On the contrary, IEEE 754 defines NaN perfectly well even though most language standards ignore and many compilers deviate from that definition. The deviations usually afflict relational expressions, discussed below. Arithmetic operations upon NaNs other than SNaNs (see below) never signal INVALID, and always produce NaN unless replacing every NaN operand by any finite or infinite real values would produce the same finite or infinite floating-point result independent of the replacements.
https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
See page 7.
This is why 1.0**NAN returns 1.0, and why math.hypot(INF, NAN) returns INF.
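A short sketch of what that rule implies for clamp, assuming floats. Both helper names are hypothetical. The naive min(max()) spelling does not return x when the bounds are equal and the value is a NAN, which is why an implementation following this rule needs the equality test on the bounds.

```python
import math

NAN = math.nan
INF = math.inf

# The two stdlib examples mentioned above.
pow_result = 1.0 ** NAN
hypot_result = math.hypot(INF, NAN)


def clamp_naive(value, lower, upper):
    """Hypothetical naive spelling; a NAN value passes straight through."""
    return min(max(value, lower), upper)


def clamp_equal_bounds(value, lower, upper):
    """Hypothetical variant: equal bounds decide the result on their own."""
    if lower == upper:
        return lower
    return clamp_naive(value, lower, upper)
```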
because it relies on assuming that the lower and upper bounds can meaningfully be compared for exact equality, which may not be true depending on the circumstances.
I don't understand that objection. Can you give a concrete example?
Treat a NAN bounds as *missing data*, which effectively means "there is no limit", i.e. as if you had passed the infinity of the appropriate sign for the bounds.
If one of the bounds is missing, you don't need clamp(), you can use min() or max().
Only if you know it is missing. If the bounds come from some other calculation or from the user, how do you know they are missing?
    if lower is upper is None:
        pass
    elif lower is None:
        value = min(value, upper)
    elif upper is None:
        value = max(value, lower)
    else:
        value = clamp(value, lower, upper)
We don't have three slice functions to cover the cases where one or the other bounds is missing:
    slice(start, stop)
    slice_end_at(stop)
    slice_start_at(start)
we just have a single function that takes a missing value.
-- Steven
On Sun, Jul 5, 2020 at 9:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
I agree with you that `clamp(lower=x, value=NAN, upper= x)` should return x.
Sorry Greg, on this point at least the IEEE-754 standard is firm: if a function will return the same result for every non-NAN argument, then it must return the same result for NAN arguments too.
clamp(value, x, x)
will always return x for every finite and infinite value, so it must return x for NANs too.
I strongly agree with Steven here. Also about order-dependence in results of min() and max() being disturbing and contrary to IEEE-754. ... so, umm... Steven... statistics.median()?! Btw, definitely +1 on math.clamp(value, *, lower=None, upper=None) . -1 on built-in. -0 on any other function signature. Actually, I'm fine with math.clip() as well, but clamp seems more popular.
On Sun, Jul 5, 2020 at 6:51 PM David Mertz <mertz@gnosis.cx> wrote:
will always return x for every finite and infinite value, so it must return x for NANs too.
I strongly agree with Steven here. Also about order-dependence in results of min() and max() being disturbing and contrary to IEEE-754.
... so, umm... Steven... statistics.median()?!
Since you brought that up -- I recall a lot of debate about whether NaN's should be considered missing values or "poisoning" in the statistics module -- there are arguments for both, and neither was clear or obvious. So using NaN to mean "not specified" in this context would not be obvious to everyone, and when we have the perfectly good None instead, why do it?
Btw, definitely +1 on math.clamp(value, *, lower=None, upper=None) .
what about:
    math.clamp(value, *, lower=-math.inf, upper=math.inf) .
Also it seems we can have "NaN poisoning" behaviour without explicitly checking for it. The real key is not that it's a NaN, but that it doesn't compare as True to anything. Someone smarter than me could probably make this cleaner, but this works:
    In [112]: def min(x, y):
         ...:     if x < y:
         ...:         return x
         ...:     elif y <= x:
         ...:         return y
         ...:     else:
         ...:         return x if not (x < math.inf) else y
Note that I did that ugly ternary expression at the end in hopes that it would work with non-float NaN-like objects, but no luck there, at least not with Decimal:
    In [113]: min(1, Decimal('NaN'))
    ---------------------------------------------------------------------------
    InvalidOperation                          Traceback (most recent call last)
    <ipython-input-113-3bedbbbf79f2> in <module>
    ----> 1 min(1, Decimal('NaN'))

    <ipython-input-112-77103db78666> in min(x, y)
          1 def min(x, y):
    ----> 2     if x < y:
          3         return x
          4     elif y <= x:
          5         return y

    InvalidOperation: [<class 'decimal.InvalidOperation'>]
It seems that Decimal NaN does not behave like FP NaNs :-(
-1 on built-in. -0 on any other function signature. Actually, I'm fine with math.clip() as well, but clamp seems more popular.
One thing about putting it in the math module is that we could then make assumptions about the float type. And we'd have to write it in C, so it would be fast, even if there is some ugly NaN-checking behavior and the like.
-CHB
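A sketch of how a quiet Decimal NaN differs from a float NaN, using only the stdlib decimal module. The local context that traps InvalidOperation mirrors the default setting and is spelled out here only so the example does not depend on whatever context is active.

```python
from decimal import Decimal, InvalidOperation, localcontext

dnan = Decimal('NaN')

# Equality with a quiet Decimal NaN does not raise; it just compares unequal.
is_nan_by_inequality = dnan != dnan
is_nan_by_method = dnan.is_nan()

# Ordering comparisons signal InvalidOperation, which is trapped (raised) here.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = True
    try:
        1 < dnan
        ordering_raised = False
    except InvalidOperation:
        ordering_raised = True
```

So a type-agnostic NaN test has to avoid `<` altogether; `x != x` or the `is_nan()` method are two options for quiet NaNs.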
On Sun, Jul 5, 2020 at 10:04 PM Christopher Barker <pythonchb@gmail.com> wrote:
Since you brought that up -- I recall a lot of debate about whether NaN's should be considered missing values or "poisoning" in the statistics module -- there are arguments for both, and neither was clear or obvious. So using NaN to mean "not specified" in this context would not be obvious to everyone, and when we have the perfectly good None instead, why do it?
Well, yes... I wrote a lot of that debate :-) I even sort of re-discovered quick select on my own... then eventually figured out that a bunch of people had benchmarked a better implementation to potentially use in statistics.median() a year before I tried. Python sorted() is really fast! But it's still the WRONG way to do this, or at least there should be a switch to allow nan-poisoning and/or nan-stripping.
Btw, definitely +1 on math.clamp(value, *, lower=None, upper=None) .
what about:
math.clamp(value, *, lower=-math.inf, upper=math.inf) .
Oh sure. That's fine. But the implementation would still need to check for None and convert it to the infinities. Ordinary users just simply ARE going to try:
    math.clamp(x, lower=None, upper=99)
And expect that to mean "I don't care about lower bound."
-- The dead increasingly dominate and strangle both the living and the not-yet born. Vampiric capital and undead corporate persons abuse the lives and control the thoughts of homo faber. Ideas, once born, become abortifacients against new conceptions.
On Sun, Jul 5, 2020 at 6:01 PM Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
On 6/07/20 3:55 am, Steven D'Aprano wrote:
With that interpretation, a NAN passed as the lower or upper bounds can be seen as another way of saying "no lower bounds" (i.e. negative infinity) or "no upper bounds" (i.e. positive infinity), not "some unknown bounds".
Python already has a value for representing missing or unspecified data, i.e. None. So we don't need to use NaN for that, and can instead reserve it to mean "no correct answer".
+1
and we can use +inf and -inf for unlimited bounds as well. Yes, they are a bit of a pain to write in Python, but we could do:
    def clamp(value, min=-math.inf, max=math.inf): ...
yes, that would make them optional, rather than required, and therefore provide a slight asymmetry between specifying min only or max only, but still be handy. To make it more consistent, but maybe more annoying in the common case, we could make them keyword only.
I agree with you that `clamp(lower=x, value=NAN, upper= x)` should return x.
I don't think I agree with that, because it relies on assuming that the lower and upper bounds can meaningfully be compared for exact equality, which may not be true depending on the circumstances.
and then we'd need to check if they were equal as well.
Treat a NAN bounds as *missing data*, which effectively means "there is no limit", i.e. as if you had passed the infinity of the appropriate sign for the bounds.
and really how often would one end up with NaN as a bound anyway? Often they will be hard-coded. I'm having a really hard time imagining when you'd end up with NaN for a bound that was NOT an error!
It would be far more likely for the value you want clamped to be NaN -- and then it sure as heck should return NaN.
As for the behavior of min() and max() when provided a NaN (and much of Python's handling of FP special values) -- I think that's a practicality-beats-purity issue. I have a really hard time thinking that anyone thinks that:
    In [81]: min(1, math.nan)
    Out[81]: 1
    In [82]: min(math.nan, 1)
    Out[82]: nan
is ideal behavior!
while I was writing this:
On Sun, Jul 5, 2020 at 6:38 PM Steven D'Aprano <steve@pearwood.info> wrote:
... on this point at least the IEEE-754 standard is firm: if a function will return the same result for every non-NAN argument, then it must return the same result for NAN arguments too.
clamp(value, x, x)
will always return x for every finite and infinite value, so it must return x for NANs too.
Except that Python (maybe except for the math module) does not conform to IEEE-754 in many other places. So we do have a practicality beats purity choice here.
-CHB
I do not agree clamp should be restricted to numeric values. I would expect clamp to be agnostic to the specifics of floats/numbers and, like sort, expect it to work for any values as long as they compare (using a dunder). I think having something like min=-math.inf is hence right out in my mind. If I got this right, the implementation could be as simple as:
    def clamp(value, *, min=None, max=None):
        if min is not None and value < min:
            return min
        if max is not None and max < value:
            return max
        return value
I think the crucial question here is: does the order of the ifs matter, and is that an issue? The only time (barring side effects, values changing in between calls et cetera) it would make a difference is if max < value < min. Assuming transitivity (can anybody come up with an example of a non-transitive order where clamping makes sense?) this means max < min and so you can work around this by disallowing it:
    def clamp_safe(value, *, min=None, max=None):
        # Only compare the bounds when both are actually given.
        if min is not None and max is not None and max < min:
            raise SomeException("Something about min < max")
        return clamp(value, min=min, max=max)
What I like about both of these is that they only use "<", just like sort.
Going back to nans, I think that would mean:
    clamp(nan, min, max) = nan
    clamp(value, nan, max) = clamp(value, None, max) = min(value, max)
    clamp(value, min, nan) = clamp(value, min, None) = max(value, min)
On Mon, 6 Jul 2020 at 02:55, Christopher Barker <pythonchb@gmail.com> wrote:
and we can use +inf and -inf for unlimited bounds as well. Yes, they are a bit of a pain to write in Python, but we could do:
def clamp(value, min=-math.inf, max=math.inf): ...
yes, that would make them optional, rather than required, and therefore provide a slight asymmetry between specifying min only or max only, but still be handy. to make it more consistent, but maybe more annoying in the common case, we could make them keyword only.
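To check the NaN cases listed in the message above, here is the "<"-only clamp from that message run against a few inputs. The sample values, including the string example, are illustrative and not from the thread; the parameter names shadow the builtins exactly as in the original.

```python
import math


def clamp(value, *, min=None, max=None):
    # Same logic as the implementation quoted above: only "<" is used.
    if min is not None and value < min:
        return min
    if max is not None and max < value:
        return max
    return value


nan = math.nan

nan_value = clamp(nan, min=0, max=1)    # both comparisons are False
nan_lower = clamp(5, min=nan, max=3)    # lower bound is effectively ignored
nan_upper = clamp(-5, min=0, max=nan)   # upper bound is effectively ignored
text = clamp("m", min="a", max="f")     # works for any "<"-comparable values
```

With this implementation a NaN bound behaves like a missing bound, and a NaN value is returned unchanged, without any explicit NaN test.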
On Mon, Jul 06, 2020 at 03:21:04AM +0100, Henk-Jaap Wagenaar wrote:
I do not agree clamp should be restricted to numeric values. I would expect clamp to be agnostic to the specifics of floats/numbers and like sort expect it to work for any values as long as they compare (using a dunder).
It is possible to write clamp so that it relies only on two things:
- the bounds must support equality and less than;
- the value must support less than.
That is pretty much as general as it gets. I think that it is okay to document clamp as *intended* for numbers but allow it to be used for non-numbers, similar to the builtin sum(). (Although sum needs to be tricked into supporting strings.)
Four years ago, there was strong opposition to giving the bounds default values. I think the consensus at the time was that it is okay to explicitly provide "unbounded" arguments (whether you spell them as infinities, NANs, or None) but you should have to explicitly do so:
clamp(x)
just reads poorly.
I concur with that argument -- or at least, I don't disagree strongly enough to argue against it. This is why the implementation I posted earlier accepts None as bounds, but doesn't give them defaults.
-- Steven
On Mon, Jul 6, 2020 at 12:36 PM Steven D'Aprano <steve@pearwood.info> wrote:
Four years ago, there was strong opposition to giving the bounds default values. I think the consensus at the time was that it is okay to explicitly provide "unbounded" arguments (whether you spell them as infinities, NANs, or None) but you should have to explicitly do so:
clamp(x)
just reads poorly.
Yes, but it's also useless (clamping without ANY bounds?). In terms of reading poorly, this is far worse:

    clamp(x, 10)

Does that ensure that it's no *greater* than 10 or no *less* than 10? Since the args would be min before max, I would expect that this has a lower bound and no upper bound, but there'll be people equally confident that it should behave like range() and have an upper bound with no lower bound (which would probably be more useful in a lot of situations anyway).

So I also agree that the bounds should be given explicitly.

ChrisA
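One way to force the bounds to be given explicitly, sketched here as an assumption rather than a signature anyone in the thread settled on, is to make both bounds required keyword-only parameters. The parameter names `lower` and `upper` are illustrative.

```python
def clamp(value, *, lower, upper):
    """Clamp value to [lower, upper]; pass None for an open end."""
    if lower is not None and upper is not None and upper < lower:
        raise ValueError("lower must not exceed upper")
    if lower is not None and value < lower:
        return lower
    if upper is not None and upper < value:
        return upper
    return value


# Each call site states which bound it means.
print(clamp(5, lower=10, upper=None))    # raised to the lower bound
print(clamp(15, lower=None, upper=10))   # cut to the upper bound

# The ambiguous spellings are rejected instead of silently guessing.
for bad_call in (lambda: clamp(5, 10), lambda: clamp(5)):
    try:
        bad_call()
    except TypeError as exc:
        print("TypeError:", exc)
```

With this shape, `clamp(x, 10)` and `clamp(x)` both fail with a TypeError, so a reader never has to guess which bound a lone argument refers to.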
On 06/07/2020 01:59, Greg Ewing wrote:
If one of the bounds is missing, you don't need clamp(), you can use min() or max().
Not true. It would be true if you knew a specific one of the bounds was always missing, but you might want to call it repeatedly, sometimes with either bound (or both) missing, sometimes not.

Rob Cliffe
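A small sketch of the situation described here: the same call is made in a loop, and which bounds are present is only known at run time. The field names and limits below are made-up data for illustration, and `clamp` is a simple None-aware helper rather than any agreed implementation.

```python
def clamp(value, lower, upper):
    # None means "no bound on this side".
    if lower is not None and value < lower:
        return lower
    if upper is not None and upper < value:
        return upper
    return value


# Hypothetical per-field limits; either end (or both) may be absent.
limits = {
    "volume": (0, 100),
    "delay": (0, None),
    "offset": (None, 0),
    "gain": (None, None),
}
readings = {"volume": 130, "delay": -2, "offset": 7, "gain": 42}

# One call handles every combination; choosing between min(), max(),
# both, or neither would need a branch on which bounds exist.
clamped = {name: clamp(readings[name], *limits[name]) for name in readings}
print(clamped)
```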
On 07/03/2020 05:03 PM, Steven D'Aprano wrote:
    def clamp(value, lower, upper):
        """Clamp value to the closed interval lower...upper.

        The limits lower and upper can be set to None to mean -∞ and +∞
        respectively.
        """
        if not (lower is None or upper is None):
            if lower > upper:
                raise ValueError('lower must be <= to upper')
        if lower == upper is not None:
            return lower
        if lower is not None and value < lower:
            value = lower
        elif upper is not None and value > upper:
            value = upper
        return value
I'm having a hard time understanding this line:

    if lower == upper is not None:

As near as I can tell, `upper is not None` will be either True or False, meaning the condition will only ever be True if `lower` is also either True or False, and since I would not expect `lower` to ever be True or False, I expect this condition to always fail. Am I missing something?

--
~Ethan~
On 09.07.20 21:04, Ethan Furman wrote:
I'm having a hard time understanding this line:
if lower == upper is not None:
As near as I can tell, `upper is not None` will be either True or False, meaning the condition will only ever be True if `lower` is also either True or False, and since I would not expect `lower` to ever be True or False, I expect this condition to always fail. Am I missing something?
It's operator chaining and shorthand notation for (https://docs.python.org/3/reference/expressions.html#comparisons)

    if (lower == upper) and upper is not None:
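The two readings of that line can be compared directly. In this sketch the function names are made up for illustration: `chained` is the expression as written, `expanded` is what comparison chaining means, and `misread` is the grouping assumed in the question.

```python
def chained(lower, upper):
    # The expression exactly as it appears in the clamp implementation.
    return lower == upper is not None


def expanded(lower, upper):
    # What Python evaluates: a chain "a op1 b op2 c" means
    # "(a op1 b) and (b op2 c)", with b evaluated once.
    return (lower == upper) and (upper is not None)


def misread(lower, upper):
    # The grouping assumed in the question; chaining does not do this.
    return lower == (upper is not None)


for lower, upper in [(5, 5), (5, 7), (None, None), (True, 5)]:
    print(lower, upper,
          chained(lower, upper),
          expanded(lower, upper),
          misread(lower, upper))
```

`chained` and `expanded` agree on every pair, while `misread` differs, for example on `(5, 5)` and `(True, 5)`.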
On Thu, Jul 9, 2020, at 15:32, Dominik Vilsmeier wrote:
On 09.07.20 21:04, Ethan Furman wrote:
I'm having a hard time understanding this line:
if lower == upper is not None:
As near as I can tell, `upper is not None` will be either True or False, meaning the condition will only ever be True if `lower` is also either True or False, and since I would not expect `lower` to ever be True or False, I expect this condition to always fail. Am I missing something?
It's operator chaining and shorthand notation for (https://docs.python.org/3/reference/expressions.html#comparisons)
if (lower == upper) and upper is not None:
If PEP-8 does not currently forbid using the shorthand notation in cases other than relational/equality operators in the same general direction [e.g. A > B == C >= D] or like equivalence operators [E is F is G; H == I == J], I think it should.
On 9 Jul 2020, at 21:04, Ethan Furman <ethan@stoneleaf.us> wrote:
I'm having a hard time understanding this line:
if lower == upper is not None:
As near as I can tell, `upper is not None` will be either True or False, meaning the condition will only ever be True if `lower` is also either True or False, and since I would not expect `lower` to ever be True or False, I expect this condition to always fail. Am I missing something?
This uses comparison chaining and is equivalent to "lower == upper and upper is not None". I don't like this particular style; I had to read it a couple of times to get it.

Ronald

--
Twitter / micro.blog: @ronaldoussoren
Blog: https://blog.ronaldoussoren.net/
participants (27)

- 2QdxY4RzWzUUiLuE@potatochowder.com
- Ben Rudiak-Gould
- Bruce Leban
- Chris Angelico
- Christopher Barker
- Dan Sommers
- David Mertz
- Dominik Vilsmeier
- Eric Fahlgren
- Ethan Furman
- Federico Salerno
- Greg Ewing
- Henk-Jaap Wagenaar
- Jeff Allen
- Jonathan Fine
- Jonathan Goble
- Matthew Einhorn
- MRAB
- Piper Thunstrom
- Random832
- Richard Damon
- Rob Cliffe
- Ronald Oussoren
- Soni L.
- Stephen J. Turnbull
- Steven D'Aprano
- TCPhone93@gmail.com