It's common to want to clip (or clamp) a number to a range. This feature is commonly needed for both floating point numbers and integers:
http://stackoverflow.com/questions/9775731/clamping-floating-numbers-in-pyth...
http://stackoverflow.com/questions/4092528/how-to-clamp-an-integer-to-some-r...
There are a few approaches:
* use a couple of ternary operators (e.g. https://github.com/scipy/scipy/pull/5944/files line 98, which generated a lot of discussion),
* use a min/max construction,
* call sorted on a list of the three numbers and pick out the middle one, or
* use numpy.clip.
Am I right that there is no *obvious* way to do this? If so, I suggest adding math.clip (or math.clamp) to the standard library that has the meaning:
    def clip(number, lower, upper):
        return lower if number < lower else upper if number > upper else number
This would work for non-numeric types so long as the non-numeric types support comparison. It might also be worth adding
assert lower < upper
to catch some bugs.
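The pure-Python approaches listed above can be put side by side. This is only a sketch: the names clip_ternary, clip_minmax and clip_sorted are made up for illustration, and it assumes lower <= upper with no NaNs involved.

```python
def clip_ternary(number, lower, upper):
    # Chained conditional expressions, as in the proposed definition.
    return lower if number < lower else upper if number > upper else number


def clip_minmax(number, lower, upper):
    # min() applies the upper bound, max() applies the lower bound.
    return max(lower, min(number, upper))


def clip_sorted(number, lower, upper):
    # The clamped value is the median of the three numbers.
    return sorted((lower, number, upper))[1]


for n in (-5, 0, 3, 10, 42):
    results = {f(n, 0, 10) for f in (clip_ternary, clip_minmax, clip_sorted)}
    print(n, results)  # each set holds a single value when the three agree
```

For ordinary, totally ordered inputs the three spellings agree; they differ only in readability and in how they treat unordered values.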
Best,
Neil
Rather:
assert lower <= upper
And apologies if this has been requested before. My search turned up nothing.
On Saturday, July 30, 2016 at 5:57:53 PM UTC-4, Neil Girdhar wrote:
[...]
Is there some special subtlety or edge case where a hand rolled function will go wrong? I like the SO version spelled like this (a little fleshed out):
    def clamp(val, min_val=None, max_val=None):
        min_val = val if min_val is None else min_val
        max_val = val if max_val is None else max_val
        assert min_val <= max_val
        return max(min(val, max_val), min_val)
On Sat, Jul 30, 2016 at 2:57 PM, Neil Girdhar mistersheik@gmail.com wrote:
[...]
Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
On Sat, Jul 30, 2016 at 09:41:07PM -0700, David Mertz wrote:
Is there some special subtlety or edge case where a hand rolled function will go wrong?
Depends on the person doing the hand rolling :-)
I'm not sure how your version would work with NANs, and I haven't bothered to try it to find out, but I like this version:
    def clamp(value, lower=None, upper=None):
        """Clamp value to the closed interval lower...upper.

        The limits lower and upper can be set to None to
        mean -∞ and +∞ respectively.
        """
        if not (lower is None or upper is None):
            if lower > upper:
                raise ValueError('lower must be <= upper')
        if lower is not None and value < lower:
            value = lower
        elif upper is not None and value > upper:
            value = upper
        return value
which does support NANs for the value being clamped. (It returns the NAN unchanged.) It also avoids the expense of function calls to min and max, and will check that the lower and upper bounds are in the right order even if you run with assertions turned off.
I don't think I've missed any cases.
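A few calls illustrate the behaviour claimed above. The function body is the one from this message (docstring shortened); the sample values are chosen only for illustration.

```python
import math


def clamp(value, lower=None, upper=None):
    """Clamp value to the closed interval lower...upper (None = no limit)."""
    if not (lower is None or upper is None):
        if lower > upper:
            raise ValueError('lower must be <= upper')
    if lower is not None and value < lower:
        value = lower
    elif upper is not None and value > upper:
        value = upper
    return value


print(clamp(15, 0, 10))       # upper bound applies
print(clamp(-3, lower=0))     # only a lower bound is given
print(clamp(7))               # no bounds: value comes back unchanged
print(math.isnan(clamp(float('nan'), 0, 10)))  # NaN value passes through

try:
    clamp(5, 10, 0)           # bounds in the wrong order
except ValueError as exc:
    print('rejected:', exc)
```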
On Sun, Jul 31, 2016 at 2:19 AM, Steven D'Aprano steve@pearwood.info wrote:
I'm not sure how your version would work with NANs, and I haven't bothered to try it to find out, but I like this version:
Answer:
py> min(10, float('nan'))
10
On Sun, Jul 31, 2016 at 7:10 AM, Ian Kelly ian.g.kelly@gmail.com wrote:
On Sun, Jul 31, 2016 at 2:19 AM, Steven D'Aprano steve@pearwood.info wrote:
I'm not sure how your version would work with NANs, and I haven't bothered to try it to find out, but I like this version:
Answer:
py> min(10, float('nan'))
10
Ah, but the original passed the value before the boundary.
py> min(float('nan'), 10)
nan
py> max(_, 0)
nan
So it works, but it's somewhat fragile because it depends on the order of the arguments to min and max.
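The fragility can be made concrete by writing the min/max construction both ways round. The two function names below are hypothetical; the only difference between them is the argument order handed to min() and max().

```python
import math

nan = float('nan')


def clamp_value_first(val, lo, hi):
    # The value is passed before each bound, as in the version quoted above.
    return max(min(val, hi), lo)


def clamp_bound_first(val, lo, hi):
    # Same construction with the bound passed first.
    return max(lo, min(hi, val))


print(clamp_value_first(5, 0, 10), clamp_bound_first(5, 0, 10))
print(clamp_value_first(nan, 0, 10))  # the NaN survives
print(clamp_bound_first(nan, 0, 10))  # the NaN is silently replaced by a bound
```

min() and max() keep their first argument unless a later one compares strictly smaller (or larger), and every comparison with a NaN is false, so whichever argument comes first wins.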
I dislike this API. What's the point of calling clamp(x)? clamp(b, a) is min(a, b) and clamp(a, max_val=b) is just max(a, b). My point is that all parameters must be mandatory.
Victor
On Jul 31, 2016 at 6:41 AM, "David Mertz" mertz@gnosis.cx wrote:
[...]
On Sun, Jul 31, 2016 at 12:38 PM, Victor Stinner victor.stinner@gmail.com wrote:
I dislike this API. What's the point of calling clamp(x)? clamp(b, a) is min(a, b) and clamp(a, max_val=b) is just max(a, b). My point is that all parameters must be mandatory.
Fair enough. I was envisioning a usage like:
    bottom, top = None, None
    # ... some code that might derive values for bottom and/or top
    x = clamp(x, bottom, top)
But this also lets us use the same signature for, e.g.:
y = clamp(y, max_val=100)
Still, my point wasn't to argue for a signature or implementation, but just opining that the utility is easy enough to write that users can put it in their own library/code.
On Sun, Jul 31, 2016 at 09:38:44PM +0200, Victor Stinner wrote:
I dislike this API. What's the point of calling clamp(x)? clamp(b, a) is min(a, b) and clamp(a, max_val=b) is just max(a, b).
You have that the wrong way around. If you supply a lower bound, you must take the max(), not the min(). If you supply an upper bound, you take the min(), not the max(). It's easy to get wrong.
My point is that all parameters must be mandatory.
I don't care too much whether the parameters are mandatory or have defaults, so long as it is *possible* to pass something for the lower and upper bounds which mean "unbounded". There are four obvious alternatives (well three obvious ones and one surprising one):
(1) Explicitly pass -INFINITY or +INFINITY as needed; but which infinity, float or Decimal? If you pass the wrong one, you may have to pay the cost of converting your values to float/Decimal, which could end up expensive if you have a lot of them.
(2) Pass a NAN as the bounds. With my implementation, that actually works! But it's a surprising accident of implementation, it feels wrong and looks weird, and again, it may require converting the values to float/Decimal.
(3) Use some special Infimum and Supremum objects which are smaller than, and greater than, every other value. But we don't have such objects, so you'd need to create your own.
(4) Use None as a placeholder for "no limit". That's my preferred option.
Of course, even if None is accepted as "no limit", the caller can still explicitly provide an infinity if they prefer.
As I said, I don't particularly care whether the lower and upper bounds have default values. But I think it is useful and elegant to accept None (as well as infinity) to mean "no limit".
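Option (3) can be sketched in a few lines. Everything here is hypothetical: Infimum and Supremum do not exist in the standard library, and the clamp below is a cut-down version that uses them as default bounds. It relies on Python falling back to the sentinel's reflected comparison method when the numeric type returns NotImplemented.

```python
from decimal import Decimal
from fractions import Fraction


class _Infimum:
    """Hypothetical sentinel that compares below every other object."""
    def __lt__(self, other): return other is not self
    def __le__(self, other): return True
    def __gt__(self, other): return False
    def __ge__(self, other): return other is self
    def __repr__(self): return 'Infimum'


class _Supremum:
    """Hypothetical sentinel that compares above every other object."""
    def __lt__(self, other): return False
    def __le__(self, other): return other is self
    def __gt__(self, other): return other is not self
    def __ge__(self, other): return True
    def __repr__(self): return 'Supremum'


Infimum, Supremum = _Infimum(), _Supremum()


def clamp(value, lower=Infimum, upper=Supremum):
    if lower > upper:
        raise ValueError('lower must be <= upper')
    if value < lower:
        return lower
    if value > upper:
        return upper
    return value


print(clamp(7))                         # unbounded on both sides
print(clamp(Decimal('1.5'), lower=2))   # no float/Decimal conversion needed
print(clamp(Fraction(1, 2), 0, Supremum))
```

Compared with None, the sentinels need no special-casing inside clamp, at the price of two extra objects that callers have to import.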
Something to keep in mind:
the math module is written in C, and will remain that way for the time being (see recent discussion on, I think, this list and also the discussion when we added math.isclose()), which means it will be for floats only.
My first thought is that not every one line function needs to be in the standard library. However, as this thread shows, there are some complications to be considered, so maybe it does make sense to have them hashed out.
Regarding NaN:
In [4]: nan = float('nan')
In [6]: nan > 5
Out[6]: False
In [7]: 5 > nan
Out[7]: False
This follows the IEEE spec -- so the only correct result from
clip(x, float('nan')) is NaN.
Steven D'Aprano wrote:
I don't care too much whether the parameters are mandatory or have defaults, so long as it is *possible* to pass something for the lower and upper bounds which mean "unbounded".
I think the point was that if one of the limits is unbounded, then you can just use min or max...
though I think I agree -- you may have code where the limits are sometimes unbounded, and sometimes not -- nice to have a way to have only one code path.
(1) Explicitly pass -INFINITY or +INFINITY as needed;
that's it then.
but which infinity, float or Decimal? If you pass the wrong one, you may have to pay the cost of converting your values to float/Decimal, which could end up expensive if you have a lot of them.
well, as above, if it's in the math module, it's only float.... you could add one to the Decimal module, too, I suppose.
(2) Pass a NAN as the bounds. With my implementation, that actually works! But it's a surprising accident of implementation, it feels wrong and looks weird,
and violates IEEE754 -- don't do that.
(3) Use some special Infimum and Supremum objects which are smaller than, and greater than, every other value. But we don't have such objects, so you'd need to create your own.
that's what float('inf') already is -- let's use them.
(4) Use None as a placeholder for "no limit". That's my preferred option.
reasonable enough -- and would make the API a bit easier -- both for matching different types, and because there is no literal or pre-existing object for Inf.
-Chris
On Mon, Aug 01, 2016 at 12:00:11PM -0700, Chris Barker wrote:
Something to keep in mind:
the math module is written in C, and will remain that way for the time being (see recent discussion on, I think, this list and also the discussion when we added math.isclose()), which means it will be for floats only.
Not necessarily.
py> import math
py> math.factorial(100)
93326215443944152681699238856266700490715968264381621468592963895217599993229915608941463976156518286253697920827223758251185210916864000000000000000000000000
Not a float :-)
It means that this clamp() function would have to be implemented in C. It *doesn't* mean that it will have to convert its arguments to floats, or reject non-float arguments.
As my implementation shows, this should work with any ordered numeric type if clamp() calls the Python < and > operators (i.e. the __lt__ and __gt__ dunders). Let the objects themselves do any numeric conversions *if necessary*, there's no need for clamp() to convert the arguments to floats and call the native C double < and > operators.
(I presume that there's a way to call Python operators from C code.)
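The type-agnostic behaviour is easy to check in pure Python. The clamp below is a simplified stand-in that uses only the < and > operators, so it is a sketch of the idea rather than the C implementation under discussion; the sample values are arbitrary.

```python
from decimal import Decimal
from fractions import Fraction


def clamp(value, lower, upper):
    # Only < and > are used; each object decides how to compare itself.
    if value < lower:
        return lower
    if value > upper:
        return upper
    return value


print(repr(clamp(Fraction(1, 2), 0, 1)))    # stays a Fraction
print(repr(clamp(Decimal('2.5'), 0, 2)))    # clamped to the int bound
print(repr(clamp(10**30, 0, 10**20)))       # big ints, no float rounding
print(repr(clamp('m', 'a', 'f')))           # any ordered type works
```

The result is always one of the three objects passed in, so no conversion ever happens unless a type's own comparison method performs one.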
My first thought is that not every one line function needs to be in the standard library. However, as this thread shows, there are some complications to be considered, so maybe it does make sense to have them hashed out.
Indeed.
Regarding NaN:
In [4]: nan = float('nan')
In [6]: nan > 5
Out[6]: False
In [7]: 5 > nan
Out[7]: False
NANs are *unordered* values: they are neither greater than, nor less than, any other value.
This follows the IEEE spec -- so the only correct result from
clip(x, float('nan')) is NaN.
I don't agree that this is the "only correct result".
We only clamp the value if it is less than the lower bound, or greater than the upper bound. Otherwise we leave it untouched. So, given:
clamp(x, lower, upper)
we say:
    if x < lower:
        x = lower
    elif x > upper:
        x = upper
If lower or upper are NANs, then neither condition will ever be true, and x will never be clamped to a NAN (unless it is already a NAN).
That's why I said that it was an accident of implementation that passing a NAN as one of the lower or upper bounds will be equivalent to setting the bounds to minus/plus infinity: the value will never be less than NAN, or greater than NAN.
I suppose we could rule that case out: if either bound is a NAN, raise an exception. But that will require a conversion to float, which may fail. I'd rather just document that passing NANs as bounds will lead to implementation-specific behaviour that you cannot rely on. If you want to specify an unbounded limit, pass None or an infinity with the right sign.
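The "accident of implementation" can be seen directly. This sketch uses a two-comparison clamp equivalent to the if/elif above; the sample values are arbitrary.

```python
nan = float('nan')


def clamp(x, lower, upper):
    if x < lower:      # always False when lower is a NaN
        x = lower
    elif x > upper:    # always False when upper is a NaN
        x = upper
    return x


print(clamp(5, nan, nan))      # neither bound applies
print(clamp(-1e9, nan, 10))    # a NaN lower bound acts like -infinity
print(clamp(50, nan, 10))      # the real upper bound still applies
print(clamp(50, 0, nan))       # a NaN upper bound acts like +infinity
```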
Steven D'Aprano wrote:
I don't care too much whether the parameters are mandatory or have defaults, so long as it is *possible* to pass something for the lower and upper bounds which mean "unbounded".
I think the point was that if one of the limits is unbounded, then you can just use min or max...
though I think I agree -- you may have code where the limits are sometimes unbounded, and sometimes not -- nice to have a way to have only one code path.
That's exactly my thinking. The last thing you want to do is to inspect the bounds, then decide whether you need to call min(), max() or clamp(). Not only is it a pain, but as Victor inadvertently showed, it's easy to get mixed up and call the wrong function.
(1) Explicitly pass -INFINITY or +INFINITY as needed;
that's it then.
but which infinity, float or Decimal? If you pass the wrong one, you may have to pay the cost of converting your values to float/Decimal, which could end up expensive if you have a lot of them.
well, as above, if it's in the math module, it's only float.... you could add one to the Decimal module, too, I suppose.
I'm pretty sure that a C implementation can be type agnostic and simply rely on the Python < and > operators.
(2) Pass a NAN as the bounds. With my implementation, that actually works! But it's a surprising accident of implementation, it feels wrong and looks weird,
and violates IEEE754 -- don't do that.
What part of IEEE-754 do you think it violates? I don't think it violates anything. But I agree, don't do that. If you do, you'll get whatever the implementation happens to do, no promises or guarantees.
[...]
(4) Use None as a placeholder for "no limit". That's my preferred option.
reasonable enough -- and would make the API a bit easier -- both for matching different types, and because there is no literal or pre-existing object for Inf.
I agree with that reasoning.
On Mon, Aug 1, 2016 at 4:10 PM, Steven D'Aprano steve@pearwood.info wrote:
(I presume that there's a way to call Python operators from C code.)
Yes, see https://docs.python.org/3.5/c-api/object.html#c.PyObject_RichCompareBool.
On Mon, Aug 1, 2016 at 1:10 PM, Steven D'Aprano steve@pearwood.info wrote:
It means that this clamp() function would have to be implemented in C. It *doesn't* mean that it will have to convert its arguments to floats, or reject non-float arguments.
sure -- though I hope it would special-case and be most efficient for floats. However, for the most part, the math module IS all about floats -- though I don't suppose there is any harm in allowing other types.
This follows the IEEE spec -- so the only correct result from
clip(x, float('nan')) is NaN.
I don't agree that this is the "only correct result".
I don't think IEEE 754 says anything about a "clip" function, but a NaN is neither greater than, less than, nor equal to any value -- so when you ask, for example, whether the input value is less than or equal to NaN, or whether NaN is greater than the input, there is no answer -- the spirit of IEEE NaN handling leads to NaN being the only correct result.
Note that I'm pretty sure that min() and max() are wrong here, too.
That's why I said that it was an accident of implementation that passing a NAN as one of the lower or upper bounds will be equivalent to setting the bounds to minus/plus infinity:
exactly -- and we should not have the results be an accident of implementation -- but rather be thought out, and follow IEEE 754 intent.
I suppose we could rule that case out: if either bound is a NAN, raise an exception. But that will require a conversion to float, which may fail. I'd rather just document that passing NANs as bounds will lead to implementation-specific behaviour that you cannot rely on.
why not say that passing NaNs as bounds will result in NaN result? At least if the value is a float -- if it's anything else then maybe an exception, as NaN does not make sense for anything else anyway.
If you want to specify an unbounded limit, pass None or an infinity with the right sign.
exactly -- that's there, so why not let NaN be NaN?
-CHB
On Tue, Aug 2, 2016 at 1:02 PM, Chris Barker chris.barker@noaa.gov wrote:
I don't think IEEE 754 says anything about a "clip" function, but a NaN is neither greater than, less than, nor equal to any value -- so when you ask, for example, whether the input value is less than or equal to NaN, or whether NaN is greater than the input, there is no answer -- the spirit of IEEE NaN handling leads to NaN being the only correct result.
Note that I'm pretty sure that min() and max() are wrong here, too.
Builtin max is wrong:
>>> nan = float('nan')
>>> max(nan, 1)
nan
>>> max(1, nan)
1
but numpy's maximum gets it right:
>>> numpy.maximum(nan, 1)
nan
>>> numpy.maximum(1, nan)
nan
And here is how numpy defines clip:
>>> numpy.clip(nan, 1, 2)
nan
>>> numpy.clip(1, 1, nan)
1.0
>>> numpy.clip(1, nan, nan)
1.0
I am not sure I like the last two results.
On Tue, Aug 02, 2016 at 10:02:11AM -0700, Chris Barker wrote:
I don't think IEEE 754 says anything about a "clip" function, but a NaN is neither greater than, less than, nor equal to any value -- so when you ask, for example, whether the input value is less than or equal to NaN,
Then the answer MUST be False. That's specified by IEEE-754.
or whether NaN is greater than the input, there is no answer -- the spirit of IEEE NaN handling leads to NaN being the only correct result.
Incorrect. The IEEE standard actually does specify the behaviour of comparisons with NANs, and Python does it correctly. See also the Decimal module.
Note that I'm pretty sure that min() and max() are wrong here, too.
In a later update to the standard, IEEE-854 if I remember correctly, there's a whole series of extra comparisons which will return NAN given a NAN argument, including alternate versions of max() and min(). I can't remember which is in 754 and which in 854, but there are two versions of each:
    min #1 (x, NAN) must return x
    min #2 (x, NAN) must return NAN
and same for max.
In any case, clamping is based on < and > comparisons, which are well-specified by IEEE 754 even when NANs are included:
    # pseudo-code
    for op in ( < <= == >= > ):
        assert all(x op NAN is False for all x)
    assert all(x != NAN is True for all x)
If you want the comparisons to return NANs, you're looking at different comparisons from a different standard.
That's why I said that it was an accident of implementation that passing a NAN as one of the lower or upper bounds will be equivalent to setting the bounds to minus/plus infinity:
exactly -- and we should not have the results be an accident of implementation -- but rather be thought out, and follow IEEE 754 intent.
There are lots of places in Python where the behaviour is an accident of implementation. I don't think that this clamp() function should convert the arguments to floats (which may fail, or lose precision) just to prevent the caller passing a NAN as one of the bounds. Just document the fact that you shouldn't use NANs as lower/upper bounds.
why not say that passing NaNs as bounds will result in NaN result?
Because that means that EVERY call to clamp() has to convert both bounds to float and see if they are NANs. If you're calling this in a loop:
    for x in range(1000):
        print(clamp(x, lower, upper))
each bound gets converted to float and checked for NAN-ness 1000 times. This is a total waste of effort for 99.999% of uses, where the bounds will be numbers.
At least if the value is a float -- if it's anything else then maybe an exception, as NaN does not make sense for anything else anyway.
Of course it does: clamp() can change the result type, so it could return a NAN. But why would you bother?
    clamp(Fraction(1, 2), 0.75, 100) returns 0.75;
    clamp(100, 0.0, 50.0) returns 50.0;
If you want to specify an unbounded limit, pass None or an infinity with the right sign.
exactly -- that's there, so why not let NaN be NaN?
Because it is unnecessary.
If you want a NAN-enforcing version of clamp(), it is *easy* to write a wrapper:
    def clamp_nan(value, lower, upper):
        if math.isnan(lower) or math.isnan(upper):
            return float('nan')
        return clamp(value, lower, upper)
A nice, easy four-line function. But if clamp() does that check, it's hard to avoid the checks when you don't want them. I know my bounds aren't NANs, and I'm calling clamp() in a big loop. Don't check them a million times, they're never going to be NANs, just do the comparisons.
It's easy to write a stricter function if you need it. It's hard to write a less strict function when you don't want the strictness.
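One way to get strictness without paying for it on every call, sketched here on the assumption that the same bounds are reused many times, is to validate them once and return a closure. make_clamp is a hypothetical helper, not something proposed in this thread; the x != x test is used so that no conversion to float is needed.

```python
def make_clamp(lower, upper):
    """Validate the bounds once, then return a cheap one-argument clamp."""
    # x != x is true only for NaN-like values, for floats and Decimals alike.
    if lower != lower or upper != upper:
        raise ValueError('bounds must not be NaN')
    if lower > upper:
        raise ValueError('lower must be <= upper')

    def clamp(value):
        # Hot path: two comparisons, no checks on the bounds.
        if value < lower:
            return lower
        if value > upper:
            return upper
        return value

    return clamp


clamp_percent = make_clamp(0, 100)
print([clamp_percent(x) for x in (-20, 35, 250)])
```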
On Tue, Aug 2, 2016, at 14:22, Steven D'Aprano wrote:
In any case, clamping is based on < and > comparisons, which are well-specified by IEEE 754 even when NANs are included:
Sure, but what the standard doesn't say is exactly what sequence of comparisons is entailed by a clamp function.
    def clamp(x, a, b):
        if x < a:
            return a
        else:
            if x > b:
                return b
            else:
                return x

    def clamp(x, a, b):
        if a <= x:
            if x <= b:
                return x
            else:
                return b
        else:
            return a
There are, technically, eight possible naive implementations, varying along three axes:
- which of a or b is compared first
- x < a (or a > x) vs x >= a (or a <= x)
- x > b (or b < x) vs x <= b (or b >= x)
And then there are implementations that may do more than two comparisons.
    def clamp(x, a, b):
        if a <= x <= b:
            return x
        elif x < a:
            return a
        else:
            return b
All such functions are equivalent if {a, b, x} is a set over which the relational operators define a total ordering, and a <= b. However, this is not the case if NaN is used for any of the arguments.
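Running the three variants from this message on the same inputs shows the divergence. The bodies are the ones above, lightly condensed; the suffixes _v1 to _v3 are added here only so that they can coexist in one file.

```python
import math

nan = float('nan')


def clamp_v1(x, a, b):
    if x < a:
        return a
    elif x > b:
        return b
    return x


def clamp_v2(x, a, b):
    if a <= x:
        return x if x <= b else b
    return a


def clamp_v3(x, a, b):
    if a <= x <= b:
        return x
    elif x < a:
        return a
    return b


for args in ((5, 0, 1), (nan, 0, 1)):
    print(args, [f(*args) for f in (clamp_v1, clamp_v2, clamp_v3)])
```

With totally ordered arguments all three agree; with a NaN as the value, the first returns the NaN, the second returns the lower bound and the third returns the upper bound.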
On Tue, Aug 2, 2016 at 11:45 AM, Random832 random832@fastmail.com wrote:
Sure, but what the standard doesn't say is exactly what sequence of comparisons is entailed by a clamp function.
    def clamp(x, a, b):
        if x < a:
            return a
        else:
            if x > b:
                return b
            else:
                return x

    def clamp(x, a, b):
        if a <= x:
            if x <= b:
                return x
            else:
                return b
        else:
            return a
There are, technically, eight possible naive implementations, varying along three axes:
Exactly-- I thought this was self evident, but apparently not -- thanks for spelling it out.
All such functions are equivalent if {a, b, x} is a set over which the relational operators define a total ordering, and a <= b. However, this is not the case if NaN is used for any of the arguments.
Exactly again -- NaN's are kind of a pain :-(
As for the convert to floats issue -- correctness is more important than performance, and performance is probably most important for the special case of all floats (or floats and integers, I suppose) -- I'm sure we can find a solution. Likely something like the second option above would work fine, and also work for anything with an ordinary total ordering.
-Chris
It really doesn't make sense to me that a clamp() function would *limit to* a NaN. I realize one can write various implementations that act differently here, but the principle of least surprise seems violated by letting a NaN be an actual end point IMO. numpy.clip() seems to behave just right, FWIW.
If I'm asking for a value that is "not more than (less than) my bounds" I don't want all my values to become NaNs by virtue of that. A regular number is not affirmatively outside the bounds of a NaN in a commonsense way. It's just not comparable to it at all. So for that purpose -- no determinable bound -- a NaN amounts to the same thing as an Inf (but just for this purpose, they are very different in other contexts).
I guess I think of clamping as "pulling in the values that are *definitely* outside a range." Nothing is definite that way with a NaN. So 'clamp(nan, -1, 1)' is conceptually 'nan' because the unknown value might or might not be "really" outside the range (but not definitely). And likewise 'clamp(X, nan, nan)' has to be X because we can't *know* X is outside the range.
A NaN, conceptually, is a value that *might* exist, if only we knew more and could determine it.... but as is, it's just "unknown."
David Mertz wrote:
It really doesn't make sense to me that a clamp() function would *limit to* a NaN.
Keep in mind that the NaNs involved have probably arisen from some other computation that went wrong, and that the purpose of the whole NaN system is to propagate an indication of that wrongness so that it's evident in the final result.
So here's how I see it:
clamp(NaN, y, z) is asking "Is an unknown number between y and z?" The answer to that is not known, so the result should be NaN.
clamp(x, y, NaN) is asking "Is x between y and an unknown number?" If x > y, the answer to that is not known, so the result should be NaN.
If x < y, you might argue that the result should be y. But consider clamp(x, 2, 1). You're asking it to limit x to a value not less than 2 and not greater than 1. There's no such number, so arguably the result should be NaN.
If you accept that, then clamp(x, y, NaN) should be NaN in all cases, since we don't know that the upper bound isn't less than the lower bound.
So in summary, I think it should be:
clamp(NaN, y, z) --> NaN
clamp(x, NaN, z) --> NaN
clamp(x, y, NaN) --> NaN
clamp(x, y, z) --> NaN if z < y
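A minimal sketch of the four rules summarised above, assuming float-like arguments. The name clamp_propagate is hypothetical and is not a proposed API:

```python
import math

def clamp_propagate(x, lower, upper):
    """Hypothetical NaN-propagating clamp following the summary above."""
    nan = float("nan")
    # Any NaN argument poisons the result.
    if math.isnan(x) or math.isnan(lower) or math.isnan(upper):
        return nan
    # Contradictory bounds: no value can satisfy them, so return NaN.
    if upper < lower:
        return nan
    return lower if x < lower else upper if x > upper else x
```

Note that this version pays for three math.isnan() calls on every invocation, which is one of the costs debated later in the thread.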
On Tue, Aug 2, 2016 at 2:56 PM, David Mertz mertz@gnosis.cx wrote:
It really doesn't make sense to me that a clamp() function would *limit to* a NaN. I realize one can write various implementations that act differently here, but the principle of least surprise seems violated by letting a NaN be an actual end point IMO.
NaN's rarely follow the principle of least surprise :-)
In [7]: float('nan') == float('nan')
Out[7]: False
and you are not letting it be an end point -- you are returning Not a Number -- i.e. I have no idea what this value should be.
If I'm asking for a value that is "not more than (less than) my bounds"
If your bounds are NaN, then you cannot know if your value is within those bounds -- that's how NaN works.
A NaN, conceptually, is a value that *might* exist, if only we knew more
and could determine it.... but as is, it's just "unknown."
NaN is often used for missing values and the like, but that's not quite what it means -- it means just what it says, NOT a number. You know nothing about it.
If someone is passing a NaN in for a bound, then they are passing in garbage, essentially -- "I have no idea what my bounds are" so garbage is what they should get back -- "I have no idea what your clamped values are".
The reality is that NaNs tend to propagate through calculations -- once one gets introduced, you are very, very likely to get NaN as a result -- this won't change that.
If you want unbounded, then don't use this function :-) -- or pass in inf or -inf -- that's what they are for. And they work for integers, too:
In [13]: float('inf') > 9999999999999999999999999999999
Out[13]: True
If they don't work for other numeric types, then that should be fixed in those types... One final thought:
How would a NaN find its way into this function? Two ways:
1) the user specified it, thinking it might mean "unlimited" -- well don't do that! It will fail the first test.
2) the limit was calculated in some way that resulted in a NaN -- well, in this case, they really have no idea what that limit should be -- the NaN should absolutely be propagated, like it is for any other arithmetic operation.
-CHB
PS: numpy may be a good place to look for precedent, but unfortunately, it is not necessarily a good place to look for carefully thought out implementations -- much of it was put in there when someone needed it, without much discussion at all. I'm sure that NaN's behave the way they do in numpy.clip() because of how it happens to be implemented, not because anyone carefully thought it out.
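The suggestion above of passing inf or -inf to mean "no limit" can be sketched as follows. This is an illustration only: clip is the plain conditional-expression version from the original proposal, and the constant name INF is an assumption made for the example.

```python
def clip(number, lower, upper):
    # Conditional-expression version from the original proposal.
    return lower if number < lower else upper if number > upper else number

INF = float("inf")

# Infinite bounds act as "unbounded" on that side, and integer inputs
# come back as integers because only comparisons are performed.
print(clip(10**40, 0, INF))   # the large int passes through unchanged
print(clip(-5, 0, INF))       # clipped to the lower bound
print(clip(5, -INF, 3))       # clipped to the upper bound
```

No conversion to float happens here, so arbitrarily large integers are not at risk of overflow.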
On Tue, Aug 2, 2016 at 7:35 PM, Chris Barker chris.barker@noaa.gov wrote:
If you want unbounded, then don't use this function :-) -- or pass in inf or -inf -- that's what they are for. And they work ...
+inf :-)
On 03/08/2016 00:23, Greg Ewing wrote:
David Mertz wrote:
It really doesn't make sense to me that a clamp() function would *limit to* a NaN.
Keep in mind that the NaNs involved have probably arisen from some other computation that went wrong, and that the purpose of the whole NaN system is to propagate an indication of that wrongness so that it's evident in the final result.
So here's how I see it:
clamp(NaN, y, z) is asking "Is an unknown number between y and z?" The answer to that is not known, so the result should be NaN.
clamp(x, y, NaN) is asking "Is x between y and an unknown number?" If x > y, the answer to that is not known, so the result should be NaN.
+1 so far
If x < y, you might argue that the result should be y. But consider clamp(x, 2, 1). You're asking it to limit x to a value not less than 2 and not greater than 1. There's no such number, so arguably the result should be NaN.
I think clamp(x,2,1) should raise ValueError. It's asking for something impossible.
If you accept that, then clamp(x, y, NaN) should be NaN in all cases, since we don't know that the upper bound isn't less than the lower bound.
+0.8. Returning y when x<y might suit some applications better, but "resist the temptation to guess". Rob Cliffe
So in summary, I think it should be:
clamp(NaN, y, z) --> NaN
clamp(x, NaN, z) --> NaN
clamp(x, y, NaN) --> NaN
clamp(x, y, z) --> NaN if z < y
On 3 August 2016 at 15:36, Rob Cliffe rob.cliffe@btinternet.com wrote:
If x < y, you might argue that the result should be y. But consider clamp(x, 2, 1). You're asking it to limit x to a value not less than 2 and not greater than 1. There's no such number, so arguably the result should be NaN.
I think clamp(x,2,1) should raise ValueError. It's asking for something impossible.
Agreed.
If you accept that, then clamp(x, y, NaN) should be NaN in all cases, since we don't know that the upper bound isn't less than the lower bound.
+0.8. Returning y when x<y might suit some applications better, but "resist the temptation to guess".
clamp(val, lo, hi) should raise ValueError if either lo or hi is NaN, for the same reason (lo < hi doesn't hold, in this case because the values are incomparable).
Paul
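A rough sketch of the raising behaviour suggested here, under the assumption that a single `lo <= hi` test is acceptable as the validity check. The name clamp_strict is hypothetical:

```python
def clamp_strict(val, lo, hi):
    """Hypothetical clamp that rejects impossible or incomparable bounds."""
    # "not (lo <= hi)" is true when lo > hi and also when either bound is
    # NaN, because every ordering comparison involving NaN is False.
    if not (lo <= hi):
        raise ValueError(f"invalid bounds: lo={lo!r}, hi={hi!r}")
    return lo if val < lo else hi if val > hi else val
```

With this check, a NaN passed as val still falls through both comparisons and is returned unchanged; only the bounds are validated.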
clamp(val, lo, hi) should raise ValueError if either lo or hi is NaN,
NaN and ValueError are kind of redundant. NaN exists as a way to (kind of) propagate value errors within the hardware floating point machinery.
So it _might_ make sense for Python to raise ValueError wherever a NaN shows up, but as long as, e.g.
X * NaN
Returns NaN, so should clamp() or clip() or whatever it's called.
Greg spelled it all out, but in short:
If a NaN is passed in anywhere, you get a NaN back.
One could argue that:
clamp(NaN, x,x)
Is clearly defined as x. But that would require special casing, and, "equality" is a bit of an ephemeral concept with floats, so better to return NaN.
-CHB
Chris Barker - NOAA Federal wrote:
One could argue that:
clamp(NaN, x,x)
Is clearly defined as x. But that would require special casing, and, "equality" is a bit of an ephemeral concept with floats, so better to return NaN.
Yeah, it would only apply to a vanishingly small part of the possible parameter space, so I don't think it would be worth the bother.
On Tue, Aug 02, 2016 at 04:35:55PM -0700, Chris Barker wrote:
If someone is passing a NaN in for a bound, then they are passing in garbage, essentially -- "I have no idea what my bounds are" so garbage is what they should get back -- "I have no idea what your clamped values are".
The IEEE 754 standard tells us what min(x, NAN) and max(x, NAN) should be: in both cases it is x.
https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max
Quote: In order to support operations such as windowing in which a NaN input should be quietly replaced with one of the end points, min and max are defined to select a number, x, in preference to a quiet NaN:
min(x,NaN) = min(NaN,x) = x
max(x,NaN) = max(NaN,x) = x
According to Wikipedia, this behaviour was chosen specifically for the use-case we are discussing: windowing or clamping.
See also page 9 of Professor William Kahan's notes here:
https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
Quote:
For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN. There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.
It's okay to disagree and want "NAN poisoning" behaviour. If we define clamp(x, NAN, NAN) as x, as I have been arguing, then you can *easily* get the behaviour you want with a simple wrapper:
def clamp(x, lower, upper):
    if math.isnan(lower) or math.isnan(upper):
        # raise ValueError here instead, or return NAN
        return float('nan')
    else:
        return math.clamp(x, lower, upper)
Apart from the cost of one extra function call, which isn't too bad, this is no more expensive than what you are suggesting *everyone* should pay (two calls to math.isnan). So you are no worse off under my proposal: just define your own helper function, and you get the behaviour you want. We all win.
But if the standard clamp() function has the behaviour you want, violating IEEE-754, then you are forcing it on *everyone*, whether they want it or not. I don't want it, and I cannot use it. There's nothing I can do except re-implement clamp() from scratch and ignore the one in the math library.
As you propose it, clamp() is no use to me: it unnecessarily converts the bounds to float, which may raise an exception. If I use it in a loop, it unnecessarily checks to see if the bounds are NANs, over and over and over again, even when I know that they aren't. It does the wrong thing (according to my needs, according to Professor Kahan, and according to the current revision of IEEE-754) if I do happen to pass a NAN as bounds.
Numpy has a "nanmin" which ignores NANs (as specified by IEEE-754), and "amin" which propagates NANs:
http://docs.scipy.org/doc/numpy/reference/generated/numpy.nanmin.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html
Similar for "minimum" and "fmin", which return the element-wise minimums.
By the way, there are also POSIX functions fmin and fmax which behave according to the standard:
http://man7.org/linux/man-pages/man3/fmin.3.html
http://man7.org/linux/man-pages/man3/fmax.3.html
Julia has a clamp() function, although unfortunately the documentation doesn't say what the behaviour with NANs is:
http://julia.readthedocs.io/en/latest/stdlib/math/#Base.clamp
On Wed, Aug 03, 2016 at 11:23:06AM +1200, Greg Ewing wrote:
David Mertz wrote:
It really doesn't make sense to me that a clamp() function would *limit to* a NaN.
That's what I thought too, at first, but on reading more about the IEEE-754 standard, I've changed my mind. Passing a NAN as bounds can be interpreted as "bounds is missing", i.e. "no bounds".
Keep in mind that the NaNs involved have probably arisen from some other computation that went wrong, and that the purpose of the whole NaN system is to propagate an indication of that wrongness so that it's evident in the final result.
That's not quite right. NANs are allowed to "disappear". In fact, Professor Kahan has specifically written that NANs which cannot disappear out of a calculation are useless:
Were there no way to get rid of NaNs, they would be as useless as Indefinites on CRAYs; as soon as one were encountered, computation would be best stopped rather than continued for an indefinite time to an Indefinite conclusion. That is why some operations upon NaNs must deliver non-NaN results. Which operations?
Page 8, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
He describes some of the conditions under which a NAN might drop out of a calculation. He also says that min(NAN, x) and max(NAN, x) should both return x, which implies that so should clamp(x, NAN, NAN).
So here's how I see it:
clamp(NaN, y, z) is asking "Is an unknown number between y and z?" The answer to that is not known, so the result should be NaN.
I agree, and fortunately that's easily performed without any explicit test for NAN-ness. Given x = float('nan'), neither x < lower nor x > upper will ever be true, no matter what the lower and upper bounds are. So we'll fall through to the default and return x, which is a NAN, as wanted.
clamp(x, y, NaN) is asking "Is x between y and an unknown number?" If x > y, the answer to that is not known, so the result should be NaN.
No, that's not necessarily right. That's one possible interpretation of setting a bounds to NAN. I've seen that referred to as "NAN poisoning", and it is a reasonable thing to ask for. But...
...another interpretation, and one which is closer to the current revision of the IEEE-754 standard, is that clamp(x, NAN, NAN) should treat the NANs as "missing values", i.e. that there is no lower or upper bound. That would be equivalent to specifying infinities as bounds.
If you want a NAN-poisoning version of clamp(), it is easy to build it from a NAN-as-missing-value clamp(). If you start with NAN-poisoning, you can't easily get NANs-as-missing-values. So if we get only one, we should treat NANs as missing values, and let people build the NAN-poisoning version as a wrapper.
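The layering described above can be sketched like this. Both function names are hypothetical, and the sketch assumes float-like bounds for the math.isnan() calls in the wrapper:

```python
import math

def clamp_missing(x, lower, upper):
    """Hypothetical clamp in which a NaN bound means 'no bound'."""
    if lower > upper:          # False if either bound is NaN
        raise ValueError("lower bound exceeds upper bound")
    if x < lower:              # False if x or lower is NaN
        return lower
    if x > upper:              # False if x or upper is NaN
        return upper
    return x                   # includes the case where x itself is NaN

def clamp_poisoning(x, lower, upper):
    """NAN-poisoning variant built as a thin wrapper around the above."""
    if math.isnan(lower) or math.isnan(upper):
        return float("nan")
    return clamp_missing(x, lower, upper)
```

The base function needs no explicit NaN test at all: the missing-value behaviour falls out of the fact that comparisons with NaN are always False.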
If x < y, you might argue that the result should be y. But consider clamp(x, 2, 1). You're asking it to limit x to a value not less than 2 and not greater than 1. There's no such number, so arguably the result should be NaN.
In that case, I would raise ValueError.
So in summary, I think it should be:
clamp(NaN, y, z) --> NaN
Agreed. It couldn't reasonably be anything else.
clamp(x, NaN, z) --> NaN clamp(x, y, NaN) --> NaN
No, both these cases should treat NAN as equivalent to no limit, and clamp x as appropriate. If you want a second, NAN-poisoning clamp(), that's your prerogative, but don't force it upon everyone.
clamp(x, y, z) --> NaN if z < y
That's a clear error, and it should raise immediately. I see no advantage to returning NAN in this case.
Think about why you're clamping. It's unlikely to be used just once, for a single calculation. You're likely to be clamping a whole series of values, with a fixed lower and upper bounds. The bounds are unlikely to be known at compile-time, but they aren't going to change from clamping to clamping. Something like this:
lower, upper = get_bounds()
for x in values():
    y = some_calculation(x)
    y = clamp(y, lower, upper)
    do_something_with(y)
is the most likely use-case, I think.
If lower happens to be greater than upper, that's clearly a mistake. It's better to get an exception immediately, rather than run through a million calculations and only then discover that you've ended up with a million NANs. It's okay if you get a few NANs, that simply indicates that one of your x values was a NAN, or a calculation produced a NAN. But if *every* calculation produces a NAN, well, that's a sign of breakage. Hence, better to raise straight away.
When I really need such function, I define it like this:
def clamp(min_val, value, max_val):
    return min(max(min_val, value), max_val)
Test: min_val <= result <= max_val.
The parameter order is chosen to get something looking like min_val <= value (result in fact) <= max_val.
If you need special handling of NaN, I suggest to add a special version in the math module.
I'm not sure that it's worth it to add such new function to the standard library.
Victor
Le 31 juil. 2016 6:13 AM, "Neil Girdhar" mistersheik@gmail.com a écrit :
It's common to want to clip (or clamp) a number to a range. This feature is commonly needed for both floating point numbers and integers:
http://stackoverflow.com/questions/9775731/clamping-floating-numbers-in-python
http://stackoverflow.com/questions/4092528/how-to-clamp-an-integer-to-some-range-in-python
There are a few approaches:
- use a couple ternary operators (e.g. https://github.com/scipy/scipy/pull/5944/files line 98, which generated a lot of discussion)
- use a min/max construction,
- call sorted on a list of the three numbers and pick out the first, or
- use numpy.clip.
Am I right that there is no *obvious* way to do this? If so, I suggest adding math.clip (or math.clamp) to the standard library that has the meaning:
def clip(number, lower, upper):
    return lower if number < lower else upper if number > upper else number
This would work for non-numeric types so long as the non-numeric types support comparison. It might also be worth adding
assert lower < upper
to catch some bugs.
Best,
Neil
On Thu, Aug 4, 2016 at 11:48 AM, Victor Stinner victor.stinner@gmail.com wrote:
When I really need such function, I define it like this:
def clamp(min_val, value, max_val):
    return min(max(min_val, value), max_val)
and your colleague next door defines it like this:
def clamp(min_val, value, max_val):
    return min(max_val, max(value, min_val))
and a third party library ships
def clamp(min_val, value, max_val):
    return max(min(max_val, value), min_val)
and combinatorially, there is at least a half-dozen more variations. The behavior of each variant is subtly different from the others. Having this function in stdlib would allow standardizing on one well-documented (and hopefully well-motivated) variant.
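To make the "subtly different" point concrete, here is a small sketch comparing the three variants quoted above. The suffixes _a, _b and _c are labels added for this illustration only, and the results noted in the comments rely on how the builtin min() and max() compare their arguments in CPython:

```python
def clamp_a(min_val, value, max_val):
    return min(max(min_val, value), max_val)

def clamp_b(min_val, value, max_val):
    return min(max_val, max(value, min_val))

def clamp_c(min_val, value, max_val):
    return max(min(max_val, value), min_val)

nan = float("nan")

# A NaN value: the variants disagree about which bound to return.
print(clamp_a(0, nan, 1))   # 0
print(clamp_b(0, nan, 1))   # 1
print(clamp_c(0, nan, 1))   # 1

# Inverted bounds (min_val > max_val): the third variant disagrees.
print(clamp_a(2, 1.5, 1))   # 1
print(clamp_b(2, 1.5, 1))   # 1
print(clamp_c(2, 1.5, 1))   # 2
```

For ordinary inputs with min_val <= max_val and no NaN, all three return the same result.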
On Thu, Aug 4, 2016 at 9:24 AM, Alexander Belopolsky < alexander.belopolsky@gmail.com> wrote:
The behavior of each variant is subtly different from the others. Having this function in stdlib would allow standardizing on one well-documented (and hopefully well-motivated) variant.
exactly -- not every small function should be in the stdlib -- but there is a place for it when there are multiple subtly different ways to implement it.
regardless of the outcome of the NaN issue -- I think the "one obvious way to do it" should be the math.clip() (or math.clamp()) function.
-CHB
On Thu, Aug 4, 2016, at 12:24, Alexander Belopolsky wrote:
On Thu, Aug 4, 2016 at 11:48 AM, Victor Stinner victor.stinner@gmail.com wrote:
When I really need such function, I define it like this:
def clamp(min_val, value, max_val):
    return min(max(min_val, value), max_val)
and your colleague next door defines it like this:
def clamp(min_val, value, max_val):
    return min(max_val, max(value, min_val))
Ideally min and max should themselves be defined in a way that makes that not an issue (or perhaps only an issue for different-signed zero values)
and a third party library ships
def clamp(min_val, value, max_val):
    return max(min(max_val, value), min_val)
That one is more of an issue, though AIUI only so when min_val > max_val.
On Thu, Aug 4, 2016 at 5:35 AM, Steven D'Aprano steve@pearwood.info wrote:
The IEEE 754 standard tells us what min(x, NAN) and max(x, NAN) should be: in both cases it is x.
I thought an earlier post said something about an alternative min and max? -- but anyway, consistency with min and max is a pretty good argument.
Quote:
For instance max{x, y} should deliver the same result as max{y, x} but almost no implementations do that when x is NaN. There are good reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would disagree.
I don't disagree that there are good reasons, just that it's the final way to go :-) -- but if Kahan equivocates, then there isn't one way to go :-)
As you propose it, clamp() is no use to me: it unnecessarily converts the bounds to float, which may raise an exception.
no it doesn't -- that's only one way to implement it. We really should decide on the behaviour we want, and then figure out how to implement it -- not choose something because it's easier to implement.
there was an earlier post with an implementation that would give the NaN-poisoning behaviour, but would also work with any totally ordered type as well -- not that I've thought about possible edge cases.
I think this is it:
def clamp(x, a, b):
    if a <= x:
        if x <= b:
            return x
        else:
            return b
    else:
        return a
hmm -- doesn't work for x is NaN, but limits are not -- but I'm sure that could be worked out.
In [32]: clamp(nan, 0, 100)
Out[32]: 0
-CHB
Le 4 août 2016 19:59, "Random832" random832@fastmail.com a écrit :
def clamp(min_val, value, max_val):
    return min(max_val, max(value, min_val))
Ideally min and max should themselves be defined in a way that makes that not an issue (or perhaps only an issue for different-signed zero values)
There is a generic sum() and a specific math.fsum() function which is more accurate to sum a list of float.
Maybe before starting to talk about clamp(), we should define new math.fmin() and math.fmax() functions?
I suggest starting with a short PEP, like the math.isclose() PEP, since there are subtle issues like NaN (float but also Decimal!) and combinations of numerical types (int, float, complex, Decimal, Fraction, numpy scalars like float16, ...).
Maybe a PEP is not needed; I didn't read the thread carefully enough to check whether there is a consensus or not.
I dislike the idea of modifying min() and max() to add special cases for float NaN and decimal NaN.
Which type do you expect for fmax(int, int)? Should it be int or float?
Should fmax(Decimal, float) raise an error, return a float or return a Decimal?
Victor
On Thu, Aug 4, 2016 at 6:20 AM, Steven D'Aprano steve@pearwood.info wrote:
Think about why you're clamping. It's unlikely to be used just once, for a single calculation. You're likely to be clamping a whole series of values, with a fixed lower and upper bounds. The bounds are unlikely to be known at compile-time, but they aren't going to change from clamping to clamping. Something like this:
lower, upper = get_bounds()
for x in values():
    y = some_calculation(x)
    y = clamp(y, lower, upper)
    do_something_with(y)
is the most likely use-case, I think.
I was curious about what the likely cases are in many cases, so I ran a quick sample from a professional project I am working on, and found the following results:
- clamping to [0, 1]: 50 instances, almost always dealing with percentages
- lower is 0: 44 instances, almost all were clamping an index to list bounds, though a few outliers existed
- lower is 1: 4 instances, 3 were clamping a 1-based index, the other was some safety code to ensure a computed wait time falls within certain bounds to avoid both stalling and spamming
- both values were constant, but with no real specific values: 11 instances (two of these are kinda 0,1 limits, but a log is being done for volume calculations, so 0 is invalid, but the number is very close to 0)
- one value was constant, with some arbitrary limit and the other was computed: 0
- both values were computed: 20 instances (many instances have the clamping pulled from data, which is generally constant but can be changed easier than code)
Any given call to clamp was put into the first of the categories it matched. "computed" is fairly general, it includes cases where the value is user-input with no actual math done.
As would be expected, all cases were using computed value as the input, only the min/max were ever constant.
The project in this case is a video game's game logic code, written in C#. None of the shaders or engine code is included. There may be additional clamping using min/max combinations, rather than the provided clamp helpers that were not included, however the search did find two instances, where they were commented as being clamps, which were included.
Basically all of the cases will repeat fairly often, either every frame, move, or level. Most are not in loops outside of the frame/game loop.
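The two most common categories above (percentages in [0, 1] and indexes clamped to list bounds) might look roughly like this in Python. The helper item_at and the sample data are made up for illustration, and the sketch assumes a non-empty list:

```python
def clip(number, lower, upper):
    # Conditional-expression version from the original proposal.
    return lower if number < lower else upper if number > upper else number

def item_at(items, index):
    """Return the item at index, clamping out-of-range indexes to the ends."""
    return items[clip(index, 0, len(items) - 1)]

levels = ["easy", "normal", "hard"]
print(item_at(levels, 10))     # index clamped to the last element
print(item_at(levels, -4))     # index clamped to the first element
print(clip(1.2, 0.0, 1.0))     # percentage clamped to at most 1.0
```

In both cases the bounds are fixed or cheaply derived, and only the clamped value varies from call to call.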
If lower happens to be greater than upper, that's clearly a mistake. It's better to get an exception immediately, rather than run through a million calculations and only then discover that you've ended up with a million NANs. It's okay if you get a few NANs, that simply indicates that one of your x values was a NAN, or a calculation produced a NAN. But if *every* calculation produces a NAN, well, that's a sign of breakage. Hence, better to raise straight away.
I personally don't have much opinion on NAN behaviour in general - I don't think I've ever actually used them in any of my code, and the few cases they show up, it is due to a bug or corrupted data that I want caught early.
Chris
On Thu, Aug 4, 2016 at 1:21 PM, Chris Kaynor ckaynor@zindagigames.com wrote:
If lower happens to be greater than upper, that's clearly a mistake. It's better to get an exception immediately, rather than run through a million calculations and only then discover that you've ended up with a million NANs.
sure.
It's okay if you get a few NANs, that simply indicates that one of your x values was a NAN, or a calculation produced a NAN. But if *every* calculation produces a NAN, well, that's a sign of breakage. Hence, better to raise straight away.
sure -- but one reason NaN exists is so that errors can get propagated through the hardware without bringing everything to a halt -- this is really key in vectorized operations. And it's really useful. So I'd rather not have an exception there, if you are doing something like:
[ clamp(x, y, z) for z in the_max_values]
might be better to check for NaN somewhere else than have that whole operation fail.
I think it would also require more special case checking in the code....
I personally don't have much opinion on NAN behaviour in general - I don't think I've ever actually used them in any of my code, and the few cases they show up, it is due to a bug or corrupted data that I want caught early.
exactly -- usually a bug or corrupted data -- if NaN is passed in as a limit, it's probably an error of some sort, you really don't want it silently passing your input value through.
And you have inf and -inf if you do want "no limit"
-CHB
On Thu, Aug 04, 2016 at 01:21:27PM -0700, Chris Kaynor wrote:
I was curious about what the likely cases are in many cases, so I ran a quick sample from a professional project I am working on, and found the following results:
[...]
As would be expected, all cases were using computed value as the input, only the min/max were ever constant.
Thanks for doing that Chris. That's what I expected: I can't think of any use-case for clamping a constant value to varying bounds:
x = known_value()
for lower, upper in zip(seq1, seq2):
    y = clamp(x, lower, upper)
    process(y)
On Thu, Aug 04, 2016 at 04:17:47PM -0700, Chris Barker wrote:
I think it would also require more special case checking in the code....
I think you are over-complicating this, AND ignoring what the IEEE-754 standard says about this.
if NaN is passed in as a limit, it's probably a error of some sort
That's not what the standard says. The standard says NAN as a limit should be treated as "no limit".
And you have inf and -inf if you do want "no limit"
That will still apply. You can also pass 1.7976931348623157E+308, the largest possible float. (If we're talking about float arguments -- for int and Decimal, you can easily exceed that.)
On Thu, Aug 04, 2016 at 09:46:05PM +0200, Victor Stinner wrote:
I suggest starting with a short PEP, like the math.isclose() PEP, since there are subtle issues like NaN (float but also Decimal!) and combinations of numerical types (int, float, complex, Decimal, Fraction, numpy scalars like float16, ...).
Maybe a PEP is not needed; I didn't read the thread carefully enough to check whether there is a consensus or not.
No consensus.
I dislike the idea of modifying min() and max() to add special cases for float NaN and decimal NaN.
min() and max() currently return NANs when given a NAN and number, but I don't know if that is deliberate or accidental.
The IEEE-754 standard says that min(x, NAN) and max(x, NAN) should return x.
https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max
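As a quick check of the current behaviour mentioned above, the following sketch suggests that the result of the builtin min() and max() with a NAN depends on argument order, at least in CPython, where the first argument is kept unless a later one compares strictly smaller (or larger):

```python
import math

nan = float("nan")

# The first argument wins whenever the comparison with NaN is False.
print(min(nan, 1.0))   # nan
print(min(1.0, nan))   # 1.0
print(max(nan, 1.0))   # nan
print(max(1.0, nan))   # 1.0
```

So the builtins neither consistently return the NAN nor consistently ignore it, which is the asymmetry Kahan's notes complain about.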
Which type do you expect for fmax(int, int)? Should it be int or float?
int.
Should fmax(Decimal, float) raise an error, return a float or return a Decimal?
Comparisons between float and Decimal no longer raise:
py> Decimal(1) < 1.0
False
so I would expect that fmax would return the larger of the two. If the larger is a float, it should return a float. If the larger is a Decimal, it should return a Decimal. If the two values are equal, it's okay to pick an arbitrary one.
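A minimal sketch of an fmax with the type behaviour described above. The name fmax_sketch is hypothetical (no such function is being claimed to exist in the math module), and NaN is detected with `value != value` so that the same code covers float and quiet Decimal NaNs:

```python
from decimal import Decimal

def fmax_sketch(a, b):
    """Hypothetical fmax: ignore a NaN operand, keep the winner's own type."""
    if a != a:          # a is NaN: prefer the other operand
        return b
    if b != b:          # b is NaN
        return a
    # No conversion is done, so the larger value is returned as-is;
    # ties arbitrarily favour a.
    return b if b > a else a

print(type(fmax_sketch(3, 2)).__name__)             # int
print(type(fmax_sketch(Decimal(2), 1.5)).__name__)  # Decimal
print(type(fmax_sketch(Decimal(1), 2.5)).__name__)  # float
print(fmax_sketch(float("nan"), 5))                 # 5
```

A matching fmin would swap the final comparison to `b < a`.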
On Wed, Aug 03, 2016 at 08:52:24AM -0700, Chris Barker - NOAA Federal wrote:
One could argue that:
clamp(NaN, x,x)
Is clearly defined as x. But that would require special casing
Not so special:
if lower == upper != None:
    return lower
That's in the spirit of Professor Kahan's admonition that NANs should not be treated as a one way street: most calculations that lead to NANs will, of course, stay as NANs, but there are cases where a calculation on a NAN will lead to a non-NAN:
py> math.hypot(INF, NAN)
inf
py> math.hypot(NAN, INF)
inf
py> NAN**0.0
1.0
If the bounds are equal, then clamp(NAN, a, a) should return a.
and, "equality" is a bit of an ephemeral concept with floats, so better to return NaN.
Not really. Please read what Kahan says about NANs. He was one of the committee members that worked out most of these issues in the nineties. He says:
NaN must not be confused with “Undefined.” On the contrary, IEEE 754 defines NaN perfectly well even though most language standards ignore and many compilers deviate from that definition. The deviations usually afflict relational expressions, discussed below. Arithmetic operations upon NaNs other than SNaNs (see below) never signal INVALID, and always produce NaN unless replacing every NaN operand by any finite or infinite real values would produce the same finite or infinite floating-point result independent of the replacements.
That's exactly the situation here: clamp(x, a, a) should return a for any finite or infinite x, which means it should do the same when x is a NAN as well.
https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
See page 7.
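The equal-bounds argument above could be sketched as a single extra test in front of the usual comparisons. The function name is hypothetical, and the rest of the body is simply the conditional expression from the original proposal:

```python
def clamp_equal_bounds(x, lower, upper):
    """Hypothetical clamp that returns the bound when both bounds are equal."""
    # If the bounds coincide, every finite or infinite x would clamp to the
    # same value, so a NaN x can be given that value too. NaN bounds never
    # compare equal, so they do not take this branch.
    if lower == upper:
        return lower
    return lower if x < lower else upper if x > upper else x

nan = float("nan")
print(clamp_equal_bounds(nan, 2.0, 2.0))   # 2.0
print(clamp_equal_bounds(nan, 0.0, 1.0))   # nan
```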
Steven D'Aprano writes:
clamp(x, y, z) --> NaN if z < y
That's a clear error, and it should raise immediately. I see no advantage to returning NAN in this case.
Think about why you're clamping. It's unlikely to be used just once, for a single calculation. You're likely to be clamping a whole series of values, with a fixed lower and upper bounds.
"Likely" isn't a good enough reason:
x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)]
is perfectly plausible code. The "for which case is it easier to write a wrapper" argument applies here, I think.
On Fri, Aug 05, 2016 at 12:09:09PM +0900, Stephen J. Turnbull wrote:
Steven D'Aprano writes:
clamp(x, y, z) --> NaN if z < y
That's a clear error, and it should raise immediately. I see no advantage to returning NAN in this case.
Think about why you're clamping. It's unlikely to be used just once, for a single calculation. You're likely to be clamping a whole series of values, with a fixed lower and upper bounds.
"Likely" isn't a good enough reason:
Of course it is. We write code to prefer known, common use-cases, not just hypothetical "What If" scenarios.
E.g. most trig functions (sin, cos, tan) take angles in radians. I've seen a few take angles in degrees. I've even seen trig functions that take their argument as a multiple of pi (e.g. sinpi(1.25) being equivalent to sin(1.25*pi), only more accurate). All of these have good use-cases.
But I'm willing to bet that you will never, ever find a general purpose programming language or maths library with specialised trig functions that take arguments in 1/37th of a gon. (Yes, "gon" is a real unit.) If you need such a thing, you write it yourself.
Chris has already gone through his code and confirmed what I expected: he uses "clamp" extensively, and the bounds are invariably fixed once at the start of the loop.
But if you find yourself in that unusual situation of needing something unusual, you can easily write your own wrapper:
def clamp(value, lower, upper):
    if lower > upper:
        return "Surprise!"
    return math.clamp(value, lower, upper)
Of course, if math.clamp() returned NAN, you could just as easily go the other way and write a wrapper to raise instead. Neither case is particularly onerous.
But in one case, only a few people will need to wrap the function; in the second case, many people (possibly even *everybody*) will want to wrap the function to avoid the unhelpful standard behaviour. It is our job as function designers to try to cater for the majority, not the minority, when possible.
x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)]
is perfectly plausible code.
I have my doubts. Sure, you can write it, but what would you use it for? What's your use-case?
On Aug 5, 2016 2:13 AM, "Steven D'Aprano" steve@pearwood.info wrote:
x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)]
is perfectly plausible code.
I have my doubts. Sure, you can write it, but what would you use it for? What's your use-case?
Looks like an ordinary trend with error bounds to me. I can easily imagine writing that code if clamp is introduced.
And glad to see that Kahan explicitly supports my intuition on NaN not genetically infecting every operation... In fact, clamp(x, nan, nan) is explicitly x according to IEEE-754 2008... Not NaN.
Steven D'Aprano writes:
x = [clamp(x[t], f(t), g(t)) for t in range(1_000_000)]
is perfectly plausible code.
I have my doubts. Sure, you can write it, but what would you use it for? What's your use-case?
Any varying optimal control might also be subject to bounds that vary. These problems arise rather frequently in economic theory. Often the cheapest way to compute them is to compute an unconstrained problem and clamp.
I can even think of a case where clamp could be used with a constant control and a varying bound: S-s inventory control facing occasional large orders in an otherwise continuous, stationary demand process.
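To make the "compute unconstrained, then clamp" pattern concrete, here is a minimal sketch. The bound functions and the sample data are invented for illustration, and `clamp` is the plain comparison-based definition from the start of the thread (it assumes lower <= upper and no NaNs).

```python
def clamp(value, lower, upper):
    # Comparison-based clamp; assumes lower <= upper and no NaNs.
    return lower if value < lower else upper if value > upper else value


def lower_bound(t):
    # Hypothetical time-varying floor.
    return t


def upper_bound(t):
    # Hypothetical time-varying ceiling.
    return t + 2


# Made-up solution of an unconstrained problem, one value per period.
unconstrained = [0, 5, 1, 2, 9]

# Clamp each period's value to that period's bounds.
path = [clamp(x, lower_bound(t), upper_bound(t))
        for t, x in enumerate(unconstrained)]
```

Here both bounds change on every iteration, which is the situation the list comprehension quoted above describes.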
On Fri, Aug 05, 2016 at 11:30:35PM +0900, Stephen J. Turnbull wrote:
I can even think of a case where clamp could be used with a constant control and a varying bound: S-s inventory control facing occasional large orders in an otherwise continuous, stationary demand process.
Sounds interesting. Is there a link to somewhere I could learn more about this?
One could argue that:
clamp(NaN, x,x)
Is clearly defined as x. But that would require special casing
Not so special:
if lower == upper != None: return lower
I had the impression earlier that you didn't want a whole pile of special cases, even if each was simple. But sure, this is a nice one to get "right".
on a NAN will lead to a non-NAN:
Yes, if you'd get the same non-nan result for any floating point value where the NaN is, then that makes sense - if NaN means "we have no idea what value this is", but you get the same result regardless, then fine.
But that does not apply to:
clamp(x, y, NaN)
Or min(x, NaN)
However, as wrong as I might think it is ( ;-) ) -- it seems the IEEE has decided that:
min(x, NaN) should return x (and likewise for max).
So we should be consistent.
Arithmetic operations upon NaNs ... never signal INVALID, and always produce NaN unless replacing every NaN operand by any finite or infinite real values would produce the same finite or infinite floating-point result independent of the replacements.
Which is the case for:
clamp(NaN, x,x)
But is not for:
clamp(x, NaN, NaN)
But a standard is a standard :-(
-CHB
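One possible reading of the NaN rules discussed in this message can be sketched as follows. This is an illustration under stated assumptions, not an agreed specification: equal bounds win regardless of the value, and a NaN bound is treated as a missing bound, in the spirit of the min/max rule mentioned above. The helper `_is_nan` is an invented name.

```python
import math


def _is_nan(x):
    # Only floats can be NaN; other types are passed through untouched.
    return isinstance(x, float) and math.isnan(x)


def clamp(value, lower, upper):
    # Equal (non-NaN) bounds: the result is the same for any value,
    # so even a NaN value yields the bound.
    if lower == upper:
        return lower
    # Assumption: a NaN bound means "no bound on that side".
    if not _is_nan(lower) and value < lower:
        return lower
    if not _is_nan(upper) and value > upper:
        return upper
    # In range, or a NaN value with distinct bounds: return it unchanged.
    return value
```

With this reading, clamp(NaN, x, x) gives x and clamp(x, NaN, NaN) gives x, while clamp(NaN, 0, 10) stays NaN because no single answer is right for every possible value.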
Steven D'Aprano writes:
On Fri, Aug 05, 2016 at 11:30:35PM +0900, Stephen J. Turnbull wrote:
I can even think of a case where clamp could be used with a constant control and a varying bound: S-s inventory control facing occasional large orders in an otherwise continuous, stationary demand process.
Sounds interesting. Is there a link to somewhere I could learn more about this?
The textbook I use is Nancy Stokey, The Economics of Inaction https://www.amazon.co.jp/s/ref=nb_sb_noss?__mk_ja_JP=%E3%82%AB%E3%82%BF%E3%8...
The example I gave is not a textbook example, but is an "obvious" extension of the simplest textbook models.
Is this idea still alive?
Despite the bike shedding, I think that some level of consensus may have been reached. So I suggest that either Neil (because it was your idea) or Steven (because you've had a lot of opinions, and done a lot of the homework) or both, of course, put together a reference implementation and a proposal, post it here, and see how it flies.
It's one function, so hopefully won't need a PEP, but if your proposal meets with a lot of resistance, then you could turn it into a PEP then. But getting all this discussion summarized would be good as a first step.
NOTE: I think it's a fine idea, but I've got way too many other things I'd like to do first -- so I'm not going to push this forward...
-CHB
_______________________________________________ Python-ideas mailing list Python-ideas@python.org https://mail.python.org/mailman/listinfo/python-ideas Code of Conduct: http://python.org/psf/codeofconduct/
FWIW. I first opined that people should just write their own utility function. However, there are enough differences of opinion about the right semantics that I support it being in the standard library now.
Those decisions may or may not be made the way I most prefer, but I think an "official" behavior in the edge cases is best to have... I can always implement my own version with different behavior if I need to.
On Tue, Aug 09, 2016 at 03:42:12PM -0700, Chris Barker wrote:
Is this idea still alive?
Despite the bike shedding, I think that some level of consensus may have been reached. So I suggest that either Neil (because it was your idea) or Steven (because you've had a lot of opinions, and done a lot of the homework) or both, of course, put together a reference implementation and a proposal, post it here, and see how it flies.
I'm happy to write up a summary and a reference implementation.
On 08/09/2016 05:05 PM, Steven D'Aprano wrote:
On Tue, Aug 09, 2016 at 03:42:12PM -0700, Chris Barker wrote:
Is this idea still alive?
Despite the bike shedding, I think that some level of consensus may have been reached. So I suggest that either Neil (because it was your idea) or Steven (because you've had a lot of opinions, and done a lot of the homework) or both, of course, put together a reference implementation and a proposal, post it here, and see how it flies.
I'm happy to write up a summary and a reference implementation.
Excellent!
I'm looking forward to it.
-- ~Ethan~
Thank you!
I just stumbled upon this precise use case yesterday. I solved it unsatisfactorily with the following code (inlined):
value = max(lower, value)
value = min(upper, value)
So it's certainly a good thing to have.
In short, a PEP is a summary of a long discussion. IMHO a PEP is required to write down the rationale, list the most important alternatives, and explain why the PEP's proposal is better.
The hard part is to write a short but "complete" PEP.
I tried to follow this discussion and I still don't understand why my proposal of "def clamp(min_val, value, max_val): return min(max(min_val, value), max_val)" is not good. I expect a PEP to answer this question without my having to read the whole thread :-)
I also don't recall what the "conclusion" was for NaN.
Victor
On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner victor.stinner@gmail.com wrote:
I tried to follow this discussion and I still don't understand why my proposal of "def clamp(min_val, value, max_val): return min(max(min_val, value), max_val)" is not good. I expect a PEP to answer this question without my having to read the whole thread :-) I also don't recall what the "conclusion" was for NaN.
This was the implementation I suggested (but I borrowed it from StackOverflow, I don't claim originality). There are a couple of arguable bugs in that implementation, and several questions I would want answered in a PEP. I'm not going to argue again about the best answer, but we should explicitly state what the results of the following are (with some justification):
clamp(5, nan, nan)
clamp(5, 0, nan)
clamp(5, nan, 10)
clamp(nan, 0, 10)
clamp(nan, 5, 5)
Also, min and max take the "first thing not greater/less than the rest". Arguably that's not what we would want for clamp(). But maybe it is; if so, explain the reasons. E.g.:
>>> max(1, nan)
1
>>> max(nan, 1)
nan
>>> max(1.0, 1)
1.0
>>> max(1, 1.0)
1
This has the obvious implications for the semantics of clamp() if it is based on min()/max() in the manner proposed.
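As a small illustration of those implications, the sketch below wraps the min-of-max construction quoted in the thread (the name `clamp_minmax` is invented here) and exercises it with NaN and mixed int/float arguments. The results in the comments follow from the "first thing not greater/less than the rest" behaviour of max() and min() shown above.

```python
import math

nan = math.nan


def clamp_minmax(min_val, value, max_val):
    # The min-of-max construction quoted in the thread.
    return min(max(min_val, value), max_val)


# A NaN value silently becomes the lower bound...
nan_value = clamp_minmax(0, nan, 10)      # 0
# ...while a NaN lower bound makes the whole result NaN...
nan_lower = clamp_minmax(nan, 5, 10)      # nan
# ...and a NaN upper bound is ignored.
nan_upper = clamp_minmax(0, 5, nan)       # 5

# Equal values of different types: argument order picks the type.
float_result = clamp_minmax(0, 1.0, 1)    # 1.0
int_result = clamp_minmax(0, 1, 1.0)      # 1
```

Whether any of these outcomes is desirable is exactly what a written proposal would need to decide.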
Also, what is the calling syntax? Are the arguments strictly positional, or do they have keywords? What are those default values if the arguments are not specified for either or both of min_val/max_val? E.g., is this OK:
clamp(5, min_val=0)
If this is allowable to mean "unbounded on the top" then the simple implementation will break using the most obvious default values:
clamp('foo', min_val='aaa')  # expect "lexical clamping"
>>> min(max("aaa", "foo"), float('inf'))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: float() < str()
>>> min(max("aaa", "foo"), None)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: NoneType() < str()
>>> min(max("aaa", "foo"), "zzz")
'foo'
Quite possibly this is exactly the behavior we want, but I'd like an explanation for why.
On Sat, Aug 13, 2016 at 01:25:58AM +0200, Victor Stinner wrote:
I tried to follow this discussion and I still don't understand why my proposal of "def clamp(min_val, value, max_val): return min(max(min_val, value), max_val)" is not good. I expect a PEP to answer this question without my having to read the whole thread :-)
I said I would write up a summary, and I will. If you want to call it a PEP, I'm okay with that. I won't forget your proposal either :-)
On 2016-08-13 00:48, David Mertz wrote:
On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
[snip]
Also, what is the calling syntax? Are the arguments strictly positional, or do they have keywords? What are those default values if the arguments are not specified for either or both of min_val/max_val? E.g., is this OK:
clamp(5, min_val=0)
I would've thought that the obvious default would be None, meaning "missing".
On Sat, Aug 13, 2016 at 1:31 PM, MRAB python@mrabarnett.plus.com wrote:
On 2016-08-13 00:48, David Mertz wrote:
On Fri, Aug 12, 2016 at 4:25 PM, Victor Stinner <victor.stinner@gmail.com> wrote:
[snip]
Also, what is the calling syntax? Are the arguments strictly positional, or do they have keywords? What are those default values if the arguments are not specified for either or both of min_val/max_val? E.g., is this OK:
clamp(5, min_val=0)
I would've thought that the obvious default would be None, meaning "missing".
Doesn't really matter what the defaults are. That call means "clamp with a minimum of 0 and no maximum". It's been completely omitted.
But yes, probably it would be min_val=None, max_val=None.
ChrisA
None seems reasonable. But it does require some conditional checks rather than the simplest min-of-max. Not a bad answer, just something to be explicit about.
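A minimal sketch of what those conditional checks might look like, assuming the parameter names min_val and max_val used earlier in the thread and None as the "no bound" marker. This is one possible shape, not a settled design.

```python
def clamp(value, min_val=None, max_val=None):
    # None means "no bound on this side", so each comparison is guarded
    # and the value is never compared against a placeholder such as
    # float('inf') or None.
    if min_val is not None and value < min_val:
        return min_val
    if max_val is not None and value > max_val:
        return max_val
    return value
```

Because an omitted bound is never compared, clamp('foo', min_val='aaa') works for strings as well as numbers, which the plain min-of-max version with a float('inf') or None default does not.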
Far be it from me to damp your enthusiasm, but it seems to me that the functionality of a clamp function is so simple, and yet has so many possible variations, that it's not worth providing it.
I.e., it's probably quicker for someone to write their own version (typically <= 4 lines of code) than to
look up the library version
read *and understand* its specification
decide whether it's suitable for their use case
maybe: decide that it isn't (or they don't understand it) and write their own one anyway.
A custom version can also be optimised for the particular use case.
Regards
Rob Cliffe