On Wed, Aug 03, 2016 at 11:23:06AM +1200, Greg Ewing wrote:
David Mertz wrote:
It really doesn't make sense to me that a clamp() function would *limit to* a NaN.
That's what I thought too, at first, but on reading more about the IEEE-754 standard, I've changed my mind. Passing a NAN as a bound can be interpreted as "bound is missing", i.e. "no bound".
Keep in mind that the NaNs involved have probably arisen from some other computation that went wrong, and that the purpose of the whole NaN system is to propagate an indication of that wrongness so that it's evident in the final result.
That's not quite right. NANs are allowed to "disappear". In fact, Professor Kahan has specifically written that NANs which cannot disappear out of a calculation are useless:
    Were there no way to get rid of NaNs, they would be as useless as Indefinites on CRAYs; as soon as one were encountered, computation would be best stopped rather than continued for an indefinite time to an Indefinite conclusion. That is why some operations upon NaNs must deliver non-NaN results. Which operations?
Page 8, https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF
He describes some of the conditions under which a NAN might drop out of a calculation. He also says that min(NAN, x) and max(NAN, x) should both return x, which implies that clamp(x, NAN, NAN) should return x as well.
So here's how I see it:
clamp(NaN, y, z) is asking "Is an unknown number between y and z?" The answer to that is not known, so the result should be NaN.
I agree, and fortunately that's easily performed without any explicit test for NAN-ness. Given x = float('nan'), neither x < lower nor x > upper will ever be true, no matter what the lower and upper bounds are. So we'll fall through to the default and return x, which is a NAN, as wanted.
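To make the fall-through concrete, here is a minimal sketch of a comparison-based clamp(). The function body is my own illustration, not a settled implementation; only the name clamp() and the lower/upper terminology come from this thread.

```python
import math

def clamp(x, lower, upper):
    """Limit x to the range lower..upper using plain comparisons."""
    if x < lower:
        return lower
    if x > upper:
        return upper
    # Reached when neither comparison is true. Every ordering
    # comparison involving a NAN is false, so a NAN x lands here.
    return x

nan = float('nan')
print(math.isnan(clamp(nan, 0.0, 1.0)))
print(clamp(5.0, 0.0, 1.0))
```

No call to math.isnan() is needed inside clamp() itself; it appears here only to check the result.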
clamp(x, y, NaN) is asking "Is x between y and an unknown number?" If x > y, the answer to that is not known, so the result should be NaN.
No, that's not necessarily right. That's one possible interpretation of setting a bound to NAN. I've seen that referred to as "NAN poisoning", and it is a reasonable thing to ask for. But another interpretation, and one which is closer to the current revision of the IEEE-754 standard, is that clamp(x, NAN, NAN) should treat the NANs as "missing values", i.e. that there is no lower or upper bound. That would be equivalent to specifying infinities as bounds.
If you want a NAN-poisoning version of clamp(), it is easy to build it from a NAN-as-missing-value clamp(). If you start with NAN-poisoning, you can't easily get NANs-as-missing-values. So if we get only one, we should treat NANs as missing values, and let people build the NAN-poisoning version as a wrapper.
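As a rough sketch of how that wrapper could look: the base clamp() below treats a NAN bound as missing simply because comparisons against a NAN are always false. The name clamp_poisoning() is hypothetical, invented here for illustration, and both bodies are assumptions rather than a proposed implementation.

```python
import math

def clamp(x, lower, upper):
    """NAN-as-missing-value clamp: a NAN bound never clamps anything."""
    if x < lower:   # always false when lower is a NAN
        return lower
    if x > upper:   # always false when upper is a NAN
        return upper
    return x

def clamp_poisoning(x, lower, upper):
    """Hypothetical wrapper: any NAN bound poisons the result."""
    if math.isnan(lower) or math.isnan(upper):
        return float('nan')
    return clamp(x, lower, upper)

nan = float('nan')
print(clamp(5.0, 0.0, nan))            # upper bound treated as missing
print(clamp_poisoning(5.0, 0.0, nan))  # bound is a NAN, so result is a NAN
```

Going the other way is harder: once the base function has returned a NAN for a NAN bound, a wrapper has to re-implement the clamping itself to recover the missing-value behaviour.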
If x < y, you might argue that the result should be y. But consider clamp(x, 2, 1). You're asking it to limit x to a value not less than 2 and not greater than 1. There's no such number, so arguably the result should be NaN.
In that case, I would raise ValueError.
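A minimal sketch of what raising could look like, assuming a comparison-based clamp(); the error message and the exact placement of the check are illustrative choices of mine, not anything agreed in this thread.

```python
def clamp(x, lower, upper):
    """Clamp x to lower..upper, rejecting contradictory bounds."""
    # A NAN bound makes this comparison false, so NAN bounds are
    # not rejected here; only a genuine lower > upper is an error.
    if lower > upper:
        raise ValueError(
            "lower bound %r is greater than upper bound %r" % (lower, upper)
        )
    if x < lower:
        return lower
    if x > upper:
        return upper
    return x

try:
    clamp(1.5, 2, 1)
except ValueError as err:
    print("rejected:", err)
```

Because the check runs on every call, the very first clamp() with bad bounds fails, before any results have been computed.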
So in summary, I think it should be:
clamp(NaN, y, z) --> NaN
Agreed. It couldn't reasonably be anything else.
clamp(x, NaN, z) --> NaN
clamp(x, y, NaN) --> NaN
No, both these cases should treat NAN as equivalent to no limit, and clamp x as appropriate. If you want a second, NAN-poisoning clamp(), that's your prerogative, but don't force it upon everyone.
clamp(x, y, z) --> NaN if z < y
That's a clear error, and it should raise immediately. I see no advantage to returning NAN in this case.
Think about why you're clamping. It's unlikely to be used just once, for a single calculation. You're likely to be clamping a whole series of values, with fixed lower and upper bounds. The bounds are unlikely to be known at compile-time, but they aren't going to change from clamping to clamping. Something like this:
    lower, upper = get_bounds()
    for x in values():
        y = some_calculation(x)
        y = clamp(y, lower, upper)
        do_something_with(y)
is the most likely use-case, I think.
If lower happens to be greater than upper, that's clearly a mistake. It's better to get an exception immediately, rather than run through a million calculations and only then discover that you've ended up with a million NANs. It's okay if you get a few NANs: that simply indicates that one of your x values was a NAN, or a calculation produced a NAN. But if *every* calculation produces a NAN, well, that's a sign of breakage. Hence, better to raise straight away.
-- 
Steve