[Python-ideas] Consider adding clip or clamp function to math

Thu Aug 4 08:35:58 EDT 2016

On Tue, Aug 02, 2016 at 04:35:55PM -0700, Chris Barker wrote:

> If someone is passing a NaN in for a bound, then they are passing in
> garbage, essentially -- "I have no idea what my bounds are" so garbage is
> what they should get back -- "I have no idea what your clamped values are".

The IEEE 754 standard tells us what min(x, NAN) and max(x, NAN) should 
be: in both cases it is x.

https://en.wikipedia.org/wiki/IEEE_754_revision#min_and_max

Quote:
    In order to support operations such as windowing in which a NaN 
    input should be quietly replaced with one of the end points, min 
    and max are defined to select a number, x, in preference to a 
    quiet NaN:

        min(x,NaN) = min(NaN,x) = x
        max(x,NaN) = max(NaN,x) = x

According to Wikipedia, this behaviour was chosen specifically for 
the use-case we are discussing: windowing or clamping.
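
To make the quoted rule concrete, here is a minimal sketch of min and max 
that prefer a number over a NAN, and a clamp built from them. The names 
ieee_min, ieee_max and clamp_ieee are hypothetical, invented for this 
illustration, and returning NAN when x itself is a NAN is my assumption, 
not something the quoted text specifies.

```python
import math

def ieee_min(a, b):
    # Select the number in preference to a NaN: min(x, NaN) = min(NaN, x) = x.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a < b else b

def ieee_max(a, b):
    # Select the number in preference to a NaN: max(x, NaN) = max(NaN, x) = x.
    if math.isnan(a):
        return b
    if math.isnan(b):
        return a
    return a if a > b else b

def clamp_ieee(x, lower, upper):
    # Assumption: a NaN value to be clamped is passed through unchanged.
    if math.isnan(x):
        return x
    # A NaN bound acts as "no bound on that side".
    return ieee_max(lower, ieee_min(x, upper))
```

With these definitions a NAN bound simply drops out, so 
clamp_ieee(x, NAN, NAN) returns x, and clamp_ieee(5, NAN, 3) still 
clamps to 3.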

See also page 9 of Professor William Kahan's notes here:

https://people.eecs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF

Quote:

    For instance max{x, y} should deliver the same result as max{y, x} but
    almost no implementations do that when x is NaN. There are good 
    reasons to define max{NaN, 5} := max{5, NaN} := 5 though many would
    disagree.

It's okay to disagree and want "NAN poisoning" behaviour. If we define 
clamp(x, NAN, NAN) as x, as I have been arguing, then you can *easily* 
get the behaviour you want with a simple wrapper:

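import math

def clamp(x, lower, upper):
    # Wrapper giving "NAN poisoning" on top of the proposed math.clamp.
    if math.isnan(lower) or math.isnan(upper):
        return float("nan")  # or raise ValueError instead
    else:
        return math.clamp(x, lower, upper)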

Apart from the cost of one extra function call, which isn't too bad, 
this is no more expensive than what you are suggesting *everyone* should 
pay (two calls to math.isnan). So you are no worse off under my 
proposal: just define your own helper function, and you get the 
behaviour you want. We all win.

But if the standard clamp() function has the behaviour you want, 
violating IEEE-754, then you are forcing it on *everyone*, whether they 
want it or not. I don't want it, and I cannot use it. There's nothing I 
can do except re-implement clamp() from scratch and ignore the one in 
the math library.

As you propose it, clamp() is no use to me: it unnecessarily converts the 
bounds to float, which may raise an exception. If I use it in a loop, it 
unnecessarily checks to see if the bounds are NANs, over and over and 
over again, even when I know that they aren't. It does the wrong thing 
(according to my needs, according to Professor Kahan, and according to 
the current revision of IEEE-754) if I do happen to pass a NAN as bounds.
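
As a rough illustration of the alternative, here is a sketch of a clamp 
written with plain comparisons only. The name clamp_compare is 
hypothetical, and this is not a proposed implementation, just one way 
to get the behaviour described: no conversion of the bounds to float, 
no isnan checks, and a NAN bound is ignored because every ordering 
comparison against a NAN is false.

```python
from fractions import Fraction

def clamp_compare(x, lower, upper):
    # Only ordering comparisons are used, so ints, Fractions and floats
    # all work without being converted to float.
    if x < lower:
        return lower
    if x > upper:
        return upper
    # Reached when x is within bounds, or when a comparison involved a NaN
    # (such comparisons are always False).
    return x

nan = float("nan")
big = clamp_compare(10**400, 0, 10**500)           # too large for a float
half = clamp_compare(Fraction(1, 2), 0, 3)         # stays a Fraction
unbounded = clamp_compare(5, nan, nan)             # NaN bounds are ignored
```

The cost per call is at most two comparisons, which matters when the 
function is called in a loop with bounds already known to be good.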

NumPy has a "nanmin" which ignores NANs (as specified by IEEE-754), and 
"amin" which propagates NANs:

http://docs.scipy.org/doc/numpy/reference/generated/numpy.nanmin.html
http://docs.scipy.org/doc/numpy/reference/generated/numpy.amin.html

Similarly for the element-wise minimums: "fmin" ignores NANs and 
"minimum" propagates them.

By the way, there are also POSIX functions fmin and fmax which behave 
according to the standard:

http://man7.org/linux/man-pages/man3/fmin.3.html
http://man7.org/linux/man-pages/man3/fmax.3.html

Julia has a clamp() function, although unfortunately the documentation 
doesn't say what the behaviour with NANs is:

http://julia.readthedocs.io/en/latest/stdlib/math/#Base.clamp

-- 
Steve