float("nan") in set or as key
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Fri Jun 3 00:23:10 EDT 2011
On Fri, 03 Jun 2011 11:17:17 +1200, Gregory Ewing wrote:
> Steven D'Aprano wrote:
>
>> def kronecker(x, y):
>>     if x == y: return 1
>>     return 0
>>
>> This will correctly consume NAN arguments. If either x or y is a NAN,
>> it will return 0.
>
> I'm far from convinced that this result is "correct". For one thing, the
> Kronecker delta is defined on integers, not reals, so expecting it to
> deal with NaNs at all is nonsensical.
Fair point. Call it an extension of the Kronecker Delta to the reals then.
> For another, this function as
> written is numerically suspect, since it relies on comparing floats for
> exact equality.
Well, it is a throw away function demonstrating a principle, not battle-
hardened production code.
But it's hard to say exactly what alternative there is, if you're going
to accept floats. Should you compare them using an absolute error? If so,
you're going to run into trouble if your floats get large. It is very
amusing when people feel all virtuous for avoiding equality and then
inadvertently do something like this:
y = 2.1e12
if abs(x - y) <= 1e-9:
    # x is equal to y, within exact tolerance
    ...
Apart from being slower and harder to read, how is this different from
the simpler, more readable x == y?
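To make that concrete, here is a minimal sketch. It assumes Python 3.9 or later for math.ulp and math.nextafter, and close_abs is a name made up for illustration. Near 2.1e12 adjacent floats are 2**-12 apart, far more than 1e-9, so the tolerance test can only succeed when x == y:

```python
import math

def close_abs(x, y, tol=1e-9):
    """Hypothetical absolute-tolerance comparison, as in the snippet above."""
    return abs(x - y) <= tol

y = 2.1e12
gap = math.ulp(y)                        # spacing between y and the next float up
neighbour = math.nextafter(y, math.inf)  # the closest float above y

print(gap)                      # 0.000244140625, i.e. 2**-12
print(gap > 1e-9)               # True: no other float lies within 1e-9 of y
print(close_abs(neighbour, y))  # False
print(close_abs(y, y))          # True: only exact equality passes
```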
What about a relative error? Then you'll get into trouble when the floats
are very small. And how much error should you accept? What's good for
your application may not be good for mine.
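A sketch of the small-value problem, using math.isclose from the standard library (Python 3.5 or later) with its default relative tolerance and no absolute tolerance. The particular values are only illustrations:

```python
import math

tiny = 1e-300

# A purely relative test never treats a non-zero value as equal to zero,
# no matter how small that value is.
print(math.isclose(tiny, 0.0))                 # False

# Adding an absolute tolerance changes the answer, but the right value
# depends entirely on the scale of the data in your application.
print(math.isclose(tiny, 0.0, abs_tol=1e-12))  # True
```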
Even if you define your equality function to accept some limited error
measured in Units in Last Place (ULP), "equal to within 2 ULP" (or any
other fixed tolerance) is no better, or safer, than exact equality, and
very likely worse.
In practice, either the function needs some sort of "how to decide
equality" parameter, so the caller can decide what counts as equal in
their application, or you use exact floating point equality and leave it
up to the caller to make sure the arguments are correctly rounded so that
values which should compare equal do compare equal.
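The first option might look something like this. It is only a sketch: the eq parameter and its default are assumptions of mine, not part of the function as originally posted.

```python
import math
import operator

def kronecker(x, y, eq=operator.eq):
    """Return 1 if eq(x, y) is true, else 0.

    eq is a hypothetical "how to decide equality" parameter: any callable
    taking two arguments and returning a truth value.
    """
    return 1 if eq(x, y) else 0

print(kronecker(0.1 + 0.2, 0.3))                   # 0, exact equality fails
print(kronecker(0.1 + 0.2, 0.3, eq=math.isclose))  # 1, caller chose a tolerance
print(kronecker(float("nan"), 42))                 # 0, NAN compares unequal
```

With the default, the caller gets exact floating point equality and is responsible for rounding the arguments beforehand.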
> But the most serious problem is, given that
>
>> NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
>> because it is an invalid operation,
>
> if kronecker(NaN, x) or kronecker(x, NaN) returns anything other than
> NaN or some other sentinel value, then you've *lost* the information
> that an invalid operation occurred somewhere earlier in the computation.
If that's the most serious problem, then I'm laughing, because of course
I haven't lost anything.
x = result_of_some_computation(a, b, c) # may return NAN
y = kronecker(x, 42)
How have I lost anything? I still have the result of the computation in
x. If I throw that value away, it is because I no longer need it. If I do
need it, it is right there, where it always was.
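Spelled out as something runnable, with a made-up stand-in for result_of_some_computation (here it simply computes inf - inf, which is an invalid operation and gives a NAN):

```python
import math

def kronecker(x, y):
    if x == y:
        return 1
    return 0

def result_of_some_computation(a, b, c):
    # Hypothetical stand-in for illustration only.
    return a * b - c

inf = float("inf")
x = result_of_some_computation(inf, 1.0, inf)  # inf - inf gives NAN
y = kronecker(x, 42)

print(y)              # 0
print(math.isnan(x))  # True: the NAN is still there in x if I need it
```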
You seem to have fallen for the myth that NANs, once they appear, may
never disappear. This is a common misapprehension, e.g.:
"NaN is like a trap door that once you have fallen in you cannot
come back out. Otherwise, the possibility exists that a calculation
will have gone off course undetectably."
http://www.rhinocerus.net/forum/lang-fortran/94839-fortran-ieee-754-maxval-inf-nan-2.html#post530923
Certainly if you, the function writer, have any reasonable doubt about the
validity of a NAN input, you should return a NAN. But that doesn't mean
that NANs are "trap doors". It is fine for them to disappear *if they
don't matter* to the final result of the calculation. I quote:
"The key result of these rules is that once you get a NaN during
a computation, the NaN has a STRONG TENDENCY [emphasis added] to
propagate itself throughout the rest of the computation..."
http://www.savrola.com/resources/NaN.html
Another couple of good examples:
- from William Kahan, and the C99 standard: hypot(INF, x) is always INF
regardless of the value of x, hence hypot(INF, NAN) returns INF.
- since pow(x, 0) is always 1 regardless of the value of x, pow(NAN, 0)
is also 1.
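Python's math module follows the same conventions, so both examples can be checked directly. A quick sketch:

```python
import math

inf = float("inf")
nan = float("nan")

print(math.hypot(inf, nan))  # inf
print(math.pow(nan, 0))      # 1.0
print(nan ** 0)              # 1.0, the ** operator agrees
```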
In the case of the real-valued Kronecker delta, I argue that the NAN
doesn't matter, and it is reasonable to allow it to disappear.
Another standard example where NANs get thrown away is the max and min
functions. The 2008 revision of IEEE-754 defines minNum and maxNum
operations that return the non-NAN operand when exactly one operand is a
quiet NAN.
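Python's builtin max makes no such promise (its result with a NAN depends on argument order), so here is a sketch of a NAN-ignoring maximum in the spirit of the IEEE-754 (2008) behaviour. The name max_num is mine, and this is an illustration, not a conforming implementation:

```python
import math

def max_num(x, y):
    """Hypothetical two-argument maximum that ignores a single NAN."""
    if math.isnan(x):
        return y  # also returns NAN when both arguments are NAN
    if math.isnan(y):
        return x
    return max(x, y)

nan = float("nan")

print(max(nan, 1.0))      # nan, the builtin is order-dependent
print(max(1.0, nan))      # 1.0
print(max_num(nan, 1.0))  # 1.0
print(max_num(1.0, nan))  # 1.0
```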
> You can't get a valid result from data produced by an invalid
> computation. Garbage in, garbage out.
Of course you can. Here's a trivial example:
def f(x):
    return 1
It doesn't matter what value x takes, the result of f(x) should be 1.
What advantage is there in having f(NAN) return NAN?
>> not because NANs are magical goop that spoil everything they touch.
>
> But that's exactly how they *have* to behave if they truly indicate an
> invalid operation.
>
> SQL has been mentioned in relation to all this. It's worth noting that
> the result of comparing something to NULL in SQL is *not* true or false
> -- it's NULL!
I'm sure they have their reasons for that. Whether they are good reasons
or not, I don't know. I do know that the 1999 SQL standard defined *four*
results for boolean comparisons, true/false/unknown/null, but allowed
implementations to treat unknown and null as the same.
--
Steven