[Python-Dev] Why is nan != nan?
casey at pandora.com
Fri Mar 26 18:40:43 CET 2010
On Mar 25, 2010, at 7:19 PM, P.J. Eby wrote:
> At 11:57 AM 3/26/2010 +1100, Steven D'Aprano wrote:
>> But they're not -- they're *signals* for "your calculation has gone screwy and the result you get is garbage", so to speak. You shouldn't even think of a specific NAN as a piece of specific garbage, but merely a label on the *kind* of garbage you've got (the payload): INF-INF is, in some sense, a different kind of error to log(-1). In the same way you might say "INF-INF could be any number at all, therefore we return NAN", you might say "since INF-INF could be anything, there's no reason to think that INF-INF == INF-INF."
> So, are you suggesting that maybe the Pythonic thing to do in that case would be to cause any operation on a NAN (including perhaps comparison) to fail, rather than allowing garbage to silently propagate?
> In other words, if NAN is only a signal that you have garbage, is there really any reason to keep it as an *object*, instead of simply raising an exception? Then, you could at least identify what calculation created the garbage, instead of it percolating up through other calculations.
> In low-level languages like C or Fortran, it obviously makes sense to represent NAN as a value, because there's no other way to represent it. But in a language with exceptions, is there a use case for it existing as a value?
If a NaN object is allowed to exist (that is, if a float operation that does not return a real number does not itself raise an exception immediately), then it will always be possible to get (seemingly) nonsensical behavior when it is used in containers that do not themselves "operate" on their elements.
So even provided that performing any "operation" on a NaN object raises an exception, it would still be possible to add such an object to a list or tuple and have subsequent containment checks for that object return false. So this "solution" would simply narrow the problem posed, but not eliminate it.
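To make the container case concrete, here is a minimal sketch of how this looks in CPython as I understand it. The variable names are made up for illustration. Note one subtlety: list containment checks identity before equality, so the very same NaN object is found, while a separately created NaN is not.

```python
# Two NaN values held in two distinct float objects.
nan_a = float("nan")
nan_b = float("nan")

items = [nan_a]

# CPython's list containment tries "is" before "==", so the identical
# object is reported as present.
same_object_found = nan_a in items

# A different NaN object falls through to "==", which is False for NaN.
other_nan_found = nan_b in items

# A search written purely in terms of "==" never finds the NaN at all.
matches_by_equality = [x for x in items if x == nan_a]

print(same_object_found, other_nan_found, matches_by_equality)
```

Either way, no arithmetic was performed on the NaN, so raising on "operations" would not have prevented the surprise.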
None of the solutions posed seems ideal, particularly when they deviate from the standard in arbitrary ways that someone deems "better". It's obvious to me that no ideal solution exists so long as you attempt to represent non-numeric values in a numeric type. So unless you simply eliminate NaNs (thus breaking the standard), you are going to confuse somebody. And I think having float deviate from the IEEE standard is ill-advised unless there is no alternative (i.e., the standard cannot be practically implemented); breaking it will confuse people too (and probably the ones who know this domain).
I propose that the current behavior stands as is and that the documentation make mention of the fact that NaN values are unordered, thus some float values may not behave intuitively wrt hashing, equality, etc.
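As a rough illustration of what "unordered" means in practice, the sketch below shows the comparison results and one consequence for dict keys. It assumes nothing beyond the standard library; the names are illustrative only.

```python
import math

nan = float("nan")

# Every comparison involving NaN is False, except "!=".
comparisons = (nan < 1.0, nan > 1.0, nan == 1.0, nan == nan)
not_equal_to_itself = nan != nan

# Two separately created NaNs end up as two distinct dict keys,
# and a third NaN cannot be used to look either of them up.
table = {}
table[float("nan")] = "first"
table[float("nan")] = "second"
lookup_with_fresh_nan = float("nan") in table

# math.isnan is the dependable way to detect a NaN.
nan_key_count = sum(1 for key in table if math.isnan(key))

print(comparisons, not_equal_to_itself, len(table), lookup_with_fresh_nan)
```

A documentation note along these lines would probably cover most of the confusion without changing any behavior.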
The fact of the matter is that using floats as dict keys or set values, or even just checking them for equality, is much more complex in practice than you would expect. I mean, even representing 1.1 is problematic ;^). Unless the float values you are using are constants, how would you practically use them as dict keys or hashed set members anyway? I'm not saying it can't be done, but is a hash table with float keys ever a data structure that someone on this list would recommend? If so, good luck and godspeed 8^)
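For anyone who hasn't been bitten by this yet, a small sketch of the problem with computed floats as keys. The dict and its contents are hypothetical; the point is only that a computed value need not equal the constant you stored.

```python
import math
from decimal import Decimal

# The literal 1.1 is stored as the nearest binary double, not exactly 1.1.
exact_value_of_1_1 = Decimal(1.1)

# A computed float that "should" be 0.3 is a different double than 0.3.
computed = 0.1 + 0.2
prices = {0.3: "thirty cents"}  # hypothetical table keyed by a constant
found = prices.get(computed)

# A tolerance-based comparison works, but a hash table cannot use one.
close_enough = math.isclose(computed, 0.3)

print(exact_value_of_1_1, computed, found, close_enough)
```

So even without NaN in the picture, float keys only behave when the key values are produced in exactly the same way every time.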