float("nan") in set or as key
Steven D'Aprano
steve+comp.lang.python at pearwood.info
Thu Jun 2 05:54:30 EDT 2011
On Wed, 01 Jun 2011 21:41:06 +0100, Nobody wrote:
> On Sun, 29 May 2011 23:31:19 +0000, Steven D'Aprano wrote:
>
>>> That's overstating it. There's a good argument to be made for raising
>>> an exception.
>>
>> If so, I've never heard it, and I cannot imagine what such a good
>> argument would be. Please give it.
>
> Exceptions allow you to write more natural code by ignoring the awkward
> cases. E.g. writing "x * y + z" rather than first determining whether "x
> * y" is even defined then using a conditional.
You've quoted me out of context. I wasn't asking for justification for
exceptions in general. There's no doubt that they're useful. We were
specifically talking about NAN == NAN raising an exception rather than
returning False.
>>> Bear in mind that an exception is not necessarily an error, just an
>>> "exceptional" condition.
>>
>> True, but what's your point? Testing two floats for equality is not an
>> exceptional condition.
>
> NaN itself is an exceptional condition which arises when a result is
> undefined or not representable. When an operation normally returns a
> number but a specific case cannot do so, it returns not-a-number.
I'm not sure what "not representable" is supposed to mean, but if by
"undefined" you mean "invalid", then correct.
> The usual semantics for NaNs are practically identical to those for
> exceptions. If any intermediate result in a floating-point expression is
> NaN, the overall result is NaN.
Not necessarily. William Kahan gives an example where hypot, given an INF
and a NAN, can justifiably return INF instead of NAN. While it's certainly
true that an intermediate NAN *usually* results in a NAN, that's not a
guarantee or requirement of the standard. A function is allowed to
convert NANs back to non-NANs, if it is appropriate for that function.
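A quick Python sketch of functions that consume a NAN rather than propagate it. It assumes CPython's behaviour: math.hypot returns INF if any argument is infinite, even alongside a NAN, and math.pow(x, 0.0) is documented to return 1.0 even when x is a NAN.

```python
import math

nan = float("nan")
inf = float("inf")

# hypot(inf, nan): the result is infinite whatever the NAN "really" was,
# so the NAN is consumed rather than propagated.
print(math.hypot(inf, nan))   # inf

# pow(x, 0.0) is 1.0 for every x, so a NAN base is consumed as well.
print(math.pow(nan, 0.0))     # 1.0

# Most arithmetic, by contrast, does propagate the NAN.
print(math.isnan(nan + 1.0))  # True
```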
Another example is the Kronecker delta:

def kronecker(x, y):
    if x == y:
        return 1
    return 0

This will correctly consume NAN arguments. If either x or y is a NAN, it
will return 0.
(As an aside, this demonstrates that having NAN != any NAN, including
itself, is useful, as kronecker(x, x) will return 0 if x is a NAN.)
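Running that function in Python (restated here so the snippet stands alone, with float("nan") used to obtain a NAN) behaves as described:

```python
def kronecker(x, y):
    """Kronecker delta: 1 if x equals y, otherwise 0."""
    if x == y:
        return 1
    return 0

nan = float("nan")
print(kronecker(2.0, 2.0))  # 1
print(kronecker(2.0, nan))  # 0
print(kronecker(nan, nan))  # 0: a NAN is unequal even to itself
```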
> Similarly, if any intermediate
> calculation throws an exception, the calculation as a whole throws an
> exception.
This is certainly true: the exception cannot look into the future and
see that it isn't needed because a later calculation cancels it out.
Exceptions, or hardware traps, stop the calculation. NANs allow the
calculation to proceed. Both behaviours are useful, and the standard
allows for both.
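A rough Python sketch of the two styles, using negative square roots as the invalid operation. math.sqrt raises ValueError for a negative argument; safe_sqrt is a hypothetical helper, made up for this illustration, that returns a NAN instead.

```python
import math

def safe_sqrt(x):
    """Hypothetical helper: sqrt(x), or a NAN for an invalid (negative) x."""
    try:
        return math.sqrt(x)
    except ValueError:  # math.sqrt traps on negative input
        return float("nan")

data = [4.0, -1.0, 9.0]

# Exception style: the first invalid input stops the whole calculation.
try:
    roots = [math.sqrt(x) for x in data]
except ValueError as exc:
    roots = None
    print("halted:", exc)

# NAN style: the calculation runs to completion, and the invalid
# entries can be inspected or discarded afterwards.
nan_roots = [safe_sqrt(x) for x in data]
valid = [r for r in nan_roots if not math.isnan(r)]
print(valid)  # [2.0, 3.0]
```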
> If x is NaN, then "x + y" is NaN, "x * y" is NaN, pretty much anything
> involving x is NaN. By this reasoning both "x == y" and "x != y" should
> also be NaN.
NAN is a sentinel for an invalid operation. NAN + NAN returns a NAN
because it is an invalid operation, not because NANs are magical goop
that spoil everything they touch.
For example, print(NAN) does not return a NAN or raise an exception, nor
is there any need for it to. Slightly more esoteric: the signbit and
copysign functions both accept NANs without necessarily returning NANs.
Equality comparison is another such function. There's no need for
NAN == NAN to fail, because the equality operation is perfectly well
defined for NANs.
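In Python, for example (a sketch; the math module has no signbit, so math.copysign(1.0, x) stands in for it here):

```python
import math

nan = float("nan")

print(nan)              # prints: nan  (printing a NAN is fine)
print(math.isnan(nan))  # True
# copysign copies the NAN's sign bit onto 1.0. Which sign you get
# depends on how the NAN was produced, but the result is never a NAN.
print(math.copysign(1.0, nan))
print(nan == nan)       # False: a well-defined answer, no exception
```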
> But only the floating-point types have a NaN value, while
> bool doesn't. However, all types have exceptions.
What relevance does bool have?
>>>> The correct answer to "nan == nan" is False, they are not equal.
>>>
>>> There is no correct answer to "nan == nan".
>>
>> Why on earth not?
>
> Why should there be a correct answer? What does NaN actually mean?
NAN means "this is a sentinel marking that an invalid calculation was
attempted". For the purposes of numeric calculation, it is often useful
to allow those sentinels to propagate through your calculation rather
than to halt the program, perhaps because you hope to find that the
invalid marker ends up not being needed and can be ignored, or because
you can't afford to halt the program.
Does INVALID == INVALID? There's no reason to think that the question
itself is an invalid operation. If you can cope with the question "Is an
apple equal to a puppy dog?" without shouting "CANNOT COMPUTE!!!" and
running down the street, there's no reason to treat NAN == NAN as
anything worse.
So what should NAN == NAN equal? Consider the answer to the apple and
puppy dog comparison. Chances are that anyone asked that will give you a
strange look and say "Of course not, you idiot". (In my experience, and
believe it or not, I have actually tried this, some people will ask you to
define equality. But they're a distinct minority.)
If you consider "equal to" to mean "the same as", then the answer is
clear and obvious: apples do not equal puppies, and any INVALID sentinel
is not equal to any other INVALID. (Remember, NAN is not a value itself;
it's a sentinel representing the fact that you don't have a valid number.)
So NAN == NAN should return False, just like the standard states, and
NAN != NAN should return True. "No, of course not, they're not equal."
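Python follows the standard on this point. A quick check, which also shows that the usual relationship (x != y) == (not (x == y)) survives when a NAN is involved:

```python
nan = float("nan")

print(nan == nan)  # False
print(nan != nan)  # True

# The ordinary relationship between == and != still holds with NANs.
for x, y in [(nan, nan), (nan, 1.0), (1.0, 1.0), (1.0, 2.0)]:
    assert (x != y) == (not (x == y))
```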
> Apart from anything else, defining "NaN == NaN" as False means that "x
> == x" is False if x is NaN, which violates one of the fundamental axioms
> of an equivalence relation (and, in every other regard, "==" is normally
> intended to be an equivalence relation).
Yes, that's a consequence of NAN behaviour. I can live with that.
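One practical consequence, and the one behind the subject line: Python's built-in containers check identity before equality (the language reference notes that they assume identical objects are equal to themselves). So the *same* NAN object can be found in a set or used as a dict key, but a freshly created NAN cannot. A short demonstration:

```python
import math

nan = float("nan")
s = {nan}
d = {nan: "invalid"}

# Membership tests compare identity first, then ==.
print(nan in s)           # True: same object, found by identity
print(float("nan") in s)  # False: different object, and nan != nan
print(d[nan])             # invalid
print(float("nan") in d)  # False

# Reflexivity fails for == itself, yet the lists compare equal,
# because list comparison also checks identity first.
print(nan == nan)         # False
print([nan] == [nan])     # True

# To test for a NAN, use math.isnan rather than ==.
print(math.isnan(float("nan")))  # True
```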
> The creation of NaN was a pragmatic decision on how to handle
> exceptional conditions in hardware. It is not holy writ, and there's no
> fundamental reason why a high-level language should export the
> hardware's behaviour verbatim.
There is a good, solid reason: it's a *useful* standard that *works*,
proven in practice, invented by people who have forgotten more about
floating point than you or I will ever learn, and we dismiss their
conclusions at our peril.
A less good reason: it's a standard. Better to stick to a not-very-good
standard than to have the Wild West, where everyone chooses their own
behaviour. You have NAN == NAN raise ValueError, Fred has it return True,
George has it return False, Susan has it return a NAN, Michelle makes it
raise MathError, somebody else returns Maybe ...
But IEEE-754 is not just a "not-very-good" standard. It is an extremely
good standard.
>>> Arguably, "nan != nan" should also be false, but that would violate
>>> the invariant "(x != y) == !(x == y)".
>>
>> I cannot imagine what that argument would be. Please explain.
>
> A result of NaN means that the result of the calculation is undefined,
> so the value is "unknown".
Incorrect. NANs are not "unknowns", or missing values.
--
Steven
More information about the Python-list mailing list