[Python-Dev] == on object tests identity in 3.x

Tue Jul 8 18:57:45 CEST 2014

On Tue, Jul 08, 2014 at 04:53:50PM +0900, Stephen J. Turnbull wrote:
> Chris Angelico writes:
> 
>  > The reason NaN isn't equal to itself is because there are X bit
>  > patterns representing NaN, but an infinite number of possible
>  > non-numbers that could result from a calculation.
> 
> I understand that.  But you're missing at least two alternatives that
> involve raising on some calculations involving NaN, as well as the
> fact that forcing inequality of two NaNs produced by equivalent
> calculations is arguably just as wrong as allowing equality of two
> NaNs produced by the different calculations.  

I don't think so. Floating point == represents *numeric* equality, not
(for example) equality in the sense of "All Men Are Created Equal". Not
even numeric equality in the most general sense, but specifically in the
sense of (approximately) real-valued numbers, so it's an extremely 
precise definition of "equal", not fuzzy in any way.

In an early post, you suggested that NANs don't have a value, or that 
they have a value which is not a value. I don't think that's a good way 
to look at it. I think the obvious way to think of it is that NAN's 
value is Not A Number, exactly like it says on the box. Now, if 
something is not a number, obviously you cannot compare it numerically:

    "Considered as numbers, is the sound of rain on a tin roof
     numerically equal to the sight of a baby smiling?"

Some might argue that the only valid answer to this question is "Mu",

https://en.wikipedia.org/wiki/Mu_%28negative%29#.22Unasking.22_the_question

but if we're forced to give a Yes/No True/False answer, then clearly
False is the only sensible answer. No, Virginia, Santa Claus is not the 
same number as Santa Claus.

To put it another way, if x is not a number, then x != y for all 
possible values of y -- including x.

[Disclaimer: despite the name, IEEE-754 arguably does not intend NANs to 
be Not A Number in the sense that Santa Claus is not a number, but more 
like "it's some number, but it's impossible to tell which". However, 
despite that, the standard specifies behaviour which is best thought of 
in terms of as the Santa Claus model.]

> That's where things get
> fuzzy for me -- in Python I would expect that preserving invariants
> would be more important than computational efficiency, but evidently
> it's not.  

I'm not sure what you're referring to here. Is it that containers such 
as lists and dicts are permitted to optimize equality tests with 
identity tests for speed?

py> NAN = float('NAN')
py> a = [1, 2, NAN, 4]
py> NAN in a  # identity is checked before equality
True
py> any(x == NAN for x in a)
False

When this came up for discussion last time, the clear consensus was that 
this is reasonable behaviour. NANs and other such "weird" objects are 
too rare and too specialised for built-in classes to carry the burden of 
having to allow for them. If you want a "NAN-aware list", you can make 
one yourself.

> I assume that I would have a better grasp on why Python
> chose to go this way rather than that if I understood IEEE 754 better.

See the answer by Stephen Canon here:

http://stackoverflow.com/questions/1565164/

[quote]

It is not possible to specify a fixed-size arithmetic type that 
satisfies all of the properties of real arithmetic that we know and 
love. The 754 committee has to decide to bend or break some of them. 
This is guided by some pretty simple principles:

    When we can, we match the behavior of real arithmetic.
    When we can't, we try to make the violations as predictable and as 
    easy to diagnose as possible.

[end quote]

In particular, reflexivity for NANs was dropped for a number of reasons, 
some stronger than others:

- One of the weaker reasons for NAN non-reflexivity is that it preserved
  the identity x == y <=> x - y == 0. Although that is the cornerstone 
  of real arithmetic, it's violated by IEEE-754 INFs, so violating it
  for NANs is not a big deal either.

- Dropping reflexivity preserves the useful property that NANs compare 
  unequal to everything.

- Practicality beats purity: dropping reflexivity allowed programmers
  to identify NANs without waiting years or decades for programming 
  languages to implement isnan() functions. E.g. before Python had 
  math.isnan(), I made my own:

  def isnan(x):
      return isinstance(x, float) and x != x

- Keeping reflexivity for NANs would have implied some pretty nasty
  things, e.g. if log(-3) == log(-5), then -3 == -5.

Basically, and I realise that many people disagree with their decision 
(notably Bertrand Meyer of Eiffel fame, and our own Mark Dickenson), the 
IEEE-754 committee led by William Kahan decided that the problems caused 
by having NANs compare unequal to themselves were much less than the 
problems that would have been caused without it.

-- 
Steven