checking for identity before comparing built-in objects

It seems that built-in classes do not short-circuit the `__eq__` method when the objects are identical, at least in CPython:

    f = frozenset(range(200000000))
    f1 = f
    f1 == f  # this operation will take about 1 sec on my machine

Is there any disadvantage to checking whether the equality was called with the same object, and if it was, return `True` right away?

I noticed this when trying to memoize a function that has large frozenset arguments. While hashing of a large argument is very fast after it's done once (hash value is presumably cached), the equality comparison is always slow even against itself. So when the same large argument is provided over and over, memoization is slow.

Of course, there's a workaround: subclass frozenset, and redefine __eq__ to check id() first. And arguably, for this particular use case, I should redefine both __hash__ and __eq__ to look exclusively at id(), since it's not worth wasting memoizer time trying to compare two non-identical large arguments that are highly unlikely to compare equal anyway. So if there's any reason for the current implementation, I don't have a strong argument against it.
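The workaround mentioned in this message could look roughly like the sketch below. The class name `IdentityFirstFrozenSet` is made up for illustration and is not part of any library; the sketch only adds an identity fast path and otherwise defers to the normal frozenset comparison.

```python
class IdentityFirstFrozenSet(frozenset):
    """Hypothetical frozenset subclass that checks identity before contents."""

    def __eq__(self, other):
        # Fast path: treat an object as equal to itself without
        # walking its elements.
        if self is other:
            return True
        return frozenset.__eq__(self, other)

    def __ne__(self, other):
        if self is other:
            return False
        return frozenset.__ne__(self, other)

    # Defining __eq__ in a class body sets __hash__ to None unless it is
    # restored explicitly, so reuse the content-based frozenset hash.
    __hash__ = frozenset.__hash__


big = IdentityFirstFrozenSet(range(1000))
alias = big
copy = IdentityFirstFrozenSet(range(1000))
cache = {big: "memoized"}
```

Note that this fast path changes the result for a set that contains an object which is not equal to itself only in the sense already discussed in the thread; for ordinary elements the outcome is the same as the built-in comparison.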

On 04/10/12 21:48, Max Moroz wrote:
You shouldn't over-generalize. Some built-ins do short-circuit __eq__ when the objects are identical. I believe that strings and ints both do. Other types might not.
Is there any disadvantage to checking whether the equality was called with the same object, and if it was, return `True` right away?
That would break floats and Decimals, both of which support NANs. The decision whether or not to optimize __eq__ should be left up to the type. Some types, for example, might decide to optimize x == x even if x contains a NAN or other objects that break reflexivity of equality. Other types might prefer not to. (Please do not start an argument about NANs and reflexivity. That's been argued to death, and there are very good reasons for the IEEE 754 standard to define NANs the way they do.) Since frozensets containing NANs are rare (I presume), I think it is reasonable to optimize frozenset equality. But I do not think it is reasonable for Python to mandate identity checking before __eq__.
I'm not sure what you are doing here, because dicts (at least in Python 3.2) already short-circuit equality:

    py> NAN = float('nan')
    py> NAN == NAN
    False
    py> d = {NAN: 42}
    py> d[NAN]
    42

Actually, that behaviour goes back to at least 2.4, so I'm not sure how you are doing memoization and not seeing the same optimization.

-- Steven
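One way to observe the dict behaviour described here is to count `__eq__` calls. The sketch below uses a made-up `CountingKey` class (not from the thread) and assumes CPython's usual lookup order of identity check first, then hash, then `__eq__`.

```python
class CountingKey:
    """Hypothetical key type that counts how often __eq__ runs."""

    eq_calls = 0

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return hash(self.value)

    def __eq__(self, other):
        CountingKey.eq_calls += 1
        return isinstance(other, CountingKey) and self.value == other.value


key = CountingKey("large argument")
cache = {key: "cached result"}

CountingKey.eq_calls = 0
same_object_result = cache[key]  # lookup with the very same object
calls_for_same_object = CountingKey.eq_calls

CountingKey.eq_calls = 0
equal_copy_result = cache[CountingKey("large argument")]  # equal, not identical
calls_for_equal_copy = CountingKey.eq_calls
```

With the identical object the lookup should not need `__eq__` at all, while an equal but distinct key has to be compared.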

On 2012-10-04 15:07, Mike Graham wrote:
Think of it this way:

    Calculation A returns NaN for some reason
    Calculation B also returns NaN for some reason

Have they really returned the same result? Just because they're both NaN doesn't mean that they're the _same_ NaN...

2012/10/4 Steven D'Aprano <steve@pearwood.info>:
This optimization is not implemented for Unicode strings. PyObject_RichCompareBool() implements this optimization which leads to incorrect results:

    nan = float("nan")
    mytuple = (nan,)
    assert mytuple != mytuple  # fails

I think that the optimization should be implemented for Unicode strings, but disabled in PyObject_RichCompareBool().

@Max Moroz: Can you please open an issue on bugs.python.org?

Victor
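The effect of the identity shortcut on container comparison can be seen side by side with a direct float comparison. This is only an illustration of the observable behaviour; the per-item rule is roughly `a is b or a == b`.

```python
nan = float("nan")

# Direct comparison follows IEEE 754: a NaN is not equal to itself.
direct = (nan == nan)

# Tuple comparison treats identical items as equal without asking
# the items' own __eq__.
same_object = ((nan,) == (nan,))

# Two distinct NaN objects are neither identical nor equal.
distinct_objects = ((float("nan"),) == (float("nan"),))
```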

On 05/10/12 01:08, Victor Stinner wrote:
That does not match my experience. In Python 3.2, I generate a large unicode string, and an equal but not identical copy:

    s = "aЖcdef"*100000
    t = "a" + s[1:]
    assert s is not t and s == t

Using timeit, s == s is about 10000 times faster than s == t.

-- Steven
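The measurement described here can be reproduced with something like the following sketch. The repeat count of 1000 is an arbitrary choice, and the actual ratio will depend on the interpreter version and machine, so no particular numbers are claimed.

```python
import timeit

s = "aЖcdef" * 100000
t = "a" + s[1:]  # equal contents, but a different object

# Time comparing the string with itself versus with an equal copy.
identical = timeit.timeit("s == s", globals={"s": s}, number=1000)
equal_copy = timeit.timeit("s == t", globals={"s": s, "t": t}, number=1000)

print(f"s == s: {identical:.6f}s   s == t: {equal_copy:.6f}s")
```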

On 4 October 2012 17:05, MRAB <python@mrabarnett.plus.com> wrote:
This was discussed not long ago in a different thread. Here is the line: http://hg.python.org/cpython/file/bd8afb90ebf2/Objects/unicodeobject.c#l1050... As I understood it that line is the reason that comparisons for interned strings are faster. Oscar

On Thu, Oct 4, 2012 at 7:19 AM, MRAB <python@mrabarnett.plus.com> wrote:
Someone who performs two calculations with float numbers should never compare their results for equality. It's really a bug to rely on that comparison:

    # this is a bug
    # since the result of this comparison for regular numbers is unpredictable
    # so does it really matter how this behaves when NaNs are compared?
    if a/b == c/d:
        # ...

On the other hand, comparing a number to another number, when neither of the two numbers is involved in a calculation, is perfectly fine:

    # this is not a bug
    # too bad that it won't work as expected
    # when input1 == input2 == 'nan'
    a = float(input1)
    b = float(input2)
    if a == b:
        # ...

So it seems to me your argument is this: "let's break the expectations of developers who are writing valid code, in order to partially meet the expectations of developers who are writing buggy code". If so, I disagree.
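To make the two cases concrete, here is a small sketch using only the standard `math` module. The specific values 0.1, 0.2 and 0.3 are chosen for illustration and do not come from the message.

```python
import math

# Results of calculations: exact equality is fragile because of rounding.
a = 0.1 + 0.2
b = 0.3
exact_match = (a == b)
close_match = math.isclose(a, b)  # tolerance-based comparison

# Parsed inputs with no arithmetic involved: equality works for ordinary
# numbers, but two 'nan' inputs still compare unequal.
x = float("nan")
y = float("nan")
nan_equal = (x == y)
both_nan = math.isnan(x) and math.isnan(y)
```

Code that needs to treat two NaN inputs as "the same" has to test for them explicitly, for example with `math.isnan`.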

On Thu, 4 Oct 2012 17:08:40 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
I think we should wait for someone to complain before disabling it. It's a useful optimization. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 05, 2012 at 01:00:10AM +0200, Antoine Pitrou wrote:
+1 I will go to the wall to defend correct IEEE 754 semantics for NANs, but I also support containers that optimise away those semantics by default. I think it's too early to talk about disabling it without even the report of a bug caused by it. -- Steven

On Thu, Oct 04, 2012 at 05:08:40PM +0200, Victor Stinner wrote:
I think that the optimization should be implemented for Unicode strings, but disabled in PyObject_RichCompareBool().
Actually, this change to PyObject_RichCompareBool() has been made before, but was reverted after the discussion in http://bugs.python.org/issue4296 Cheers, Sven

On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
But it seems like set and frozenset behave like this anyway (using "is" to compare its items):

    >>> frozenset([float("nan")]) == frozenset([float("nan")])
    False

So the "is" optimization should not change its semantics. (I tested this in Python 2.7.3 and 3.2.3)
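The difference between sharing one NaN object and creating two can be checked directly. This sketch only shows observable behaviour and makes no claim about how the comparison is implemented internally.

```python
nan = float("nan")

# Both frozensets contain the very same NaN object.
same_nan = frozenset([nan]) == frozenset([nan])

# Each frozenset contains its own, distinct NaN object.
different_nans = frozenset([float("nan")]) == frozenset([float("nan")])

# Membership tests also find an element by identity.
member = nan in frozenset([nan])
```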

On Thu, Oct 4, 2012 at 9:53 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Why not? This is python-ideas, isn't it? I've been hearing that IEEE 754 committee had some "very good reasons" to violate reflexivity of equality comparison with NaNs since I first learned about NaNs some 20 years ago. From time to time, I've also heard claims that there are some important numeric algorithms that depend on this behavior. However, I've never been able to dig out the actual rationale that convinced the committee that voted for IEEE 754 or any very good reasons to preserve this behavior in Python. I am not suggesting any language changes, but I think it will be useful to explain why float('nan') != float('nan') somewhere in the docs. A reference to IEEE 754 does not help much. Java implements IEEE 754 to some extent, but preserves reflexivity of object equality.

On Mon, Oct 8, 2012 at 11:35 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
NaN isn't a single value, but a whole category of values. Conceptually, it's an uncountably infinite set (I think that's the technical term) of invalid results; in implementation, NaN has the highest possible exponent and any non-zero mantissa. So then the question becomes: Should *all* NaNs be equal, or only ones with the same bit pattern? Aside from signalling vs non-signalling NaNs, I don't think there's any difference between one and another, so they should probably all compare equal. And once you go there, a huge can o'worms is opened involving floating point equality. It's much MUCH easier and simpler to defer to somebody else's standard and just say "NaNs behave according to IEEE 754, blame them if you don't like it". There would possibly be value in guaranteeing reflexivity, but it would increase confusion somewhere else. ChrisA
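The encoding described here (all-ones exponent, non-zero mantissa) can be inspected with the `struct` module. The sketch assumes the platform stores Python floats as IEEE 754 binary64, which is the common case; the helper names are made up for illustration. In that format the number of NaN bit patterns is large but finite (2**53 - 2).

```python
import math
import struct

EXPONENT_MASK = 0x7FF0000000000000
MANTISSA_MASK = 0x000FFFFFFFFFFFFF


def float_to_bits(x):
    """Return the 64-bit pattern of a Python float as an int."""
    return struct.unpack("<Q", struct.pack("<d", x))[0]


def bits_to_float(bits):
    """Build a Python float from a 64-bit pattern."""
    return struct.unpack("<d", struct.pack("<Q", bits))[0]


def bits_encode_nan(bits):
    """True if the pattern has an all-ones exponent and a non-zero mantissa."""
    return (bits & EXPONENT_MASK) == EXPONENT_MASK and (bits & MANTISSA_MASK) != 0


default_nan_bits = float_to_bits(float("nan"))
other_nan = bits_to_float(0x7FF8000000000001)  # quiet NaN, different payload
inf_bits = float_to_bits(float("inf"))
```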

On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <guido@python.org> wrote:
Seriously, we can't change our position on this topic now without making a lot of people seriously unhappy. IEEE 754 it is.
I did not suggest a change. I wrote: "I am not suggesting any language changes, but I think it will be useful to explain why float('nan') != float('nan') somewhere in the docs." If there is a concise explanation for the choice of IEEE 754 vs. Java, I think we should write it down and put an end to this debate.

On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Referencing Java here is absurd and I still consider this suggestion as a troll. Python is not in any way based on Java. On the other hand referencing IEEE 754 makes all the sense in the world, since every other aspect of Python float is based on IEEE 754 double whenever the underlying platform implements this standard -- and all modern CPUs do. I don't think there's anything else we need to say. -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum <guido@python.org> wrote:
Referencing Java here is absurd and I still consider this suggestion as a troll. Python is not in any way based on Java.
I did not suggest that. Sorry if it came out this way. I am well aware that Python and Java were invented independently and have different roots. (IIRC, Java was born from Oak and Python from ABC and Oak and ABC were both developed in the 1980s.) IEEE 754 precedes both languages and one team decided that equality reflexivity for hashable objects was more important than IEEE 754 compliance while the other decided otherwise. Many Python features (mostly library) are motivated by C. In the 90s, "because C does it this way" was a good explanation for a language feature. Doing things differently from the "C way", on the other hand would deserve an explanation. These days, C is rarely the first language that a student learns. Hopefully Python will take this place in the not so distant future, but many students graduated in the late 90s - early 2000s knowing nothing but Java. As a result, these days it is a valid question to ask about a language feature: "Why does Python do X differently from Java?" Hopefully in most cases the answer is "because Python does it better." In case of nan != nan, I would really like to know a modern reason why Python's way is better. Better compliance with a 20-year old standard does not really qualify.

On Sun, Oct 7, 2012 at 10:33 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
In case of nan != nan, I would really like to know a modern reason why Python's way is better.
To this end, a link to Kahan's "How Java’s Floating-Point Hurts Everyone Everywhere" <http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf> may be appropriate.

On Sun, Oct 7, 2012 at 7:33 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Explaining the differences between Python and Java is a job for educators, not for the language reference. I agree that documenting APIs as "this behaves just like C" does not have the same appeal -- but that turn of phrase was mostly used for system calls anyway, and for those I think that a slightly modified redirection (to the OS man pages) is still completely appropriate.
I am not aware of an update to the standard. Being 20 years old does not make it outdated. Again, there are plenty of reasons (you have to ask the numpy folks), but I don't think it is the job of the Python reference manual to give its motivations. It just needs to explain how things work, and if that can be done best by deferring to an existing standard that's fine. Of course a tutorial should probably mention this behavior, but a tutorial does not have the task of giving you the reason for every language feature either -- most readers of the tutorial don't have the context yet to understand those reasons, many don't care, and whether they like it or not, it's not going to change. You keep getting very close to suggesting to make changes, despite your insistence that you just want to know the reason. But assuming you really just are asking in an obnoxious way for the reason, I recommend that you ask the people who wrote the IEEE 754 standard. I'm sure their explanation (which I recall having read once but can't reproduce here) makes sense for Python too. -- --Guido van Rossum (python.org/~guido)

On 10/8/2012 12:19 PM, Guido van Rossum wrote:
I am not aware of an update to the standard. Being 20 years old does not make it outdated.
Similarly, being hundreds or thousands of years old does not make the equality standard, which includes reflexivity of equality, outdated. The IEEE standard violated that older standard. http://bugs.python.org/issue4296 illustrates some of the problems that come with that violation. But given the compromise made to maintain sane behavior of Python's collection classes, I see little reason to change nan in isolation. I wonder if it would be helpful to make a NaN subclass of floats with its own arithmetic and comparison methods. This would clearly mark a nan as Not a Normal float. Since subclasses rule (at least some) binary operations*, this might also simplify normal float code. But perhaps this was considered and rejected before adding math.isnan in 2.6. (And ditto for infinities.) * in that class_ob op subclass_ob is delegated to subclass.__op__, but I am not sure if this applies only to arithmetic, comparisons, or both. -- Terry Jan Reedy
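The footnote about subclasses taking priority can be checked with a tiny experiment. `TaggedFloat` is a made-up class that returns marker strings purely so the dispatch order is visible; it is not a proposal for how a NaN subclass should behave.

```python
class TaggedFloat(float):
    """Hypothetical float subclass used only to observe dispatch order."""

    def __radd__(self, other):
        return "TaggedFloat.__radd__"

    def __eq__(self, other):
        return "TaggedFloat.__eq__"

    # Restore hashing, which defining __eq__ would otherwise disable.
    __hash__ = float.__hash__


# The left operand is a plain float, the right operand a subclass instance.
arithmetic = 1.0 + TaggedFloat(2.0)
comparison = 1.0 == TaggedFloat(2.0)
```

In current CPython the subclass method is tried first in both cases, so the delegation applies to arithmetic and to comparisons.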

Terry Reedy writes:
I wonder if it would be helpful to make a NaN subclass of floats with its own arithmetic and comparison methods.
It can't be helpful, unless you go a lot further. Specifically, you'd need to require containers to check every element for NaN-ness. That doesn't seem very practical. In any case, the presentation by Kahan (cited earlier by Alexander himself) demolishes the idea that any sort of attempt to implement DWIM for floats in a programming language can succeed at the present state of the art. The best we can get is DWGM ("do what Guido means", even if what Guido means is "ask the Timbot"<wink/>). Kahan pretty explicitly endorses this approach, by the way. At least in the context of choosing default policy for IEEE 754 Exceptions.

On 10/7/2012 9:51 PM, Guido van Rossum wrote:
I don't understand the reluctance to address a common conceptual speed-bump in the docs. After all, the tutorial has an entire chapter (http://docs.python.org/tutorial/floatingpoint.html) that explains how floats work, even though they work exactly as IEEE 754 says they should. A sentence in section 5.4 (Numeric Types) would help. Something like, "In accordance with the IEEE 754 standard, NaN's are not equal to any value, even another NaN. This is because NaN doesn't represent a particular number, it represents an unknown result, and there is no way to know if one unknown result is equal to another unknown result." --Ned.

On 08/10/2012 03:35, Ned Batchelder wrote:
I understand that the undefined result of a computation is not the same as the undefined result of another computation. (E.g. one might represent positive infinity, another might represent underflow or loss of accuracy.) But I can't help feeling (strongly) that the result of a computation should be equal to itself. In other words, after

    x = float('nan')
    y = float('nan')

I would expect x != y but x == x. After all, how much sense does this make (I got this in a quick test with Python 2.7.3):
Making equality non-reflexive feels utterly wrong to me, partly no doubt because of my mathematical background, partly because of the difficulty in implementing container objects and algorithms and God knows what else when you have to remember that some of the objects they may deal with may not be equal to themselves. In particular the difference between my last two examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to justify except by saying that for historical reasons the designers of lists and the designers of dictionaries made different - but entirely reasonable - assumptions about the equality relation, and (perhaps) whether identity implies equality (how do you explain to a Python learner that it doesn't (pathological code examples aside) ???). Couldn't each NAN when generated contain something that identified it uniquely, so that different NANs would always compare as not equal, but any given NAN would compare equal to itself? Rob Cliffe
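At the Python level, the behaviour asked for here (distinct NaNs unequal, each NaN equal to itself) can be approximated with object identity. The sketch below is purely hypothetical: `ReflexiveFloat` is not an existing type, and results of arithmetic on it fall back to plain floats.

```python
class ReflexiveFloat(float):
    """Hypothetical float whose instances always compare equal to themselves."""

    def __eq__(self, other):
        if self is other:
            return True
        return float.__eq__(self, other)

    def __ne__(self, other):
        if self is other:
            return False
        return float.__ne__(self, other)

    # Keep the normal float hash so instances stay usable as dict keys.
    __hash__ = float.__hash__


x = ReflexiveFloat("nan")
y = ReflexiveFloat("nan")

self_equal = (x == x)
others_equal = (x == y)
ordinary = (ReflexiveFloat(1.5) == 1.5)
```

This uses the object's identity rather than anything stored in the NaN payload, which is one possible reading of the suggestion.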

On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
If we take this route and try to distinguish NaNs with different payload, I am sure you will want to distinguish between -0.0 and 0.0 as well. The latter would violate transitivity in -0.0 == 0 == 0.0. The only sensible thing to do with NaNs is either to treat them all equal (the Eiffel way) or to stick to IEEE default. I don't think NaN behavior in Python is a result of a deliberate decision to implement IEEE 754. If that was the case, why 0.0/0.0 does not produce NaN? Similarly, Python math library does not produce infinities where IEEE 754 compliant library should:
Some other operations behave inconsistently:
    >>> 2 * 10.**308
    inf
but
I think non-reflexivity of nan in Python is an accidental feature. Python's float type was not designed with NaN in mind and until recently, it was relatively difficult to create a nan in pure python. It is also not true that IEEE 754 requires that nan == nan is false. IEEE 754 does not define operator '==' (nor does it define boolean false). Instead, IEEE defines a comparison operation that can have one of four results: >, <, =, or unordered. The standard does require that NaN compares unordered with anything including itself, but it does not follow that a language that defines an == operator with boolean results must define it so that nan == nan is false.
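The four-way comparison described in this message can be sketched as a small helper. The function name `ieee_relation` and the result strings are invented for this illustration; Python itself does not expose such an operation.

```python
import math


def ieee_relation(a, b):
    """Return 'less', 'equal', 'greater' or 'unordered' for two floats."""
    # A NaN on either side makes the pair unordered, including NaN vs itself.
    if math.isnan(a) or math.isnan(b):
        return "unordered"
    if a < b:
        return "less"
    if a > b:
        return "greater"
    return "equal"


nan = float("nan")
relations = [
    ieee_relation(1.0, 2.0),
    ieee_relation(2.0, 1.0),
    ieee_relation(-0.0, 0.0),
    ieee_relation(nan, nan),
    ieee_relation(nan, 1.0),
]
```

How the six boolean operators of a language are derived from these four outcomes is the mapping being debated in the thread.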

On Sun, Oct 7, 2012 at 8:46 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Oh, it was. It was very deliberate. Like in many other areas of Python, I refused to invent new rules when there was existing behavior elsewhere that I could borrow and with which I had no reason to quibble. (And in the case of floating point behavior, there really is no alternate authority to choose from besides IEEE 754. Languages that disagree with it do not make an authority.) Even if I *did* have reasons to quibble with the NaN behavior (there were no NaNs on the mainframe where I learned programming, so they were as new and weird to me as they are to today's novices), Tim Peters, who has implemented numerical libraries for Fortran compilers in a past life and is an absolute authority on floating points, convinced me to follow IEEE 754 as closely as I could.
If that was the case, why 0.0/0.0 does not produce NaN?
Easy. It was an earlier behavior, from the days where IEEE 754 hardware did not yet rule the world, and Python didn't have much of an opinion on float behavior at all -- it just did whatever the platform did. Infinities and NaNs were not on my radar (I hadn't met Tim yet :-). However division by zero (which is not just a float but also an int behavior) was something that I just had to address, so I made the runtime check for it and raise an exception. When we became more formal about this, we considered changing this but decided that the ZeroDivisionError was more user-friendly than silently propagating NaNs everywhere, given the typical use of Python. (I suppose we could make it optional, and IIRC that's what Decimal does -- but for floats we don't have a well-developed numerical context concept yet.)
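The contrast between floats and Decimal mentioned here can be sketched with the standard `decimal` module. Disabling the two traps inside a local context is this example's choice; the default context keeps them enabled.

```python
from decimal import Decimal, DivisionByZero, InvalidOperation, localcontext

# Python floats: division by zero raises instead of returning NaN or inf.
try:
    0.0 / 0.0
    float_outcome = "returned a value"
except ZeroDivisionError:
    float_outcome = "ZeroDivisionError"

# Decimal: with the relevant traps disabled, special values are
# returned instead of exceptions being raised.
with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False
    ctx.traps[DivisionByZero] = False
    zero_over_zero = Decimal(0) / Decimal(0)
    one_over_zero = Decimal(1) / Decimal(0)
```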
Again, this mostly comes from backward compatibility with the math module's origins (and it is as old as Python itself, again predating its use of IEEE 754). AFAIK Tim went over the math library very carefully and cleaned up what he could, so he probably thought about this as well. Also, IIUC the IEEE library prescribes exceptions as well as return values; e.g. "man 3 log" on my OSX computer says that log(0) returns -inf as well as raises a divide-by-zero exception. So I think this is probably compliant with the standard -- one can decide to ignore the exceptions in certain contexts and honor them in others. (Probably even the 1/0 behavior can be defended this way.)
Probably the same. IEEE 754 may be more complex than you think!
I think non-reflexivity of nan in Python is an accidental feature.
It is not.
Python's float type was not designed with NaN in mind and until recently, it was relatively difficult to create a nan in pure python.
And when we did add NaN and Inf we thought about the issues carefully.
Are you proposing changes again? Because it sure sounds like you are unhappy with the status quo and will not take an explanation, however authoritative it is. Given a language with the 6 comparisons like Python (and most do), they have to be mapped to the IEEE comparison *somehow*, and I believe we chose one of the most logical translations imaginable (given that nobody likes == and != raising exceptions). -- --Guido van Rossum (python.org/~guido)
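For reference, the mapping Python ended up with can be listed by applying all six comparison operators to a NaN. This only records current behaviour: every operator yields False except `!=`, which yields True.

```python
import operator

nan = float("nan")

# Apply each of the six rich comparisons to (nan, nan).
results = {
    symbol: op(nan, nan)
    for symbol, op in [
        ("<", operator.lt),
        ("<=", operator.le),
        ("==", operator.eq),
        ("!=", operator.ne),
        (">", operator.gt),
        (">=", operator.ge),
    ]
}
```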

On 10/8/2012 12:47 PM, Guido van Rossum wrote:
I agree. In C, as I remember, a function can both (passively) 'raise an exception' by setting errno *and* return a value. This requires the programmer to check for an exception, and forgetting to do so is a common bug. In Python, raising an exception actively aborts returning a value, so you had to choose one of the two behaviors.
Or this might be an accidental inconsistency, in that float multiplication was changed to return inf but pow was not. But I would be reluctant to fiddle with such details now. Alexander, while I might have chosen to make nan == nan True, I consider it a near tossup with no happy resolution and would not change it now. Guido's explanation is pretty clear: he went with the IEEE standard as interpreted for Python by Tim Peters. -- Terry Jan Reedy

On Mon, Oct 8, 2012 at 5:17 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Alexander, while I might have chosen to make nan == nan True, I consider it a near tossup with no happy resolution and would not change it now.
While I did suggest to change nan == nan result two years ago, <http://mail.python.org/pipermail/python-ideas/2010-March/006945.html>, I am not suggesting it now. Here I am merely trying to understand to what extent Python's float is implementing IEEE 754 and why in some cases Python's behavior deviates from the standard while in the case of nan == nan, IEEE 754 is taken as a gospel.
It would be helpful if that interpretation was clearly written somewhere. Without a written document this interpretation seems apocryphal to me.

Earlier in this thread, Guido wrote: "I am not aware of an update to the standard." To the best of my knowledge IEEE Std 754 was last updated in 2008. I don't think the differences between 1985 and 2008 revisions matter much for this discussion, but since I am going to refer to chapter and verse, I will start by citing the document that I will use:

    IEEE Std 754(TM)-2008
    (Revision of IEEE Std 754-1985)
    IEEE Standard for Floating-Point Arithmetic
    Approved 12 June 2008
    IEEE-SA Standards Board

(AFAICT, the main difference between 754-2008 and 754-1985 is that the former includes decimal floats added in 854-1987.)

Now, let me put my language lawyer hat on and compare Python floating point implementations to IEEE 754-2008 standard. Here are the relevant clauses:

    3. Floating-point formats
    4. Attributes and rounding
    5. Operations
    6. Infinity, NaNs, and sign bit
    7. Default exception handling
    8. Alternate exception handling attributes
    9. Recommended operations
    10. Expression evaluation
    11. Reproducible floating-point results

Clause 3 (Floating-point formats) defines five formats: 3 binary and 2 decimal. Python supports a superset of decimal formats and a single binary format. Section 3.1.2 (Conformance) contains the following provision: "A programming environment conforms to this standard, in a particular radix, by implementing one or more of the basic formats of that radix as both a supported arithmetic format and a supported interchange format." I would say Python is conforming to Clause 3.
Clause 4 (Attributes and rounding) is supported only by Decimal through contexts: "For attribute specification, the implementation shall provide language-defined means, such as compiler directives, to specify a constant value for the attribute parameter for all standard operations in a block; the scope of the attribute value is the block with which it is associated." I believe Decimal is mostly conforming, but float is not conforming at all.

Clause 5 requires "[a]ll conforming implementations of this standard shall provide the operations listed in this clause for all supported arithmetic formats, except as stated below." In other words, a language standard that claims conformance with IEEE 754 must provide all operations unless the standard states otherwise. Let's try to map IEEE 754 required operations to Python float operations.

5.3.1 General operations

    sourceFormat roundToIntegralTiesToEven(source)
    sourceFormat roundToIntegralTiesToAway(source)
    sourceFormat roundToIntegralTowardZero(source)
    sourceFormat roundToIntegralTowardPositive(source)
    sourceFormat roundToIntegralTowardNegative(source)
    sourceFormat roundToIntegralExact(source)

Python only provides float.__trunc__ which implements roundToIntegralTowardZero. (The builtin round() belongs to a different category because it changes format from double to int.)

    sourceFormat nextUp(source)
    sourceFormat nextDown(source)

I don't think these are available for Python floats.

    sourceFormat remainder(source, source) - float.__mod__

Not fully conforming. For example, the standard requires remainder(-2.0, 1.0) to return -0.0, but in Python 3.3:
    >>> -2.0 % 1.0
    0.0

On the other hand,

    >>> math.fmod(-2.0, 1.0)
    -0.0
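The sign of a zero result is easy to miss in the interpreter, so here is a sketch that makes it explicit. `math.remainder` was added in Python 3.7, after this thread, and is included on the assumption that a recent interpreter is in use; `signed_zero_repr` is a made-up helper.

```python
import math


def signed_zero_repr(x):
    """Render a result with an explicit sign, e.g. '-0.0' or '+0.0'."""
    return ("-" if math.copysign(1.0, x) < 0 else "+") + str(abs(x))


modulo = -2.0 % 1.0               # zero result takes the sign of the divisor
fmod = math.fmod(-2.0, 1.0)       # C fmod: zero result keeps the dividend's sign
ieee = math.remainder(-2.0, 1.0)  # IEEE 754-style remainder (Python 3.7+)

summary = {
    "%": signed_zero_repr(modulo),
    "math.fmod": signed_zero_repr(fmod),
    "math.remainder": signed_zero_repr(ieee),
}
```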
    sourceFormat minNum(source, source)
    sourceFormat maxNum(source, source)
    sourceFormat minNumMag(source, source)
    sourceFormat maxNumMag(source, source)

I don't think these are available for Python floats.

5.3.3 logBFormat operations

I don't think these are available for Python floats.

5.4.1 Arithmetic operations

    formatOf-addition(source1, source2) - float.__add__
    formatOf-subtraction(source1, source2) - float.__sub__
    formatOf-multiplication(source1, source2) - float.__mul__
    formatOf-division(source1, source2) - float.__truediv__
    formatOf-squareRoot(source1) - math.sqrt
    formatOf-fusedMultiplyAdd(source1, source2, source3) - missing
    formatOf-convertFromInt(int) - float.__new__

With exception of fusedMultiplyAdd, Python float is conforming.

    intFormatOf-convertToIntegerTiesToEven(source)
    intFormatOf-convertToIntegerTowardZero(source)
    intFormatOf-convertToIntegerTowardPositive(source)
    intFormatOf-convertToIntegerTowardNegative(source)
    intFormatOf-convertToIntegerTiesToAway(source)
    intFormatOf-convertToIntegerExactTiesToEven(source)
    intFormatOf-convertToIntegerExactTowardZero(source)
    intFormatOf-convertToIntegerExactTowardPositive(source)
    intFormatOf-convertToIntegerExactTowardNegative(source)
    intFormatOf-convertToIntegerExactTiesToAway(source)

Python has a single builtin round().

5.5.1 Sign bit operations

    sourceFormat copy(source) - float.__pos__
    sourceFormat negate(source) - float.__neg__
    sourceFormat abs(source) - float.__abs__
    sourceFormat copySign(source, source) - math.copysign

Python float is conforming.
Now we are getting close to the issue at hand:

"""
5.6.1 Comparisons

Implementations shall provide the following comparison operations, for all supported floating-point operands of the same radix in arithmetic formats:

    boolean compareQuietEqual(source1, source2)
    boolean compareQuietNotEqual(source1, source2)
    boolean compareSignalingEqual(source1, source2)
    boolean compareSignalingGreater(source1, source2)
    boolean compareSignalingGreaterEqual(source1, source2)
    boolean compareSignalingLess(source1, source2)
    boolean compareSignalingLessEqual(source1, source2)
    boolean compareSignalingNotEqual(source1, source2)
    boolean compareSignalingNotGreater(source1, source2)
    boolean compareSignalingLessUnordered(source1, source2)
    boolean compareSignalingNotLess(source1, source2)
    boolean compareSignalingGreaterUnordered(source1, source2)
    boolean compareQuietGreater(source1, source2)
    boolean compareQuietGreaterEqual(source1, source2)
    boolean compareQuietLess(source1, source2)
    boolean compareQuietLessEqual(source1, source2)
    boolean compareQuietUnordered(source1, source2)
    boolean compareQuietNotGreater(source1, source2)
    boolean compareQuietLessUnordered(source1, source2)
    boolean compareQuietNotLess(source1, source2)
    boolean compareQuietGreaterUnordered(source1, source2)
    boolean compareQuietOrdered(source1, source2).
"""

Signaling comparisons are missing. Ordered/Unordered comparisons are missing. Note that the standard does not require any particular spelling for operations. "In this standard, operations are written as named functions; in a specific programming environment they might be represented by operators, or by families of format-specific functions, or by operations or functions whose names might differ from those in this standard." (Sec. 5.1) It would be perfectly conforming for python to spell compareSignalingEqual() as '==' and compareQuietEqual() as math.eq() or even ieee754_2008.compareQuietEqual(). The choice that Python made was not dictated by the standard.
(As I have shown above, Python's % operation does not implement a conforming IEEE 754 remainder(), but math.fmod() seems to fill the gap.) This post is already too long, so I'll leave Clauses 6-11 for another time.

"IEEE 754 may be more complex than you think!" (GvR, earlier in this thread.) I hope I already made the case that Python's float does not conform to IEEE 754 and that IEEE 754 does not require an operation spelled "==" or "float.__eq__" to return False when comparing two NaNs. The standard requires support for 22 comparison operations, but Python's float supports around six. On top of that, Python has an operation that has no analogue in IEEE 754 - the "is" comparison. This is why IEEE 754 standard does not help in answering the main question in this thread: should (x is y) imply (x == y)? We need to formulate a rationale for breaking this implication without a reference to IEEE 754 or Tim's interpretation thereof.

Language-lawyierly-yours, Alexander Belopolsky

On Mon, Oct 8, 2012 at 6:31 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Such a rationale exists in my mind. Since floats are immutable, an implementation may or may not intern certain float values (just as certain string and int values are interned but others are not). Therefore, the fact that "x is y" says nothing about whether the computations that produced x and y had anything to do with each other. This is not true for mutable objects: if I have two lists, computed separately, and find they are the same object, the computations that produced them must have communicated somehow, or the same list was passed in to each computation. So, since two computations might return the same object without having followed the same computational path, in another implementation the exact same computation might not return the same object, and so the == comparison should produce the same value in either case -- in particular, if x and y are both NaN, all 6 comparisons on them should return False (given that in general comparing two NaNs returns False regardless of the operator used). The reason for invoking IEEE 754 here is that without it, Python might well have grown a language-wide rule stating that an object should *always* compare equal to itself, as there would have been no significant counterexamples. (As it is, such a rule only exists for containers, and technically even there it is optional -- it is just not required for containers to invoke == for contained items that reference the same object.) -- --Guido van Rossum (python.org/~guido)
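The container behaviour referred to in the closing parenthesis can be observed with ordinary list methods. The sketch shows what current CPython does; as the message says, skipping `==` for identical items is permitted rather than required.

```python
nan = float("nan")
items = [1.0, nan, 2.0]

# The list finds the element by identity, even though nan != nan.
found_same = nan in items
position = items.index(nan)
occurrences = items.count(nan)

# A different NaN object is neither identical nor equal to the stored one.
found_other = float("nan") in items
```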

On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum <guido@python.org> wrote:
This is an interesting argument, but I don't quite understand it. Are you suggesting that some valid Python implementation may intern NaNs? Wouldn't that require that all NaNs are equal?
Therefore, the fact that "x is y" says nothing about whether the computations that produced x and y had anything to do with each other.
True.
True.
True, but this logic does not dictate what these values should be.
Except for operator compareQuietUnordered() which is missing in Python. Note that IEEE 754 also defines totalOrder() operation which is more or less lexicographical ordering of bit patterns. A hypothetical language could map its 6 comparisons to totalOrder() and still claim IEEE 754 conformity as long as it implements the other 22 comparison predicates somehow.
Why would it be a bad thing? Isn't this rule what Bertrand Meyer calls one of the pillars of civilization? It looks like you give a circular argument. Python cannot have a rule that x is y implies x == y because that would preclude implementing float.__eq__ as IEEE 754 equality comparison and we implement float.__eq__ as IEEE 754 equality comparison in order to provide a significant counterexample to x is y implies x == y rule. I am not sure how interning comes into play here, so I must have missed something.

On Mon, Oct 8, 2012 at 11:14 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Sorry, it seems I got this part slightly wrong. Forget interning. The argument goes the other way: If you *do* compute x and y exactly the same way, and if they don't return the same object, and if they both return NaN, the rules for comparing NaN apply, and the values must compare unequal. So if you compute them exactly the same way but somehow you do return the same object, that shouldn't suddenly make them compare equal.
Yes, but that's not the choice Python made, so it's irrelevant. (Unless you now *do* want to change the language, despite stating several times that you were just asking for explanations. :-)
I spent a week with Bertrand recently. He is prone to exaggeration. :-)
No, that's not what I meant -- maybe my turn of phrase "invoking IEEE" was confusing. The first part is what I meant: "Python cannot have a rule that x is y implies x == y because that would preclude implementing float.__eq__ as IEEE 754 equality comparison." The second half should be: "And we have already (independently from all this) decided that we want to implement float.__eq__ as IEEE 754 equality comparison." I'm sure a logician could rearrange the words a bit and make it look more logical. -- --Guido van Rossum (python.org/~guido)

On Tue, Oct 9, 2012 at 12:43 PM, Guido van Rossum <guido@python.org> wrote:
I'll have a go. It's a lot longer, though :) When designing their floating point support, language designers must choose between two mutually exclusive options: 1. IEEE754 compliant floating point comparison where NaN != NaN, *even if* they're the same object 2. The invariant that "x is y" implies "x == y" The idea behind following the IEEE754 model is that mathematics is a *value based system*. There is only really one NaN, just as there is only one 4 (or 5, or any other specific value). The idea of a number having an identity distinct from its value simply doesn't exist. Thus, when modelling mathematics in an object system, it makes sense to say that *object identity is irrelevant, and only value matters*. This is the approach Python has chosen: for *numeric* operations, including comparisons, object identity is irrelevant to the maximum extent that is practical. Thus "x = float('nan'); assert x != x" holds for *exactly the same reason* that "x = 10e50; y = 10e50; assert x == y" holds. However, when it comes to containers, being able to assume that "x is y" implies "x == y" has an immense practical benefit in terms of being able to implement a large number of non-trivial optimisations. Thus the Python language definition explicitly allows containers to make that assumption, *even though it is known not to be universally true*. This hybrid model means that even though "'x is y' implies 'x == y'" is not true in the general case, it may still be *assumed to be true* regardless by container implementations. In particular, the containers defined in the standard library reference are *required* to make this assumption. This does mean that certain invariants about containers don't hold in the presence of NaN values. This is mostly a theoretical concern, but, in those cases where it *does* matter, then the appropriate solution is to implement a custom container type that handles NaN values correctly. 
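The hybrid model described above can be sketched in a few lines. This is only an illustration, assuming CPython's built-in list and tuple; the variable names are made up for the example.

```python
x = float('nan')

# Numeric comparison ignores identity: a NaN is unequal even to itself.
nan_equals_itself = (x == x)        # False
nan_differs_from_itself = (x != x)  # True

# For ordinary floats, two separately created objects with the same value
# compare equal, for the same reason: only the value matters.
a = float('1e51')
b = float('1e51')
same_value_equal = (a == b)         # True

# Containers are allowed to assume that "x is y" implies "x == y",
# so the same NaN object is found in, and compares equal through, a container.
found_by_identity = x in [x]        # True
lists_equal = ([x] == [x])          # True
tuples_equal = ((x,) == (x,))       # True

# A different NaN object gets no identity shortcut, so == decides.
y = float('nan')
other_nan_found = y in [x]          # False
other_lists_equal = ([x] == [y])    # False
```

The last two results show the invariant that breaks: `[x] == [x]` holds even though `x == x` does not.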
It's perhaps worth including a section explaining this somewhere in the language reference. It's not an accident that Python behaves the way it does, but it's certainly a rationale that can help implementors correctly interpret the rest of the language spec. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sun, Oct 7, 2012 at 8:09 PM, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
That's too bad. It sounds like this mailing list really wouldn't have enough space in its margins to convince you otherwise. And yet you are wrong.
Do you have any background at all in *numerical* mathematics?
It's not about equality. If you ask whether two NaNs are *unequal* the answer is *also* False. I admit that a tutorial section describing the behavior would be good. But I am less than ever convinced that it's possible to explain the *reason* for the behavior in a tutorial. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
It's not about equality. If you ask whether two NaNs are *unequal* the answer is *also* False.
That's the weirdest part about this whole business, I think. Unless you're really keeping your wits about you, it's easy to forget that the assumption (x == y) == False implies (x != y) == True doesn't necessarily hold. This is actually a very important assumption when it comes to reasoning about programs -- even more important than reflexivity, etc, I believe. Consider

    if x == y:
        dosomething()
    else:
        dosomethingelse()

where x and y are known to be floats. It's easy to see that the following is equivalent:

    if not x == y:
        dosomethingelse()
    else:
        dosomething()

but it's not quite so easy to spot that the following is *not* equivalent:

    if x != y:
        dosomethingelse()
    else:
        dosomething()

This trap is made all the easier to fall into because float comparison is *mostly* well-behaved, except for a small subset of the possible values. Most other nonstandard comparison behaviours in Python apply to whole types. E.g. we refuse to compare complex numbers for ordering, even if their values happen to be real, so if you try that you get an early exception. But the weirdness with NaNs only shows up in corner cases that may escape testing.

Now, there *is* a third possibility -- we could raise an exception if a comparison involving NaNs is attempted. This would be a more faithful way of adhering to the IEEE 754 specification that NaNs are "unordered". More importantly, it would make the second code transformation above valid in all cases. So the question that really needs to be answered, I think, is not "Why is NaN == NaN false?", but "Why doesn't NaN == anything raise an exception, when it would make so much more sense to do so?" -- Greg

On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Because == raising an exception is really unpleasant. We had this in Python 2 for unicode/str comparisons and it was very awkward. Nobody arguing against the status quo seems to care at all about numerical algorithms though. I propose that you go find some numerical mathematicians and ask them. -- --Guido van Rossum (python.org/~guido)

On 9 October 2012 01:11, Guido van Rossum <guido@python.org> wrote:
The main purpose of quiet NaNs is to propagate through computation ruining everything they touch. In a programming language like C that lacks exceptions this is important as it allows you to avoid checking all the time for invalid values, whilst still being able to know if the end result of your computation was ever affected by an invalid numerical operation. The reasons for NaNs to compare unequal are no doubt related to this purpose. It is of course arguable whether the same reasoning applies to a language like Python that has a very good system of exceptions but I agree with Guido that raising an exception on == would be unfortunate. How many people would forget that they needed to catch those exceptions? How awkward could your code be if you did remember to catch all those exceptions? In an exception handling language it's important to know that there are some operations that you can trust. Oscar

On Mon, Oct 8, 2012 at 6:37 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I implemented a floating point context manager for gmpy2 and the MPFR floating point library. By default, it enables a non-stop mode where infinities and NaN are returned but you can also raise exceptions. You can experiment with gmpy2: http://code.google.com/p/gmpy/ Some examples
Standard disclaimers: * I'm the maintainer of gmpy2. * Please use SVN or beta2 (when it is released) to avoid a couple of embarrassing bugs. :(

Oscar Benjamin wrote:
The main purpose of quiet NaNs is to propagate through computation ruining everything they touch.
But they stop doing that as soon as they hit an if statement. It seems to me that the behaviour chosen for NaN comparison could just as easily make things go wrong as make them go right. E.g.

    while not (error < epsilon):
        find_a_better_approximation()

If error ever ends up being NaN, this will go into an infinite loop. -- Greg

On Tue, Oct 9, 2012 at 7:19 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
But if you know that that's a possibility, you simply code your condition the other way:

    while error > epsilon:
        find_a_better_approximation()

Which will then immediately terminate the loop if error bonks to NaN. ChrisA
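The difference between the two loop conditions can be checked directly. The sketch below is illustrative only; `epsilon` and the variable names are arbitrary choices, not taken from anyone's real code.

```python
import math

nan = float('nan')
epsilon = 1e-9

# Greg's condition: loop while the error is not known to be small.
# With a NaN error this stays True forever, so the loop never ends.
greg_keeps_looping = not (nan < epsilon)   # True

# Chris's condition: loop while the error is known to be too large.
# With a NaN error this is False, so the loop exits at once.
chris_keeps_looping = nan > epsilon        # False

# For ordinary error values other than epsilon itself, both conditions agree.
conditions_agree = all(
    (not (error < epsilon)) == (error > epsilon)
    for error in (1.0, 1e-3, 1e-12, 0.0)
)

# Either way, a loop that stops on NaN has not converged, so an explicit
# check afterwards is still needed to tell the two outcomes apart.
def converged(error, tolerance=epsilon):
    return not math.isnan(error) and error <= tolerance
```

Neither spelling reports the failure by itself; the second merely turns a hang into an early exit.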

On Oct 9, 2012 9:20 AM, "Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote:
I should expect that an experienced numericist would be aware of the possibility of a NaN and make a trivial modification of your loop to take advantage of the simple fact that any comparison with NaN returns false. It is only because you have artificially placed a not in the while clause that it doesn't work. I would have tested for error>eps without even thinking about NaNs. Oscar

On 09/10/12 11:32, Oscar Benjamin wrote:
Correct, but I'd like to point out that NaNs are a bit more sophisticated than just "numeric contagion". 1) NaNs carry payload, so you can actually identify what sort of calculation failed. E.g. NaN-27 might mean "logarithm of a negative number", while NaN-95 might be "inverse trig function domain error". Any calculation involving a single NaN is supposed to propagate the same payload, so at the end of the calculation you can see that you tried to take the log of a negative number and debug accordingly. 2) On rare occasions, NaNs can validly disappear from a calculation, leaving you with a non-NaN answer. The rule is, if you can replace the NaN with *any* other value, and still get the same result, then the NaN is irrelevant and can be consumed. William Kahan gives an example: For example, 0*NaN must be NaN because 0*∞ is an INVALID operation (NaN). On the other hand, for hypot(x, y) := √(x*x + y*y) we find that hypot(∞, y) = +∞ for all real y, finite or not, and deduce that hypot(∞, NaN) = +∞ too; naive implementations of hypot may do differently. Page 7 of http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF -- Steven
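Kahan's two examples can be reproduced with Python's math module. This is a small sketch, assuming CPython's `math.hypot`, which follows the C99 rule for infinite arguments. The `**` cases at the end are an extra illustration of the same "NaN is irrelevant" rule and are not part of the quoted text.

```python
import math

nan = float('nan')
inf = float('inf')

# 0 * NaN must stay NaN, because 0 * inf is itself an invalid operation.
zero_times_nan = 0.0 * nan      # nan
zero_times_inf = 0.0 * inf      # nan

# hypot(inf, y) is +inf for every real y, so the NaN argument is irrelevant
# and may be "consumed".
hypot_inf_nan = math.hypot(inf, nan)   # inf
hypot_nan_inf = math.hypot(nan, inf)   # inf

# Same reasoning elsewhere: x**0 is 1 for every x, and 1**y is 1 for every y.
nan_to_zero = nan ** 0.0        # 1.0
one_to_nan = 1.0 ** nan         # 1.0
```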

On Mon, Oct 08, 2012 at 09:29:42AM -0700, Guido van Rossum wrote:
It's not about equality. If you ask whether two NaNs are *unequal* the answer is *also* False.
Not so. I think you are conflating NAN equality/inequality with ordering comparisons. Using Python 3.3:

py> nan = float('nan')
py> nan > 0
False
py> nan < 0
False
py> nan == 0
False
py> nan != 0
True

but:

py> nan == nan
False
py> nan != nan
True

-- Steven

On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum <guido@python.org> wrote:
This smells like a bug in the != operator, it seems to fall back to not == which it didn't used to. More later.....
I'm fairly sure it's deliberate, and has been this way in Python for a long time. IEEE 754 also has x != x when x is a NaN (at least, for those IEEE 754 functions that return a boolean rather than signaling an invalid exception), and it's a well documented property of NaNs across languages. -- Mark

On Mon, Oct 08, 2012 at 11:44:12PM -0700, Guido van Rossum wrote:
This smells like a bug in the != operator, it seems to fall back to not == which it didn't used to. More later.....
I'm pretty sure the behaviour is correct. When I get home this evening, I will check my copy of the Standard Apple Numerics manual (one of the first IEEE 754 compliant systems). In the meantime, I quote from "What Every Computer Scientist Should Know About Floating-Point Arithmetic" "Since comparing a NaN to a number with <, ≤, >, ≥, or = (but not ≠) always returns false..." (Admittedly it doesn't specifically state the case of comparing a NAN with a NAN.) http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html -- Steven

On Sun, 07 Oct 2012 22:35:17 -0400 Ned Batchelder <ned@nedbatchelder.com> wrote:
+1 Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 7, 2012 at 7:35 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
I'm sorry. I didn't intend to refuse to document the behavior. I was mostly reacting to things I thought I read between the lines -- the suggestion that there is no reason for the NaN behavior except silly compatibility with an old standard that nobody cares about. From this it is only a small step to reading (again between the lines) the suggestion to change the behavior.
That sounds like a great addition to the docs, except for the nit that I don't like writing the plural of NaN as "NaN's" -- I prefer "NaNs" myself. Also, the words here can still cause confusion. The exact behavior is that every one of the 6 comparison operators (==, !=, <, <=, >, >=) returns False when either argument (or both) is a NaN. I think your suggested words could lead someone to believe that they mean that x != NaN or NaN != NaN would return True. Anyway, once we can agree on the wording I agree that we should update that section. -- --Guido van Rossum (python.org/~guido)

On 10/8/2012 12:25 PM, Guido van Rossum wrote:
How about: "In accordance with the IEEE 754 standard, when NaNs are compared to any value, even another NaN, the result is always False, regardless of the comparison. This is because NaN represents an unknown result. There is no way to know the relationship between an unknown result and any other result, especially another unknown one. Even comparing a NaN to itself always produces False." --Ned.

Guido van Rossum writes:
Sounds good. (But now maybe we also need to come clean with the exceptions for NaNs compared as part of container comparisons?)
For a second I thought you meant IEEE 754 Exceptions. Whew! How about: """ For reasons of efficiency, Python allows comparisons of containers to shortcut element comparisons. These shortcuts mean that it is possible that comparison of two containers may return True, even if they contain NaNs. For details, see the language reference[1]. """ Longer than I think it deserves, but maybe somebody has a better idea? Footnotes: [1] Sorry about that, but details don't really belong in a *Python* tutorial. Maybe this should be "see the implementation notes"?

Steven D'Aprano wrote:
1) It is not the case that NaN <comp> NaN is always false.
Huh -- well, apparently NaN != NaN --> True. However, borrowing Steven's earlier example, and modifying slightly: sqr(-1) != sqr(-1) Shouldn't this be False? Or, to look at it another way, surely somewhere out in the Real World (tm) it is the case that two NaNs are indeed equal. ~Ethan~

Just a curiosity here (as I can guess of plausible reasons myself, so there probably are some official stances). Is there a reason NaNs are not instances of NaN class? Then x == x would be True (as they want), but [this NaN] == [that NaN] would be False, as expected. I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1), but it seems a lot less of a big deal than all of the exceptions with container equalities. Thanks, Joshua

On 10/10/12 09:13, Joshua Landau wrote:
Because that would complicate Python's using floats for absolutely no benefit. Instead of float operations always returning a float, they would have to return a float or a NAN. To check for a valid floating point instance, instead of saying: isinstance(x, float) you would have to say: isinstance(x, (float, NAN)) And what about infinities, denorm numbers, and negative zero? Do they get dedicated classes too? And what is the point of this added complexity? Nothing. You *still* have the rule that "x == x for all x, except for NANs". The only difference is that "NANs" now means "instances of NAN class" rather than "NAN floats" (and Decimals). Working with IEEE 754 floats is now far more of a nuisance because some valid floating point values aren't floats but have a different class, but nothing meaningful is different.
Then x == x would be True (as they want), but [this NaN] == [that NaN] would be False, as expected.
Making NANs their own class wouldn't give you that. If we wanted that behaviour, we could have it without introducing a NAN class: just change the list __eq__ method to scan the list for a NAN using math.isnan before checking whether the lists were identical. But that would defeat the purpose of the identity check (an optimization to avoid scanning the list)! Replacing math.isnan with isinstance doesn't change that.
I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1),
That question has already been raised, and answered, repeatedly in this thread.
but it seems a lot less of a big deal than all of the exceptions with container equalities.
Container equalities are not a big deal. I'm not sure what problem you think you are solving. -- Steven

On Tue, Oct 9, 2012 at 9:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I'm sometimes surprised at the creativity and passion behind solutions to this issue. I've been a Python user for some years now, including time dealing with stuff like numpy where you're fairly likely to run into NaNs. I've been an active member of several support communities where I can confidently say I have encountered tens of thousands of Python questions. Not once can I recall having, or seeing anyone else have, an actual problem caused by the way Python handles NaN. As far as I can tell, it works _perfectly_. I appreciate the aesthetic concerns, but I really wish someone would explain to me what's actually broken and in need of fixing. Mike

On 10/10/12 2:25 AM, Mike Graham wrote:
While I also don't think that anything needs to be fixed, I must say that in my years of monitoring tens of thousands of Python questions, there have been a few legitimate problems with the NaN behavior. It does come up from time to time. The most frequent problem is checking if a list contains a NaN. The obvious thing to do for many users: nan in list_of_floats This is a reasonable prediction based on what one normally does for most objects in Python, but this is quite wrong. But because list.__contains__() checks for identity first, it can look like it works when people test it out:
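A short sketch of how such a test can mislead, assuming CPython's identity-first membership check. The list contents and variable names are invented for illustration.

```python
import math

nan = float('nan')
list_of_floats = [1.0, nan, 2.0]

# Looks like it works: the very same NaN object is in the list, and
# list.__contains__ checks identity before equality.
same_object_found = nan in list_of_floats            # True

# A NaN produced by a computation is a different object, so the identity
# check fails and == (always False for NaN) decides.
computed_nan = float('inf') - float('inf')
computed_nan_found = computed_nan in list_of_floats  # False

# A check that does not depend on object identity.
contains_nan = any(math.isnan(value) for value in list_of_floats)  # True
```

The `math.isnan` scan gives the same answer however the NaN was produced.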
Then they write their code doing the wrong thing thinking that they tested their approach. I classify this as a wart: it breaks reasonable predictions from users, requires more exceptions-based knowledge about NaNs to use correctly, and can trap users who do try to experiment to determine the behavior. But I think that the cost of acquiring and retaining such knowledge is not so onerous as to justify the cost of any of the attempts to fix the wart. The other NaN wart (unrelated to this thread) is that sorting a list of floats containing a NaN will usually leave the list unsorted because "inequality comparisons with a NaN always return False" breaks the assumptions of timsort and other sorting algorithms. You should remember this, as you once demonstrated the problem: http://mail.python.org/pipermail/python-ideas/2011-April/010063.html This is a real problem, so much so that numpy works around it by enforcing our sorts to always sort NaN at the end of the array. Unfortunately, lists do not have the luxury of cheaply knowing the type of all of the objects in the list, so this is not an option for them. Real problems, but nothing that motivates a change, in my opinion. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
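For plain lists, one possible workaround is a sort key that keeps NaNs out of the ordering comparisons entirely. The sketch below is an assumption-laden illustration, not numpy's implementation; `nan_last_key` is a hypothetical helper name.

```python
import math

def nan_last_key(value):
    # The first tuple element decides NaN versus non-NaN, so a NaN is never
    # ordered against a real number; NaNs end up after everything else.
    return (math.isnan(value), value)

data = [3.0, float('nan'), 1.0, 2.0]
ordered = sorted(data, key=nan_last_key)
# The three numbers come first in ascending order, followed by the NaN.
```

This costs one `math.isnan` call per element and assumes every element is a float, which is exactly the knowledge a general list does not have for free.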

On 10 October 2012 02:14, Steven D'Aprano <steve@pearwood.info> wrote:
Not the way I'm proposing it.
And what about infinities, denorm numbers, and negative zero? Do they get dedicated classes too?
Infinities? No, although they might well if the infinities were different (set of reals vs set of ints, for example). Denorms? No, that's a completely different thing. -0.0? No, that's a completely different thing. I was asking, because instances of a class maps on to a behavior that matches *almost exactly* what *both* parties want, why was it not used? This is not the case with anything other than that.
And what is the point of this added complexity? Nothing.
Simplicity. It's simpler.
You *still* have the rule that "x == x for all x, except for NANs".
False. I was proposing that x == x but NAN() != NAN().
False, if you subclass float.
Then x == x would be True (as they want), but [this NaN] == [that NaN]
False.
as per my previous "implementation".
False. x != x, so that has *not* been "answered". This was an example problem with my own suggested implementation.
but it seems a lot less of a big deal than all of the exceptions with
Why would you assume that? I mentioned it from *honest* *curiosity*, and all I got back was an attack. Please, I want to be civil but you need to act less angrily. [Has not been spell-checked, as I don't really have time </lie>] Thank you for your time, even though I disagree, Joshua Landau

On 10 October 2012 22:33, Joshua Landau <joshua.landau.ws@gmail.com> wrote:
After reconsidering, I regret these sentences. Yes, I do still believe your response was overly angry, but I did get a thought-out response and you did try to address my concerns. In the interest of benevolence, may I redact my statement?

I don't normally triple-post, but here it goes. After re-re-reading this thread, it turns out one *(1)* post and two *(2)* answers to that post have covered a topic very similar to the one I have raised. All of the others, to my understanding, do not dwell on the fact that *float("nan") is not float("nan")* . The mentioned post was not quite the same as mine, but it still had two replies. I will respond to them here. My response, again, is a curiosity why, *not* a suggestion to change anything. I agree that there is probably no real concern with the current state, I have never had a concern and the concern caused by change would dwarf any possible benefits. Response 1: This implies that you want to differentiate between -0.0 and +0.0. That is bad. My response: Why would I want to do that? Response 2: "There is not space on this thread to convince you otherwise." [paraphrased] My response: That comment was not directed at me and thus has little relevance to my own post. Hopefully now you should understand why I felt the need to ask the question after so much has already been said on the topic. Finally, Mike Graham says (probably referring to me): "I'm sometimes surprised at the creativity and passion behind solutions to this issue." My response: It was an immediate thought, not one dwelled upon. The fact it was not answered in the thread prompted my curiosity. It is *honestly* nothing more.

On 11/10/12 09:05, Joshua Landau wrote:
That's no different from any other float.

py> float('nan') is float('nan')
False
py> float('1.5') is float('1.5')
False

Floats are not interned or cached, although of course interning is implementation dependent and this is subject to change without notice. For that matter, it's true of *nearly all builtins* in Python. The exceptions being bool(obj) which returns one of two fixed instances, and int() and str(), where *some* but not all instances are cached.
If you are doing numeric work, you *should* differentiate between -0.0 and 0.0. That's why the IEEE 754 standard mandates a -0.0. Both -0.0 and 0.0 compare equal, but they can be distinguished (although doing so is tricky in Python). The reason for distinguishing them is to distinguish between underflow to zero from positive or negative values. E.g. log(x) should return -infinity if x underflows from a positive value, and a NaN if x underflows from a negative. -- Steven
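One way to tell the two zeros apart in Python is `math.copysign`, which copies the sign bit even from a zero. A minimal sketch with illustrative names:

```python
import math

neg_zero = -0.0
pos_zero = 0.0

# The two zeros compare equal, so == cannot tell them apart.
zeros_compare_equal = (neg_zero == pos_zero)     # True

# copysign transfers the sign bit onto 1.0, which makes it visible.
sign_of_neg_zero = math.copysign(1.0, neg_zero)  # -1.0
sign_of_pos_zero = math.copysign(1.0, pos_zero)  # 1.0

def is_negative_zero(value):
    # True only for -0.0; hypothetical helper for illustration.
    return value == 0.0 and math.copysign(1.0, value) < 0.0
```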

On Thu, Oct 11, 2012 at 2:20 AM, Steven D'Aprano <steve@pearwood.info> wrote:
E.g. log(x) should return -infinity if x underflows from a positive value, and a NaN if x underflows from a negative.
IEEE 754 disagrees. :-) Both log(-0.0) and log(0.0) are required to return -infinity (and/or signal the divideByZero exception). And as for sqrt(-0.0) returning -0.0... Grr. I've never understood the motivation for that one, especially as it disagrees with the usual recommendations for complex square root (where the real part of the result *always* has its sign bit cleared). Mark

[Mark Dickinson]
The only rationale I've seen for this is in Kahan's obscure paper "Branch Cuts for Complex Elementary Functions or Much Ado About Nothing's Sign Bit". Hard to find. Here's a mostly readable scan: http://port70.net/~nsz/articles/float/kahan_branch_cuts_complex_elementary_f... In part it's to preserve various identities, such as that sqrt(conjugate(z)) is the same as conjugate(sqrt(z)). When z is +0, that becomes

    sqrt(conjugate(+0)) same_as conjugate(sqrt(+0))

which is

    sqrt(-0) same_as conjugate(+0)

which is

    sqrt(-0) same_as -0

Convinced? LOL. There are others in the paper ;-)

On Fri, Oct 12, 2012 at 8:42 PM, Tim Peters <tim.peters@gmail.com> wrote:
Not really. :-) In fact, it's exactly that paper that made me think sqrt(-0.0) -> -0.0 is suspect. The way I read it, the argument from the paper implies that cmath.sqrt(complex(0.0, -0.0)) should be complex(0.0, -0.0), which I have no problem with---it makes things nice and neat: quadrants 1 and 2 in the complex plane map to quadrant 1, and quadrants 3 and 4 to quadrant 4, with the signs of the zeros making it clear what 'quadrant' means in all (non-nan) cases. But I don't see how to get from there to math.sqrt(-0.0) being -0.0. It's exactly the mismatch between the real and complex math that makes no sense to me: math.sqrt(-0.0) should resemble cmath.sqrt(complex(-0.0, +/-0.0)). But the latter, quite reasonably, is complex(0.0, +/-0.0) (at least according to both Kahan and C99 Annex G), while the former is specified to be -0.0 in IEEE 754. -- Mark
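The mismatch between the real and complex square roots can be observed directly. This sketch assumes CPython's `math` and `cmath` modules and uses `math.copysign` only to expose the sign of each zero.

```python
import cmath
import math

# Real square root: IEEE 754 specifies sqrt(-0.0) == -0.0.
real_root = math.sqrt(-0.0)
real_root_sign = math.copysign(1.0, real_root)          # -1.0

# Complex square root of a zero with negative real part: the real part
# of the result has its sign bit cleared.
complex_root = cmath.sqrt(complex(-0.0, 0.0))
complex_root_real_sign = math.copysign(1.0, complex_root.real)  # 1.0
complex_root_imag_sign = math.copysign(1.0, complex_root.imag)  # 1.0
```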

On 11 October 2012 02:20, Steven D'Aprano <steve@pearwood.info> wrote:
Confusing re-use of identity strikes again. Would anyone care to explain what causes this? I understand float(1.5) is likely to return the inputted float, but that's as far as I can reason. What I was saying, though, is that all other posts assumed equality between two different NaNs should be the same as identity between a NaN and itself. This is what I'm really asking about, I guess.
Interesting. Can you give me a more explicit example? When would you not *want* f(-0.0) to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on creation]

On 2012-10-12 19:42, Joshua Landau wrote:
It re-uses an immutable literal:
and 'float' returns its argument if it's already a float:
>>> float(1.5) is 1.5
True
Therefore:
>>> float(1.5) is float(1.5)
True
But apart from that, when a new object is created, it doesn't check whether it's identical to another, except in certain cases such as ints in a limited range:
And it's an implementation-specific behaviour.
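The behaviour described above can be checked as follows. Everything here is CPython-specific and may differ on other implementations; names are used instead of repeated literals so that the result does not depend on how the compiler shares constants.

```python
x = 1.5

# float() returns its argument unchanged when it is already an exact float
# (CPython behaviour, not a language guarantee).
same_object = float(x) is x                 # True on CPython

# Parsing a string builds a new float object each time, so two results are
# distinct objects even though their values are equal.
fresh_a = float('1.5')
fresh_b = float('1.5')
distinct_objects = fresh_a is not fresh_b   # True on CPython
equal_values = fresh_a == fresh_b           # True everywhere
```

Only `equal_values` is something a program should rely on; the two identity results are observations about one implementation.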

On Fri, Oct 12, 2012 at 7:42 PM, Joshua Landau <joshua.landau.ws@gmail.com> wrote:
A few examples: (1) In the absence of exceptions, 1 / 0.0 is +inf, while 1 / -0.0 is -inf. So e.g. the function exp(-exp(1/x)) has different values at -0.0 and 0.0:
(2) For the atan2 function, we have e.g.,
This gives atan2 a couple of nice invariants: the sign of the result always matches the sign of the first argument, and atan2(-y, x) == -atan2(y, x) for any (non-nan) x and y. (3) Similarly, for complex math functions (which aren't covered by IEEE 754, but are standardised in various other languages), it's sometimes convenient to be able to depend on invariants like e.g. asin(z.conj()) == asin(z).conj(). Those are only possible if -0.0 and 0.0 are distinguished; the effect is most visible if you pick values lying on a branch cut.
You can't take that too far, though: e.g., it would be nice if complex multiplication had the property that (z * w).conjugate() was always the same as z.conjugate() * w.conjugate(), but it's impossible to keep both that invariant and the commutativity of multiplication. (E.g., consider the result of complex(1, 1) * complex(1, -1).) -- Mark
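Examples (2) and (3) can be tried in Python. The sketch below assumes CPython's `math.atan2` and `cmath.sqrt`, both of which honour the sign of zero. Example (1) is left out because plain `1 / 0.0` raises ZeroDivisionError in Python rather than returning an infinity.

```python
import cmath
import math

# The two zeros compare equal, yet atan2 gives them different results
# on the negative real axis.
angle_above = math.atan2(0.0, -1.0)    # pi
angle_below = math.atan2(-0.0, -1.0)   # -pi

# The sign of the result matches the sign of the first argument, and
# atan2(-y, x) == -atan2(y, x) holds for these sample points.
invariant_holds = all(
    math.atan2(-y, x) == -math.atan2(y, x)
    for y, x in [(0.0, -1.0), (0.0, 1.0), (1.0, -1.0)]
)

# On the branch cut of the complex square root, the sign of the zero
# imaginary part selects the side of the cut.
root_above = cmath.sqrt(complex(-1.0, 0.0))    # 1j
root_below = cmath.sqrt(complex(-1.0, -0.0))   # -1j

# This is what keeps sqrt(z.conjugate()) == sqrt(z).conjugate() true here.
z = complex(-1.0, 0.0)
conjugate_identity = cmath.sqrt(z.conjugate()) == cmath.sqrt(z).conjugate()
```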

Thank you all for being so thorough. I think I'm sated for tonight. ^^ With all due respect, Joshua Landau

Ethan Furman writes:
Or, to look at it another way, surely somewhere out in the Real World (tm) it is the case that two NaNs are indeed equal.
Sure, but according to Kahan's Uncertainty principle, you'll never be able to detect it. Really-there's no-alternative-to-backward-compatibility-or-IEEE754-ly y'rs

On Mon, Oct 8, 2012 at 9:39 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
Looks fine, but I'd suggest leaving out the philosophy ('there is no way to know ...') and sticking to the statement that Python follows the IEEE 754 standard in this respect. The justification isn't particularly convincing and (IMO) only serves to invite arguments. -- Mark

On Sun, Oct 07, 2012 at 10:35:17PM -0400, Ned Batchelder wrote:
NANs don't quite mean "unknown result". If they did they would probably be called "MISSING" or "UNKNOWN" or "NA" (Not Available). NANs represent a calculation result which is Not A Number. Hence the name :-) Since we're talking about the mathematical domain here, a numeric calculation that doesn't return a numeric result could be said to have no result at all: there is no real-valued x for which x**2 == -1, hence sqrt(-1) can return a NAN. It certainly doesn't mean "well, there is an answer, but I don't know what it is". It means "I know that there is no answer". Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say that they are equal. If we did, we could prove anything: sqrt(-1) = sqrt(-2) Square both sides: -1 = -2 I was not on the IEEE committee, so I can't speak for them, but my guess is that they reasoned that since there are an infinite number of "no result" not-a-number calculations, but only a finite number of NAN bit patterns available to be used for them, it isn't even safe to presume that two NANs with the same bit pattern are equal since they may have come from completely different calculations. Of course this was before object identity was a relevant factor. As I've stated before, I think that having collections choose to optimize away equality tests using object identity is fine. If I need a tuple that honours NAN semantics, I can subclass tuple to get one. I shouldn't expect the default tuple behaviour to carry that cost. By the way, NANs are awesome and don't get anywhere near enough respect. Here's a great idea from the D language: http://www.drdobbs.com/cpp/nans-just-dont-get-no-respect/240005723 -- Steven
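A tuple subclass of the kind mentioned above might look like the following. This is only a sketch under stated assumptions: `StrictTuple` is a hypothetical name, and the class simply compares element by element with `==`, skipping the identity shortcut and paying the cost of a full scan.

```python
class StrictTuple(tuple):
    """Tuple whose equality always calls == on the elements."""

    def __eq__(self, other):
        if not isinstance(other, tuple):
            return NotImplemented
        # No identity shortcut: every pair of elements is compared with ==.
        return len(self) == len(other) and all(
            a == b for a, b in zip(self, other)
        )

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ disables inherited hashing, so restore it explicitly.
    __hash__ = tuple.__hash__


nan = float('nan')
plain = (1.0, nan)
strict = StrictTuple((1.0, nan))

plain_equals_itself = (plain == plain)     # True: identity shortcut
strict_equals_itself = (strict == strict)  # False: NaN semantics honoured
strict_without_nan = StrictTuple((1.0, 2.0)) == StrictTuple((1.0, 2.0))  # True
```

Note that `strict` keeps the ordinary tuple hash, so two such tuples can hash alike and still compare unequal, which is allowed; only the reverse would break dict and set lookups.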

On Tue, Oct 9, 2012 at 12:16 AM, Steven D'Aprano <steve@pearwood.info> wrote:
This is quite true, but in Python "Not A Number" is spelled None. In many aspects, None is like signaling NaN - any numerical operation on it results in a type error, but None == None is True. ..
This is a typical mathematical fallacy where a progression of seemingly equivalent equations contains an invalid operation. See http://en.wikipedia.org/wiki/Mathematical_fallacy#All_numbers_equal_all_othe... This is not an argument to make nan == nan false.

The IEEE 754 argument goes as follows: in the domain of 2**64 bit patterns most patterns represent real numbers, some represent infinities and some do not represent either infinities or numbers. Boolean comparison operations are defined on the entire domain, but <, =, or > outcomes are not exclusive if NaNs are present. The fourth outcome is "unordered." In other words for any two patterns x and y one and only one of the following is true: x < y or x = y or x > y or x and y are unordered. If x is NaN, it compares as unordered to any other pattern including itself. This explains why compareQuietEqual(x, x) is false when x is NaN. In this case, x is unordered with itself, unordered is different from equal, so compareQuietEqual(x, x) cannot be true. It cannot raise an exception either because it has to be quiet. Thus the only correct result is to return false.

The problem that we have in Python is that float.__eq__ is used for too many different things and compareQuietEqual is not always appropriate. Here is a partial list:

1. x == y
2. x in [y]
3. {y:1}[x]
4. x in {y}
5. [y].index(x)

In Python 3, we already took a step away from using the same notion of equality in all these cases. Thus in #2, we use x is y or x == y instead of plain x == y. But that leads to some strange results:
An alternative would be to define x in l as any(isnan(x) and isnan(y) or x == y for y in l) when x and all elements of l are floats. Again, I am not making a change proposal - just mentioning a possibility.
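
The identity shortcut in case 2 and the alternative definition above can be sketched roughly as follows. This is only an illustration: `nan_aware_contains` is a hypothetical helper name, not an existing or proposed API, and it assumes all values are floats.

```python
import math

nan = float("nan")
other_nan = float("nan")  # a second, distinct NaN object

# List membership tests "x is y or x == y", so the answer depends on identity.
same_object_found = nan in [nan]            # True: identity matches
distinct_object_found = other_nan in [nan]  # False: not identical, not equal


def nan_aware_contains(x, items):
    """Hypothetical membership test in which any NaN matches any other NaN."""
    return any((math.isnan(x) and math.isnan(y)) or x == y for y in items)


print(same_object_found, distinct_object_found)
print(nan_aware_contains(other_nan, [1.0, nan]))
```

With the hypothetical helper, membership no longer depends on which NaN object happens to be stored in the list.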

On Sun, Oct 7, 2012 at 8:35 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Java implements IEEE 754 to some extent, but preserves reflexivity of object equality.
I don't actually know Java, but if I run

    class HelloNaN {
        public static void main(String[] args) {
            double nan1 = 0.0 / 0.0;
            double nan2 = 0.0 / 0.0;
            System.out.println(nan1 == nan2);
        }
    }

I get the output "false".

Mike

On 04/10/12 21:48, Max Moroz wrote:
You shouldn't over-generalize. Some built-ins do short-circuit __eq__ when the objects are identical. I believe that strings and ints both do. Other types might not.
Is there any disadvantage to checking whether the equality was called with the same object, and if it was, return `True` right away?
That would break floats and Decimals, both of which support NANs. The decision whether or not to optimize __eq__ should be left up to the type. Some types, for example, might decide to optimize x == x even if x contains a NAN or other objects that break reflexivity of equality. Other types might prefer not to. (Please do not start an argument about NANs and reflexivity. That's been argued to death, and there are very good reasons for the IEEE 754 standard to define NANs the way they do.) Since frozensets containing NANs are rare (I presume), I think it is reasonable to optimize frozenset equality. But I do not think it is reasonable for Python to mandate identity checking before __eq__.
I'm not sure what you are doing here, because dicts (at least in Python 3.2) already short-circuit equality:

py> NAN = float('nan')
py> NAN == NAN
False
py> d = {NAN: 42}
py> d[NAN]
42

Actually, that behaviour goes back to at least 2.4, so I'm not sure how you are doing memoization and not seeing the same optimization.

-- Steven
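
A small sketch of what the session above implies for lookups: the key is found when the very same NaN object is used, but a freshly created NaN is neither identical nor equal to the stored key. The variable names are illustrative only.

```python
NAN = float("nan")
d = {NAN: 42}

# The same object is found through the identity check; == is never consulted.
value_for_same_object = d[NAN]

# A different NaN object is neither identical nor equal to the stored key.
try:
    d[float("nan")]
    fresh_nan_found = True
except KeyError:
    fresh_nan_found = False

print(value_for_same_object, fresh_nan_found)
```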

On 2012-10-04 15:07, Mike Graham wrote:
Think of it this way: Calculation A returns NaN for some reason Calculation B also returns NaN for some reason Have they really returned the same result? Just because they're both NaN doesn't mean that they're the _same_ NaN...

2012/10/4 Steven D'Aprano <steve@pearwood.info>:
This optimization is not implemented for Unicode strings. PyObject_RichCompareBool() implements this optimization which leads to incorrect results:

    nan = float("nan")
    mytuple = (nan,)
    assert mytuple != mytuple  # fails

I think that the optimization should be implemented for Unicode strings, but disabled in PyObject_RichCompareBool(). @Max Moroz: Can you please open an issue on bugs.python.org?

Victor
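
To make the effect described above concrete, here is a short sketch (variable names are illustrative) contrasting the element comparison with the container comparison in CPython:

```python
nan = float("nan")
mytuple = (nan,)

# The element does not compare equal to itself ...
element_equal = (nan == nan)

# ... but the tuple does, because items that are the same object are
# treated as equal without calling float.__eq__.
tuple_equal = (mytuple == mytuple)

# Tuples holding two distinct NaN objects fall back to == and differ.
fresh_tuples_equal = ((float("nan"),) == (float("nan"),))

print(element_equal, tuple_equal, fresh_tuples_equal)
```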

On 05/10/12 01:08, Victor Stinner wrote:
That does not match my experience. In Python 3.2, I generate a large unicode string, and an equal but not identical copy:

    s = "aЖcdef"*100000
    t = "a" + s[1:]
    assert s is not t and s == t

Using timeit, s == s is about 10000 times faster than s == t.

-- Steven
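
One way to reproduce this kind of measurement is sketched below. The repetition count is an arbitrary choice, and the ratio will vary with the machine and Python version, so no particular figure should be expected.

```python
import timeit

s = "aЖcdef" * 100000
t = "a" + s[1:]  # equal contents, but a different object
assert s is not t and s == t

# Time the identical-object comparison and the equal-copy comparison.
same_object = timeit.timeit("s == s", globals={"s": s}, number=200)
equal_copy = timeit.timeit("s == t", globals={"s": s, "t": t}, number=200)

print("s == s: %.6f s, s == t: %.6f s" % (same_object, equal_copy))
```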

On 4 October 2012 17:05, MRAB <python@mrabarnett.plus.com> wrote:
This was discussed not long ago in a different thread. Here is the line: http://hg.python.org/cpython/file/bd8afb90ebf2/Objects/unicodeobject.c#l1050... As I understood it that line is the reason that comparisons for interned strings are faster. Oscar

On Thu, Oct 4, 2012 at 7:19 AM, MRAB <python@mrabarnett.plus.com> wrote:
Someone who performs two calculations with float numbers should never compare their results for equality. It's really a bug to rely on that comparison:

    # this is a bug
    # since the result of this comparison for regular numbers is unpredictable
    # so does it really matter how this behaves when NaNs are compared?
    if a/b == c/d:
        # ...

On the other hand, comparing a number to another number, when neither of the two numbers is involved in a calculation, is perfectly fine:

    # this is not a bug
    # too bad that it won't work as expected
    # when input1 == input2 == 'nan'
    a = float(input1)
    b = float(input2)
    if a == b:
        # ...

So it seems to me your argument is this: "let's break the expectations of developers who are writing valid code, in order to partially meet the expectations of developers who are writing buggy code". If so, I disagree.
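
The two situations described above can be sketched like this. Note that `math.isclose` was added in Python 3.5, well after this thread, and is used here only as one possible tolerance-based comparison; the input values are made up for illustration.

```python
import math

# Comparing the result of a calculation for exact equality is fragile.
computed = 0.1 + 0.2
exact_match = (computed == 0.3)             # False: rounding error
close_enough = math.isclose(computed, 0.3)  # True with the default tolerance

# Comparing two parsed inputs is fine, except when both are NaN.
a = float("nan")
b = float("nan")
inputs_equal = (a == b)                     # False under IEEE 754 rules
both_nan = math.isnan(a) and math.isnan(b)  # explicit check for that case

print(exact_match, close_enough, inputs_equal, both_nan)
```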

On Thu, 4 Oct 2012 17:08:40 +0200 Victor Stinner <victor.stinner@gmail.com> wrote:
I think we should wait for someone to complain before disabling it. It's a useful optimization. Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Fri, Oct 05, 2012 at 01:00:10AM +0200, Antoine Pitrou wrote:
+1 I will go to the wall to defend correct IEEE 754 semantics for NANs, but I also support containers that optimise away those semantics by default. I think it's too early to talk about disabling it without even the report of a bug caused by it. -- Steven

On Thu, Oct 04, 2012 at 05:08:40PM +0200, Victor Stinner wrote:
I think that the optimization should be implemented for Unicode strings, but disabled in PyObject_RichCompareBool().
Actually, this change to PyObject_RichCompareBool() has been made before, but was reverted after the discussion in http://bugs.python.org/issue4296 Cheers, Sven

On 10/04/2012 03:53 PM, Steven D'Aprano wrote:
But it seems like set and frozenset behave like this anyway (using "is" to compare its items):
>>> frozenset([float("nan")]) == frozenset([float("nan")])
False
So the "is" optimization should not change its semantics. (I tested this in Python 2.7.3 and 3.2.3)

On Thu, Oct 4, 2012 at 9:53 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Why not? This is python-ideas, isn't it? I've been hearing that IEEE 754 committee had some "very good reasons" to violate reflexivity of equality comparison with NaNs since I first learned about NaNs some 20 years ago. From time to time, I've also heard claims that there are some important numeric algorithms that depend on this behavior. However, I've never been able to dig out the actual rationale that convinced the committee that voted for IEEE 754 or any very good reasons to preserve this behavior in Python. I am not suggesting any language changes, but I think it will be useful to explain why float('nan') != float('nan') somewhere in the docs. A reference to IEEE 754 does not help much. Java implements IEEE 754 to some extent, but preserves reflexivity of object equality.

On Mon, Oct 8, 2012 at 11:35 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
NaN isn't a single value, but a whole category of values. Conceptually, it's an uncountably infinite (I think that's the technical term) set of invalid results; in implementation, NaN has the highest possible exponent and any non-zero mantissa. So then the question becomes: Should *all* NaNs be equal, or only ones with the same bit pattern? Aside from signalling vs non-signalling NaNs, I don't think there's any difference between one and another, so they should probably all compare equal. And once you go there, a huge can o'worms is opened involving floating point equality. It's much MUCH easier and simpler to defer to somebody else's standard and just say "NaNs behave according to IEEE 754, blame them if you don't like it". There would possibly be value in guaranteeing reflexivity, but it would increase confusion somewhere else. ChrisA
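
The bit-level description above (highest exponent, non-zero mantissa) can be checked with the `struct` module. This is a sketch that assumes Python floats are IEEE 754 binary64 values, which holds on mainstream platforms; the helper names and the payload value are arbitrary choices for illustration.

```python
import math
import struct


def double_bits(x):
    """Return the 64-bit pattern of a Python float as an integer."""
    return struct.unpack(">Q", struct.pack(">d", x))[0]


def is_nan_pattern(bits):
    """True if the pattern has an all-ones exponent and a non-zero mantissa."""
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and mantissa != 0


default_nan = float("nan")

# A quiet NaN carrying an arbitrarily chosen payload in its low mantissa bits.
payload_nan = struct.unpack(">d", struct.pack(">Q", 0x7FF8000000000001))[0]

print(hex(double_bits(default_nan)))
print(math.isnan(payload_nan), payload_nan == default_nan)
```

In a 64-bit double the number of distinct NaN patterns is large but finite, which is why payloads cannot label every possible invalid computation.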

On Sun, Oct 7, 2012 at 5:50 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Very funny. Seriously, we can't change our position on this topic now without making a lot of people seriously unhappy. IEEE 754 it is. -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 7, 2012 at 8:54 PM, Guido van Rossum <guido@python.org> wrote:
Seriously, we can't change our position on this topic now without making a lot of people seriously unhappy. IEEE 754 it is.
I did not suggest a change. I wrote: "I am not suggesting any language changes, but I think it will be useful to explain why float('nan') != float('nan') somewhere in the docs." If there is a concise explanation for the choice of IEEE 754 vs. Java, I think we should write it down and put an end to this debate.

On Sun, Oct 7, 2012 at 6:09 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Referencing Java here is absurd and I still consider this suggestion as a troll. Python is not in any way based on Java. On the other hand referencing IEEE 754 makes all the sense in the world, since every other aspect of Python float is based on IEEE 754 double whenever the underlying platform implements this standard -- and all modern CPUs do. I don't think there's anything else we need to say. -- --Guido van Rossum (python.org/~guido)

On Sun, Oct 7, 2012 at 9:51 PM, Guido van Rossum <guido@python.org> wrote:
Referencing Java here is absurd and I still consider this suggestion as a troll. Python is not in any way based on Java.
I did not suggest that. Sorry if it came out this way. I am well aware that Python and Java were invented independently and have different roots. (IIRC, Java was born from Oak and Python from ABC and Oak and ABC were both developed in the 1980s.) IEEE 754 precedes both languages and one team decided that equality reflexivity for hashable objects was more important than IEEE 754 compliance while the other decided otherwise. Many Python features (mostly library) are motivated by C. In the 90s, "because C does it this way" was a good explanation for a language feature. Doing things differently from the "C way", on the other hand would deserve an explanation. These days, C is rarely the first language that a student learns. Hopefully Python will take this place in the not so distant future, but many students graduated in late 90s - early 2000s knowing nothing but Java. As a result, these days it is a valid question to ask about a language feature: "Why does Python do X differently from Java?" Hopefully in most cases the answer is "because Python does it better." In case of nan != nan, I would really like to know a modern reason why Python's way is better. Better compliance with a 20-year old standard does not really qualify.

On Sun, Oct 7, 2012 at 10:33 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
In case of nan != nan, I would really like to know a modern reason why Python's way is better.
To this end, a link to Kahan's "How Java’s Floating-Point Hurts Everyone Everywhere" <http://www.cs.berkeley.edu/~wkahan/JAVAhurt.pdf> may be appropriate.

On Sun, Oct 7, 2012 at 7:33 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Explaining the differences between Python and Java is a job for educators, not for the language reference. I agree that documenting APIs as "this behaves just like C" does not have the same appeal -- but that turn of phrase was mostly used for system calls anyway, and for those I think that a slightly modified redirection (to the OS man pages) is still completely appropriate.
I am not aware of an update to the standard. Being 20 years old does not make it outdated. Again, there are plenty of reasons (you have to ask the numpy folks), but I don't think it is the job of the Python reference manual to give its motivations. It just needs to explain how things work, and if that can be done best by deferring to an existing standard that's fine. Of course a tutorial should probably mention this behavior, but a tutorial does not have the task of giving you the reason for every language feature either -- most readers of the tutorial don't have the context yet to understand those reasons, many don't care, and whether they like it or not, it's not going to change. You keep getting very close to suggesting to make changes, despite your insistence that you just want to know the reason. But assuming you really just are asking in an obnoxious way for the reason, I recommend that you ask the people who wrote the IEEE 754 standard. I'm sure their explanation (which I recall having read once but can't reproduce here) makes sense for Python too. -- --Guido van Rossum (python.org/~guido)

On 10/8/2012 12:19 PM, Guido van Rossum wrote:
I am not aware of an update to the standard. Being 20 years old does not make it outdated.
Similarly, being hundreds or thousands of years old does not make the equality standard, which includes reflexivity of equality, outdated. The IEEE standard violated that older standard. http://bugs.python.org/issue4296 illustrates some of the problems that come with that violation. But given the compromise made to maintain sane behavior of Python's collection classes, I see little reason to change nan in isolation. I wonder if it would be helpful to make a NaN subclass of floats with its own arithmetic and comparison methods. This would clearly mark a nan as Not a Normal float. Since subclasses rule (at least some) binary operations*, this might also simplify normal float code. But perhaps this was considered and rejected before adding math.isnan in 2.6. (And ditto for infinities.) * in that class_ob op subclass_ob is delegated to subclass.__op__, but I am not sure if this applies only to arithmetic, comparisons, or both. -- Terry Jan Reedy

Terry Reedy writes:
I wonder if it would be helpful to make a NaN subclass of floats with its own arithmetic and comparison methods.
It can't be helpful, unless you go a lot further. Specifically, you'd need to require containers to check every element for NaN-ness. That doesn't seem very practical. In any case, the presentation by Kahan (cited earlier by Alexander himself) demolishes the idea that any sort of attempt to implement DWIM for floats in a programming language can succeed at the present state of the art. The best we can get is DWGM ("do what Guido means", even if what Guido means is "ask the Timbot"<wink/>). Kahan pretty explicitly endorses this approach, by the way. At least in the context of choosing default policy for IEEE 754 Exceptions.

On 10/7/2012 9:51 PM, Guido van Rossum wrote:
I don't understand the reluctance to address a common conceptual speed-bump in the docs. After all, the tutorial has an entire chapter (http://docs.python.org/tutorial/floatingpoint.html) that explains how floats work, even though they work exactly as IEEE 754 says they should. A sentence in section 5.4 (Numeric Types) would help. Something like, "In accordance with the IEEE 754 standard, NaN's are not equal to any value, even another NaN. This is because NaN doesn't represent a particular number, it represents an unknown result, and there is no way to know if one unknown result is equal to another unknown result." --Ned.

On 08/10/2012 03:35, Ned Batchelder wrote:
I understand that the undefined result of a computation is not the same as the undefined result of another computation. (E.g. one might represent positive infinity, another might represent underflow or loss of accuracy.) But I can't help feeling (strongly) that the result of a computation should be equal to itself. In other words, after

    x = float('nan')
    y = float('nan')

I would expect x != y but x == x.

After all, how much sense does this make (I got this in a quick test with Python 2.7.3):
Making equality non-reflexive feels utterly wrong to me, partly no doubt because of my mathematical background, partly because of the difficulty in implementing container objects and algorithms and God knows what else when you have to remember that some of the objects they may deal with may not be equal to themselves. In particular the difference between my last two examples ( D[1]!=D[2] but [x]==[x] ) looks impossible to justify except by saying that for historical reasons the designers of lists and the designers of dictionaries made different - but entirely reasonable - assumptions about the equality relation, and (perhaps) whether identity implies equality (how do you explain to a Python learner that it doesn't (pathological code examples aside) ???). Couldn't each NAN when generated contain something that identified it uniquely, so that different NANs would always compare as not equal, but any given NAN would compare equal to itself? Rob Cliffe
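
The behaviour asked for above (different NaNs unequal, any given NaN equal to itself) can be approximated in pure Python by comparing NaNs by object identity. The sketch below is purely hypothetical: `ReflexiveFloat` is an invented name, it is not how Python's float works, and it does not address NaNs produced by arithmetic, which would still be plain floats.

```python
import math


class ReflexiveFloat(float):
    """Hypothetical float subclass whose NaNs are equal only to themselves."""

    def __eq__(self, other):
        if math.isnan(self):
            return self is other  # identity decides equality for NaNs
        return float.__eq__(self, other)

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ disables the inherited hash, so restore it explicitly.
    __hash__ = float.__hash__


x = ReflexiveFloat("nan")
y = ReflexiveFloat("nan")
print(x == x, x == y, x != y)
```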

On Sun, Oct 7, 2012 at 11:09 PM, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
If we take this route and try to distinguish NaNs with different payload, I am sure you will want to distinguish between -0.0 and 0.0 as well. The latter would violate transitivity in -0.0 == 0 == 0.0. The only sensible thing to do with NaNs is either to treat them all equal (the Eiffel way) or to stick to IEEE default. I don't think NaN behavior in Python is a result of a deliberate decision to implement IEEE 754. If that was the case, why 0.0/0.0 does not produce NaN? Similarly, Python's math library does not produce infinities where an IEEE 754 compliant library should:
Some other operations behave inconsistently:
>>> 2 * 10.**308
inf
but
I think non-reflexivity of nan in Python is an accidental feature. Python's float type was not designed with NaN in mind and until recently, it was relatively difficult to create a nan in pure python. It is also not true that IEEE 754 requires that nan == nan is false. IEEE 754 does not define operator '==' (nor does it define boolean false). Instead, IEEE defines a comparison operation that can have one of four results: >, <, =, or unordered. The standard does require that NaN compares unordered with anything including itself, but it does not follow that a language that defines an == operator with boolean results must define it so that nan == nan is false.
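
The kind of inconsistency discussed above can be observed with a few operations. The specific operations below are chosen for illustration and are not necessarily the ones from the original examples; `outcome` is a hypothetical helper.

```python
import math


def outcome(operation):
    """Return operation()'s result, or the name of the exception it raises."""
    try:
        return operation()
    except (ArithmeticError, ValueError) as exc:
        return type(exc).__name__


zero_over_zero = outcome(lambda: 0.0 / 0.0)             # exception, not NaN
log_of_zero = outcome(lambda: math.log(0.0))            # exception, not -inf
overflowing_product = outcome(lambda: 2 * 10.0 ** 308)  # quietly gives inf
overflowing_power = outcome(lambda: 10.0 ** 309)        # exception, not inf

print(zero_over_zero, log_of_zero, overflowing_product, overflowing_power)
```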

On Sun, Oct 7, 2012 at 8:46 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Oh, it was. It was very deliberate. Like in many other areas of Python, I refused to invent new rules when there was existing behavior elsewhere that I could borrow and with which I had no reason to quibble. (And in the case of floating point behavior, there really is no alternate authority to choose from besides IEEE 754. Languages that disagree with it do not make an authority.) Even if I *did* have reasons to quibble with the NaN behavior (there were no NaNs on the mainframe where I learned programming, so they were as new and weird to me as they are to today's novices), Tim Peters, who has implemented numerical libraries for Fortran compilers in a past life and is an absolute authority on floating points, convinced me to follow IEEE 754 as closely as I could.
If that was the case, why 0.0/0.0 does not produce NaN?
Easy. It was an earlier behavior, from the days where IEEE 754 hardware did not yet rule the world, and Python didn't have much of an opinion on float behavior at all -- it just did whatever the platform did. Infinities and NaNs were not on my radar (I hadn't met Tim yet :-). However division by zero (which is not just a float but also an int behavior) was something that I just had to address, so I made the runtime check for it and raise an exception. When we became more formal about this, we considered changing this but decided that the ZeroDivisionError was more user-friendly than silently propagating NaNs everywhere, given the typical use of Python. (I suppose we could make it optional, and IIRC that's what Decimal does -- but for floats we don't have a well-developed numerical context concept yet.)
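
For comparison, here is a brief sketch of how the `decimal` module makes this choice optional through context traps. It assumes the default context, in which InvalidOperation and DivisionByZero are trapped.

```python
import decimal
from decimal import Decimal

# Default context: the invalid operation 0/0 is trapped and raises.
try:
    Decimal(0) / Decimal(0)
    trapped = False
except decimal.InvalidOperation:
    trapped = True

# With the traps switched off locally, special values are returned instead.
with decimal.localcontext() as ctx:
    ctx.traps[decimal.InvalidOperation] = False
    ctx.traps[decimal.DivisionByZero] = False
    quiet_nan = Decimal(0) / Decimal(0)
    infinity = Decimal(1) / Decimal(0)

print(trapped, quiet_nan, infinity)
```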
Again, this mostly comes from backward compatibility with the math module's origins (and it is as old as Python itself, again predating its use of IEEE 754). AFAIK Tim went over the math library very carefully and cleaned up what he could, so he probably thought about this as well. Also, IIUC the IEEE library prescribes exceptions as well as return values; e.g. "man 3 log" on my OSX computer says that log(0) returns -inf as well as raises a divide-by-zero exception. So I think this is probably compliant with the standard -- one can decide to ignore the exceptions in certain contexts and honor them in others. (Probably even the 1/0 behavior can be defended this way.)
Probably the same. IEEE 754 may be more complex than you think!
I think non-reflexivity of nan in Python is an accidental feature.
It is not.
Python's float type was not designed with NaN in mind and until recently, it was relatively difficult to create a nan in pure python.
And when we did add NaN and Inf we thought about the issues carefully.
Are you proposing changes again? Because it sure sounds like you are unhappy with the status quo and will not take an explanation, however authoritative it is. Given a language with the 6 comparisons like Python (and most do), they have to be mapped to the IEEE comparison *somehow*, and I believe we chose one of the most logical translations imaginable (given that nobody likes == and != raising exceptions). -- --Guido van Rossum (python.org/~guido)
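
The translation of the six comparison operators mentioned above can be inspected directly. In this sketch every operator applied to NaN operands yields False, except !=, which is the negation of ==.

```python
import operator

nan = float("nan")

comparisons = {
    "<": operator.lt,
    "<=": operator.le,
    "==": operator.eq,
    "!=": operator.ne,
    ">": operator.gt,
    ">=": operator.ge,
}

# Apply each operator to (nan, nan); the same results hold for (nan, 1.0).
results = {symbol: compare(nan, nan) for symbol, compare in comparisons.items()}

for symbol, value in results.items():
    print("nan %s nan -> %s" % (symbol, value))
```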

On 10/8/2012 12:47 PM, Guido van Rossum wrote:
I agree. In C, as I remember, a function can both (passively) 'raise an exception' by setting errno *and* return a value. This requires the programmer to check for an exception, and forgetting to do so is a common bug. In Python, raising an exception actively aborts returning a value, so you had to choose one of the two behaviors.
Or this might be an accidental inconsistency, in that float multiplication was changed to return inf but pow was not. But I would be reluctant to fiddle with such details now. Alexander, while I might have chosen to make nan == nan True, I consider it a near tossup with no happy resolution and would not change it now. Guido's explanation is pretty clear: he went with the IEEE standard as interpreted for Python by Tim Peters. -- Terry Jan Reedy

On Mon, Oct 8, 2012 at 5:17 PM, Terry Reedy <tjreedy@udel.edu> wrote:
Alexander, while I might have chosen to make nan == nan True, I consider it a near tossup with no happy resolution and would not change it now.
While I did suggest to change nan == nan result two years ago, <http://mail.python.org/pipermail/python-ideas/2010-March/006945.html>, I am not suggesting it now. Here I am merely trying to understand to what extent Python's float is implementing IEEE 754 and why in some cases Python's behavior deviates from the standard while in the case of nan == nan, IEEE 754 is taken as a gospel.
It would be helpful if that interpretation was clearly written somewhere. Without a written document this interpretation seems apocryphal to me. Earlier in this thread, Guido wrote: "I am not aware of an update to the standard." To the best of my knowledge IEEE Std 754 was last updated in 2008. I don't think the differences between 1985 and 2008 revisions matter much for this discussion, but since I am going to refer to chapter and verse, I will start by citing the document that I will use:

IEEE Std 754(TM)-2008 (Revision of IEEE Std 754-1985)
IEEE Standard for Floating-Point Arithmetic
Approved 12 June 2008
IEEE-SA Standards Board

(AFAICT, the main difference between 754-2008 and 754-1985 is that the former includes decimal floats added in 854-1987.)

Now, let me put my language lawyer hat on and compare Python floating point implementations to IEEE 754-2008 standard. Here are the relevant clauses:

3. Floating-point formats
4. Attributes and rounding
5. Operations
6. Infinity, NaNs, and sign bit
7. Default exception handling
8. Alternate exception handling attributes
9. Recommended operations
10. Expression evaluation
11. Reproducible floating-point results

Clause 3 (Floating-point formats) defines five formats: 3 binary and 2 decimal. Python supports a superset of decimal formats and a single binary format. Section 3.1.2 (Conformance) contains the following provision: "A programming environment conforms to this standard, in a particular radix, by implementing one or more of the basic formats of that radix as both a supported arithmetic format and a supported interchange format." I would say Python is conforming to Clause 3.
Clause 4 (Attributes and rounding) is supported only by Decimal through contexts: "For attribute specification, the implementation shall provide language-defined means, such as compiler directives, to specify a constant value for the attribute parameter for all standard operations in a block; the scope of the attribute value is the block with which it is associated." I believe Decimal is mostly conforming, but float is not conforming at all.

Clause 5 requires "[a]ll conforming implementations of this standard shall provide the operations listed in this clause for all supported arithmetic formats, except as stated below." In other words, a language standard that claims conformance with IEEE 754 must provide all operations unless the standard states otherwise. Let's try to map IEEE 754 required operations to Python float operations.

5.3.1 General operations

sourceFormat roundToIntegralTiesToEven(source)
sourceFormat roundToIntegralTiesToAway(source)
sourceFormat roundToIntegralTowardZero(source)
sourceFormat roundToIntegralTowardPositive(source)
sourceFormat roundToIntegralTowardNegative(source)
sourceFormat roundToIntegralExact(source)

Python only provides float.__trunc__ which implements roundToIntegralTowardZero. (The builtin round() belongs to a different category because it changes format from double to int.)

sourceFormat nextUp(source)
sourceFormat nextDown(source)

I don't think these are available for Python floats.

sourceFormat remainder(source, source) - float.__mod__

Not fully conforming. For example, the standard requires remainder(-2.0, 1.0) to return -0.0, but in Python 3.3:
>>> -2.0 % 1.0
0.0
On the other hand,
>>> math.fmod(-2.0, 1.0)
-0.0
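
The sign of a zero result is easy to miss because -0.0 == 0.0, so the sketch below makes it visible with `math.copysign`. It also uses `math.remainder`, which was added in Python 3.7 (after this thread) and is documented as the IEEE 754-style remainder; `sign_of` is a hypothetical helper name.

```python
import math


def sign_of(x):
    """Return 1.0 or -1.0 according to the sign bit of x, zeros included."""
    return math.copysign(1.0, x)


percent_result = -2.0 % 1.0                   # zero with a positive sign
fmod_result = math.fmod(-2.0, 1.0)            # zero with a negative sign
remainder_result = math.remainder(-2.0, 1.0)  # zero with a negative sign

# fmod and the IEEE-style remainder differ for other inputs:
# fmod truncates the quotient, remainder rounds it to the nearest integer.
fmod_three_two = math.fmod(3.0, 2.0)
remainder_three_two = math.remainder(3.0, 2.0)

print(sign_of(percent_result), sign_of(fmod_result), sign_of(remainder_result))
print(fmod_three_two, remainder_three_two)
```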
sourceFormat minNum(source, source)
sourceFormat maxNum(source, source)
sourceFormat minNumMag(source, source)
sourceFormat maxNumMag(source, source)

I don't think these are available for Python floats.

5.3.3 logBFormat operations

I don't think these are available for Python floats.

5.4.1 Arithmetic operations

formatOf-addition(source1, source2) - float.__add__
formatOf-subtraction(source1, source2) - float.__sub__
formatOf-multiplication(source1, source2) - float.__mul__
formatOf-division(source1, source2) - float.__truediv__
formatOf-squareRoot(source1) - math.sqrt
formatOf-fusedMultiplyAdd(source1, source2, source3) - missing
formatOf-convertFromInt(int) - float.__new__

With exception of fusedMultiplyAdd, Python float is conforming.

intFormatOf-convertToIntegerTiesToEven(source)
intFormatOf-convertToIntegerTowardZero(source)
intFormatOf-convertToIntegerTowardPositive(source)
intFormatOf-convertToIntegerTowardNegative(source)
intFormatOf-convertToIntegerTiesToAway(source)
intFormatOf-convertToIntegerExactTiesToEven(source)
intFormatOf-convertToIntegerExactTowardZero(source)
intFormatOf-convertToIntegerExactTowardPositive(source)
intFormatOf-convertToIntegerExactTowardNegative(source)
intFormatOf-convertToIntegerExactTiesToAway(source)

Python has a single builtin round().

5.5.1 Sign bit operations

sourceFormat copy(source) - float.__pos__
sourceFormat negate(source) - float.__neg__
sourceFormat abs(source) - float.__abs__
sourceFormat copySign(source, source) - math.copysign

Python float is conforming.
Now we are getting close to the issue at hand:

"""
5.6.1 Comparisons

Implementations shall provide the following comparison operations, for all supported floating-point operands of the same radix in arithmetic formats:

boolean compareQuietEqual(source1, source2)
boolean compareQuietNotEqual(source1, source2)
boolean compareSignalingEqual(source1, source2)
boolean compareSignalingGreater(source1, source2)
boolean compareSignalingGreaterEqual(source1, source2)
boolean compareSignalingLess(source1, source2)
boolean compareSignalingLessEqual(source1, source2)
boolean compareSignalingNotEqual(source1, source2)
boolean compareSignalingNotGreater(source1, source2)
boolean compareSignalingLessUnordered(source1, source2)
boolean compareSignalingNotLess(source1, source2)
boolean compareSignalingGreaterUnordered(source1, source2)
boolean compareQuietGreater(source1, source2)
boolean compareQuietGreaterEqual(source1, source2)
boolean compareQuietLess(source1, source2)
boolean compareQuietLessEqual(source1, source2)
boolean compareQuietUnordered(source1, source2)
boolean compareQuietNotGreater(source1, source2)
boolean compareQuietLessUnordered(source1, source2)
boolean compareQuietNotLess(source1, source2)
boolean compareQuietGreaterUnordered(source1, source2)
boolean compareQuietOrdered(source1, source2).
"""

Signaling comparisons are missing. Ordered/Unordered comparisons are missing.

Note that the standard does not require any particular spelling for operations. "In this standard, operations are written as named functions; in a specific programming environment they might be represented by operators, or by families of format-specific functions, or by operations or functions whose names might differ from those in this standard." (Sec. 5.1) It would be perfectly conforming for python to spell compareSignalingEqual() as '==' and compareQuietEqual() as math.eq() or even ieee754_2008.compareQuietEqual(). The choice that Python made was not dictated by the standard.
(As I have shown above, Python's % operation does not implement a conforming IEEE 754 remainder(), but math.fmod() seems to fill the gap.)

This post is already too long, so I'll leave Clauses 6-11 for another time. "IEEE 754 may be more complex than you think!" (GvR, earlier in this thread.) I hope I already made the case that Python's float does not conform to IEEE 754 and that IEEE 754 does not require an operation spelled "==" or "float.__eq__" to return False when comparing two NaNs. The standard requires support for 22 comparison operations, but Python's float supports around six. On top of that, Python has an operation that has no analogue in IEEE 754 - the "is" comparison. This is why IEEE 754 standard does not help in answering the main question in this thread: should (x is y) imply (x == y)? We need to formulate a rationale for breaking this implication without a reference to IEEE 754 or Tim's interpretation thereof.

Language-lawyierly-yours,
Alexander Belopolsky
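
To illustrate the point that the standard's predicates need not be spelled as operators, here is a sketch of two of them written as plain functions. The function names are hypothetical Python spellings, not part of any existing module, and the signaling variants are not modelled.

```python
import math


def compare_quiet_unordered(x, y):
    """Hypothetical spelling of compareQuietUnordered: true if either is NaN."""
    return math.isnan(x) or math.isnan(y)


def four_way_compare(x, y):
    """Return exactly one of the four IEEE 754 relations between x and y."""
    if compare_quiet_unordered(x, y):
        return "unordered"
    if x < y:
        return "<"
    if x > y:
        return ">"
    return "="


nan = float("nan")
print(four_way_compare(1.0, 2.0), four_way_compare(nan, nan))
print(four_way_compare(-0.0, 0.0))
```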

On Mon, Oct 8, 2012 at 6:31 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Such a rationale exists in my mind. Since floats are immutable, an implementation may or may not intern certain float values (just as certain string and int values are interned but others are not). Therefore, the fact that "x is y" says nothing about whether the computations that produced x and y had anything to do with each other. This is not true for mutable objects: if I have two lists, computed separately, and find they are the same object, the computations that produced them must have communicated somehow, or the same list was passed in to each computation. So, since two computations might return the same object without having followed the same computational path, in another implementation the exact same computation might not return the same object, and so the == comparison should produce the same value in either case -- in particular, if x and y are both NaN, all 6 comparisons on them should return False (given that in general comparing two NaNs returns False regardless of the operator used). The reason for invoking IEEE 754 here is that without it, Python might well have grown a language-wide rule stating that an object should *always* compare equal to itself, as there would have been no significant counterexamples. (As it is, such a rule only exists for containers, and technically even there it is optional -- it is just not required for containers to invoke == for contained items that reference the same object.) -- --Guido van Rossum (python.org/~guido)

On Mon, Oct 8, 2012 at 10:09 PM, Guido van Rossum <guido@python.org> wrote:
This is an interesting argument, but I don't quite understand it. Are you suggesting that some valid Python implementation may intern NaNs? Wouldn't that require that all NaNs are equal?
Therefore, the fact that "x is y" says nothing about whether the computations that produced x and y had anything to do with each other.
True.
True.
True, but this logic does not dictate what these values should be.
Except for the operator compareQuietUnordered(), which is missing in Python. Note that IEEE 754 also defines a totalOrder() operation, which is more or less a lexicographical ordering of bit patterns. A hypothetical language could map its 6 comparisons to totalOrder() and still claim IEEE 754 conformity as long as it implements the other 22 comparison predicates somehow.
Why would it be a bad thing? Isn't this rule what Bertrand Meyer calls one of the pillars of civilization? It looks like you give a circular argument. Python cannot have a rule that x is y implies x == y because that would preclude implementing float.__eq__ as IEEE 754 equality comparison and we implement float.__eq__ as IEEE 754 equality comparison in order to provide a significant counterexample to x is y implies x == y rule. I am not sure how interning comes into play here, so I must have missed something.

On Mon, Oct 8, 2012 at 11:14 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Sorry, it seems I got this part slightly wrong. Forget interning. The argument goes the other way: If you *do* compute x and y exactly the same way, and if they don't return the same object, and if they both return NaN, the rules for comparing NaN apply, and the values must compare unequal. So if you compute them exactly the same way but somehow you do return the same object, that shouldn't suddenly make them compare equal.
Yes, but that's not the choice Python made, so it's irrelevant. (Unless you now *do* want to change the language, despite stating several times that you were just asking for explanations. :-)
I spent a week with Bertrand recently. He is prone to exaggeration. :-)
No, that's not what I meant -- maybe my turn of phrase "invoking IEEE" was confusing. The first part is what I meant: "Python cannot have a rule that x is y implies x == y because that would preclude implementing float.__eq__ as IEEE 754 equality comparison." The second half should be: "And we have already (independently from all this) decided that we want to implement float.__eq__ as IEEE 754 equality comparison." I'm sure a logician could rearrange the words a bit and make it look more logical. -- --Guido van Rossum (python.org/~guido)

On Tue, Oct 9, 2012 at 12:43 PM, Guido van Rossum <guido@python.org> wrote:
I'll have a go. It's a lot longer, though :) When designing their floating point support, language designers must choose between two mutually exclusive options: 1. IEEE754 compliant floating point comparison where NaN != NaN, *even if* they're the same object 2. The invariant that "x is y" implies "x == y" The idea behind following the IEEE754 model is that mathematics is a *value based system*. There is only really one NaN, just as there is only one 4 (or 5, or any other specific value). The idea of a number having an identity distinct from its value simply doesn't exist. Thus, when modelling mathematics in an object system, it makes sense to say that *object identity is irrelevant, and only value matters*. This is the approach Python has chosen: for *numeric* operations, including comparisons, object identity is irrelevant to the maximum extent that is practical. Thus "x = float('nan'); assert x != x" holds for *exactly the same reason* that "x = 10e50; y = 10e50; assert x == y" holds. However, when it comes to containers, being able to assume that "x is y" implies "x == y" has an immense practical benefit in terms of being able to implement a large number of non-trivial optimisations. Thus the Python language definition explicitly allows containers to make that assumption, *even though it is known not to be universally true*. This hybrid model means that even though "'x is y' implies 'x == y'" is not true in the general case, it may still be *assumed to be true* regardless by container implementations. In particular, the containers defined in the standard library reference are *required* to make this assumption. This does mean that certain invariants about containers don't hold in the presence of NaN values. This is mostly a theoretical concern, but, in those cases where it *does* matter, then the appropriate solution is to implement a custom container type that handles NaN values correctly. 
It's perhaps worth including a section explaining this somewhere in the language reference. It's not an accident that Python behaves the way it does, but it's certainly a rationale that can help implementors correctly interpret the rest of the language spec. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
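The hybrid model described above can be sketched in a few lines. This is only an illustration of the observable behaviour in CPython, not a statement of what the language reference requires; the variable names are made up for the example.

```python
nan = float("nan")

# Numeric comparison ignores identity: a NaN is unequal even to itself.
self_equal = (nan == nan)

# Value-based equality for ordinary floats, regardless of object identity.
x = 10e50
y = 10e50
values_equal = (x == y)

# Built-in containers assume "x is y" implies "x == y" for their elements.
same_object_lists_equal = ([nan] == [nan])                        # same NaN object in both lists
distinct_object_lists_equal = ([float("nan")] == [float("nan")])  # two separate NaN objects
found_by_identity = nan in [nan]

print(self_equal, values_equal, same_object_lists_equal,
      distinct_object_lists_equal, found_by_identity)
```

The two list comparisons differ only in whether the lists share one NaN object, which is exactly the case where the container shortcut becomes visible.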

On Sun, Oct 7, 2012 at 8:09 PM, Rob Cliffe <rob.cliffe@btinternet.com> wrote:
That's too bad. It sounds like this mailing list really wouldn't have enough space in its margins to convince you otherwise. And yet you are wrong.
Do you have any background at all in *numerical* mathematics?
It's not about equality. If you ask whether two NaNs are *unequal* the answer is *also* False. I admit that a tutorial section describing the behavior would be good. But I am less than ever convinced that it's possible to explain the *reason* for the behavior in a tutorial. -- --Guido van Rossum (python.org/~guido)

Guido van Rossum wrote:
It's not about equality. If you ask whether two NaNs are *unequal* the answer is *also* False.
That's the weirdest part about this whole business, I think. Unless you're really keeping your wits about you, it's easy to forget that the assumption (x == y) == False implies (x != y) == True doesn't necessarily hold. This is actually a very important assumption when it comes to reasoning about programs -- even more important than reflexivity, etc, I believe. Consider

    if x == y:
        dosomething()
    else:
        dosomethingelse()

where x and y are known to be floats. It's easy to see that the following is equivalent:

    if not x == y:
        dosomethingelse()
    else:
        dosomething()

but it's not quite so easy to spot that the following is *not* equivalent:

    if x != y:
        dosomethingelse()
    else:
        dosomething()

This trap is made all the easier to fall into because float comparison is *mostly* well-behaved, except for a small subset of the possible values. Most other nonstandard comparison behaviours in Python apply to whole types. E.g. we refuse to compare complex numbers for ordering, even if their values happen to be real, so if you try that you get an early exception. But the weirdness with NaNs only shows up in corner cases that may escape testing. Now, there *is* a third possibility -- we could raise an exception if a comparison involving NaNs is attempted. This would be a more faithful way of adhering to the IEEE 754 specification that NaNs are "unordered". More importantly, it would make the second code transformation above valid in all cases. So the question that really needs to be answered, I think, is not "Why is NaN == NaN false?", but "Why doesn't NaN == anything raise an exception, when it would make so much more sense to do so?" -- Greg
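A quick check of these transformations, with hypothetical helper names. As Steven's interpreter session later in the thread shows, Python actually returns True for nan != nan, so the == / != rewrite stays valid; it is the analogous rewrite between ordering operators (< versus >=) that silently changes behaviour when a NaN is involved.

```python
nan = float("nan")

def via_eq(x, y):
    return "same" if x == y else "different"

def via_ne(x, y):
    return "different" if x != y else "same"

def via_lt(x, y):
    return "less" if x < y else "not less"

def via_ge(x, y):
    # Looks like a harmless rewrite of via_lt, but is not one for NaN.
    return "not less" if x >= y else "less"

eq_ne_agree = via_eq(nan, nan) == via_ne(nan, nan)
lt_ge_agree = via_lt(nan, 1.0) == via_ge(nan, 1.0)

print(eq_ne_agree, lt_ge_agree)
```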

On Mon, Oct 8, 2012 at 5:02 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Because == raising an exception is really unpleasant. We had this in Python 2 for unicode/str comparisons and it was very awkward. Nobody arguing against the status quo seems to care at all about numerical algorithms though. I propose that you go find some numerical mathematicians and ask them. -- --Guido van Rossum (python.org/~guido)

On 9 October 2012 01:11, Guido van Rossum <guido@python.org> wrote:
The main purpose of quiet NaNs is to propagate through computation ruining everything they touch. In a programming language like C that lacks exceptions this is important as it allows you to avoid checking all the time for invalid values, whilst still being able to know if the end result of your computation was ever affected by an invalid numerical operation. The reasons for NaNs to compare unequal are no doubt related to this purpose. It is of course arguable whether the same reasoning applies to a language like Python that has a very good system of exceptions but I agree with Guido that raising an exception on == would be unfortunate. How many people would forget that they needed to catch those exceptions? How awkward could your code be if you did remember to catch all those exceptions? In an exception handling language it's important to know that there are some operations that you can trust. Oscar

On Mon, Oct 8, 2012 at 6:37 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I implemented a floating point context manager for gmpy2 and the MPFR floating point library. By default, it enables a non-stop mode where infinities and NaN are returned but you can also raise exceptions. You can experiment with gmpy2: http://code.google.com/p/gmpy/ Some examples
Standard disclaimers: * I'm the maintainer of gmpy2. * Please use SVN or beta2 (when it is released) to avoid a couple of embarrassing bugs. :(

Oscar Benjamin wrote:
The main purpose of quiet NaNs is to propagate through computation ruining everything they touch.
But they stop doing that as soon as they hit an if statement. It seems to me that the behaviour chosen for NaN comparison could just as easily make things go wrong as make them go right. E.g.

    while not (error < epsilon):
        find_a_better_approximation()

If error ever ends up being NaN, this will go into an infinite loop. -- Greg

On Tue, Oct 9, 2012 at 7:19 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
But if you know that that's a possibility, you simply code your condition the other way:

    while error > epsilon:
        find_a_better_approximation()

Which will then immediately terminate the loop if error bonks to NaN. ChrisA
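The two loop conditions can be compared directly. In this sketch, halving the error is a stand-in for find_a_better_approximation(), and max_steps is a safety cap added only so the NaN case terminates; both are assumptions made for the example.

```python
def steps_with_not_less(error, epsilon, max_steps=50):
    """Loop written as `while not (error < epsilon)`."""
    steps = 0
    while not (error < epsilon) and steps < max_steps:
        error = error / 2   # stand-in for find_a_better_approximation()
        steps += 1
    return steps

def steps_with_greater(error, epsilon, max_steps=50):
    """Loop written as `while error > epsilon`."""
    steps = 0
    while error > epsilon and steps < max_steps:
        error = error / 2
        steps += 1
    return steps

nan = float("nan")

print(steps_with_not_less(1.0, 1e-6), steps_with_greater(1.0, 1e-6))
print(steps_with_not_less(nan, 1e-6), steps_with_greater(nan, 1e-6))
```

For ordinary values the two forms take the same number of steps. With a NaN error, the first form only stops because of the cap, while the second exits immediately.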

On Oct 9, 2012 9:20 AM, "Greg Ewing" <greg.ewing@canterbury.ac.nz> wrote:
I should expect that an experienced numericist would be aware of the possibility of a NaN and make a trivial modification of your loop to take advantage of the simple fact that any comparison with NaN returns false. It is only because you have artificially placed a not in the while clause that it doesn't work. I would have tested for error>eps without even thinking about NaNs. Oscar

On 09/10/12 11:32, Oscar Benjamin wrote:
Correct, but I'd like to point out that NaNs are a bit more sophisticated than just "numeric contagion". 1) NaNs carry payload, so you can actually identify what sort of calculation failed. E.g. NaN-27 might mean "logarithm of a negative number", while NaN-95 might be "inverse trig function domain error". Any calculation involving a single NaN is supposed to propagate the same payload, so at the end of the calculation you can see that you tried to take the log of a negative number and debug accordingly. 2) On rare occasions, NaNs can validly disappear from a calculation, leaving you with a non-NaN answer. The rule is, if you can replace the NaN with *any* other value, and still get the same result, then the NaN is irrelevant and can be consumed. William Kahan gives an example: For example, 0*NaN must be NaN because 0*∞ is an INVALID operation (NaN). On the other hand, for hypot(x, y) := √(x*x + y*y) we find that hypot(∞, y) = +∞ for all real y, finite or not, and deduce that hypot(∞, NaN) = +∞ too; naive implementations of hypot may do differently. Page 7 of http://www.cs.berkeley.edu/~wkahan/ieee754status/IEEE754.PDF -- Steven
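Kahan's two cases can be tried from Python's math module. This is a sketch of the behaviour as implemented in CPython; the math.pow() line is an extra illustration of the same "NaN can be consumed" rule, taken from the documented behaviour of math.pow(), not from the thread.

```python
import math

inf = float("inf")
nan = float("nan")

zero_times_nan = 0.0 * nan              # stays NaN, because 0 * inf is invalid
hypot_inf_nan = math.hypot(inf, nan)    # +inf whatever the second argument is
pow_nan_zero = math.pow(nan, 0.0)       # 1.0 for any x, so the NaN is irrelevant

print(zero_times_nan, hypot_inf_nan, pow_nan_zero)
```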

On Mon, Oct 08, 2012 at 09:29:42AM -0700, Guido van Rossum wrote:
It's not about equality. If you ask whether two NaNs are *unequal* the answer is *also* False.
Not so. I think you are conflating NAN equality/inequality with ordering comparisons. Using Python 3.3:

py> nan = float('nan')
py> nan > 0
False
py> nan < 0
False
py> nan == 0
False
py> nan != 0
True

but:

py> nan == nan
False
py> nan != nan
True

-- Steven

On Tue, Oct 9, 2012 at 7:44 AM, Guido van Rossum <guido@python.org> wrote:
This smells like a bug in the != operator, it seems to fall back to not == which it didn't used to. More later.....
I'm fairly sure it's deliberate, and has been this way in Python for a long time. IEEE 754 also has x != x when x is a NaN (at least, for those IEEE 754 functions that return a boolean rather than signaling an invalid exception), and it's a well documented property of NaNs across languages. -- Mark

On Mon, Oct 08, 2012 at 11:44:12PM -0700, Guido van Rossum wrote:
This smells like a bug in the != operator, it seems to fall back to not == which it didn't used to. More later.....
I'm pretty sure the behaviour is correct. When I get home this evening, I will check my copy of the Standard Apple Numerics manual (one of the first IEEE 754 compliant systems). In the meantime, I quote from "What Every Computer Scientist Should Know About Floating-Point Arithmetic" "Since comparing a NaN to a number with <, ≤, >, ≥, or = (but not ≠) always returns false..." (Admittedly it doesn't specifically state the case of comparing a NAN with a NAN.) http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html -- Steven

On Sun, 07 Oct 2012 22:35:17 -0400 Ned Batchelder <ned@nedbatchelder.com> wrote:
+1 Regards Antoine. -- Software development and contracting: http://pro.pitrou.net

On Sun, Oct 7, 2012 at 7:35 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
I'm sorry. I didn't intend to refuse to document the behavior. I was mostly reacting to things I thought I read between the lines -- the suggestion that there is no reason for the NaN behavior except silly compatibility with an old standard that nobody cares about. From this it is only a small step to reading (again between the lines) the suggestion to change the behavior.
That sounds like a great addition to the docs, except for the nit that I don't like writing the plural of NaN as "NaN's" -- I prefer "NaNs" myself. Also, the words here can still cause confusion. The exact behavior is that every one of the 6 comparison operators (==, !=, <, <=, >, >=) returns False when either argument (or both) is a NaN. I think your suggested words could lead someone to believe that they mean that x != NaN or NaN != NaN would return True. Anyway, once we can agree on words I agree that we should update that section. -- --Guido van Rossum (python.org/~guido)

On 10/8/2012 12:25 PM, Guido van Rossum wrote:
How about: "In accordance with the IEEE 754 standard, when NaNs are compared to any value, even another NaN, the result is always False, regardless of the comparison. This is because NaN represents an unknown result. There is no way to know the relationship between an unknown result and any other result, especially another unknown one. Even comparing a NaN to itself always produces False." --Ned.

Guido van Rossum writes:
Sounds good. (But now maybe we also need to come clean with the exceptions for NaNs compared as part of container comparisons?)
For a second I thought you meant IEEE 754 Exceptions. Whew! How about: """ For reasons of efficiency, Python allows comparisons of containers to shortcut element comparisons. These shortcuts mean that it is possible that comparison of two containers may return True, even if they contain NaNs. For details, see the language reference[1]. """ Longer than I think it deserves, but maybe somebody has a better idea? Footnotes: [1] Sorry about that, but details don't really belong in a *Python* tutorial. Maybe this should be "see the implementation notes"?

Steven D'Aprano wrote:
1) It is not the case that NaN <comp> NaN is always false.
Huh -- well, apparently NaN != NaN --> True. However, borrowing Steven's earlier example, and modifying slightly: sqr(-1) != sqr(-1) Shouldn't this be False? Or, to look at it another way, surely somewhere out in the Real World (tm) it is the case that two NaNs are indeed equal. ~Ethan~

Just a curiosity here (as I can guess at plausible reasons myself, so there probably are some official stances). Is there a reason NaNs are not instances of a NaN class? Then x == x would be True (as they want), but [this NaN] == [that NaN] would be False, as expected. I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1), but it seems a lot less of a big deal than all of the exceptions with container equalities. Thanks, Joshua

On 10/10/12 09:13, Joshua Landau wrote:
Because that would complicate Python's using floats for absolutely no benefit. Instead of float operations always returning a float, they would have to return a float or a NAN. To check for a valid floating point instance, instead of saying: isinstance(x, float) you would have to say: isinstance(x, (float, NAN)) And what about infinities, denorm numbers, and negative zero? Do they get dedicated classes too? And what is the point of this added complexity? Nothing. You *still* have the rule that "x == x for all x, except for NANs". The only difference is that "NANs" now means "instances of NAN class" rather than "NAN floats" (and Decimals). Working with IEEE 754 floats is now far more of a nuisance because some valid floating point values aren't floats but have a different class, but nothing meaningful is different.
Then x == x would be True (as they want), but [this NaN] == [that NaN] would be False, as expected.
Making NANs their own class wouldn't give you that. If we wanted that behaviour, we could have it without introducing a NAN class: just change the list __eq__ method to scan the list for a NAN using math.isnan before checking whether the lists were identical. But that would defeat the purpose of the identity check (an optimization to avoid scanning the list)! Replacing math.isnan with isinstance doesn't change that.
I guess that raises the question about why x == x but sqrt(-1) != sqrt(-1),
That question has already been raised, and answered, repeatedly in this thread.
but it seems a lot less of a big deal than all of the exceptions with container equalities.
Container equalities are not a big deal. I'm not sure what problem you think you are solving. -- Steven

On Tue, Oct 9, 2012 at 9:14 PM, Steven D'Aprano <steve@pearwood.info> wrote:
I'm sometimes surprised at the creativity and passion behind solutions to this issue. I've been a Python user for some years now, including time dealing with stuff like numpy where you're fairly likely to run into NaNs. I've been an active member of several support communities where I can confidently say I have encountered tens of thousands of Python questions. Not once can I recall having, or seeing anyone else have, an actual problem due to the way Python handles NaN. As far as I can tell, it works _perfectly_. I appreciate the aesthetic concerns, but I really wish someone would explain to me what's actually broken and in need of fixing. Mike

On 10/10/12 2:25 AM, Mike Graham wrote:
While I also don't think that anything needs to be fixed, I must say that in my years of monitoring tens of thousands of Python questions, there have been a few legitimate problems with the NaN behavior. It does come up from time to time. The most frequent problem is checking if a list contains a NaN. The obvious thing to do for many users: nan in list_of_floats This is a reasonable prediction based on what one normally does for most objects in Python, but this is quite wrong. But because list.__contains__() checks for identity first, it can look like it works when people test it out:
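The interpreter session from the original message is not preserved here. The following is a reconstruction of the kind of test being described, with made-up variable names, plus the math.isnan() check that avoids the trap.

```python
import math

nan = float("nan")
values = [1.0, nan, 2.0]

# Looks like it works: the very same NaN object is in the list,
# so the identity check succeeds before == is ever consulted.
looks_fine = nan in values

# A NaN produced elsewhere is a different object and is never == to anything.
fresh_nan_found = float("nan") in values

# A check that does not depend on object identity.
robust = any(math.isnan(v) for v in values)

print(looks_fine, fresh_nan_found, robust)
```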
Then they write their code doing the wrong thing thinking that they tested their approach. I classify this as a wart: it breaks reasonable predictions from users, requires more exceptions-based knowledge about NaNs to use correctly, and can trap users who do try to experiment to determine the behavior. But I think that the cost of acquiring and retaining such knowledge is not so onerous as to justify the cost of any of the attempts to fix the wart. The other NaN wart (unrelated to this thread) is that sorting a list of floats containing a NaN will usually leave the list unsorted because "inequality comparisons with a NaN always return False" breaks the assumptions of timsort and other sorting algorithms. You should remember this, as you once demonstrated the problem: http://mail.python.org/pipermail/python-ideas/2011-April/010063.html This is a real problem, so much so that numpy works around it by enforcing our sorts to always sort NaN at the end of the array. Unfortunately, lists do not have the luxury of cheaply knowing the type of all of the objects in the list, so this is not an option for them. Real problems, but nothing that motivates a change, in my opinion. -- Robert Kern "I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth." -- Umberto Eco
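For plain lists, one way to get a "NaNs last" ordering similar in spirit to what is described for numpy is a sort key that separates NaNs from ordinary values. This is a pure-Python sketch under that assumption, not numpy's implementation, and it only applies when the elements are known to be floats.

```python
import math

nan = float("nan")
data = [3.0, nan, 1.0, 2.0]

# Key is (is_nan, value): every non-NaN key starts with False and sorts
# before any NaN key, so NaN values are never compared against numbers.
nan_last = sorted(data, key=lambda v: (math.isnan(v), v))

print(nan_last)
```

A plain sorted(data) gives no such guarantee, because every ordering comparison against the NaN returns False.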

On 10 October 2012 02:14, Steven D'Aprano <steve@pearwood.info> wrote:
Not the way I'm proposing it.
And what about infinities, denorm numbers, and negative zero? Do they get dedicated classes too?
Infinities? No, although they might well if the infinities were different (set of reals vs set of ints, for example). Denorms? No, that's a completely different thing. -0.0? No, that's a completely different thing. I was asking, because instances of a class map onto a behavior that matches *almost exactly* what *both* parties want, why was it not used? This is not the case with anything other than that.
And what is the point of this added complexity? Nothing.
Simplicity. It's simpler.
You *still* have the rule that "x == x for all x, except for NANs".
False. I was proposing that x == x but NAN() != NAN().
False, if you subclass float.
Then x == x would be True (as they want), but [this NaN] == [that NaN]
False.
as per my previous "implementation".
False. x != x, so that has *not* been "answered". This was an example problem with my own suggested implementation. but it seems a lot less of a big deal than all of the exceptions with
Why would you assume that? I mentioned it from *honest* *curiosity*, and all I got back was an attack. Please, I want to be civil but you need to act less angrily. [Has not been spell-checked, as I don't really have time </lie>] Thank you for your time, even though I disagree, Joshua Landau

On 10 October 2012 22:33, Joshua Landau <joshua.landau.ws@gmail.com> wrote:
After reconsidering, I regret these sentences. Yes, I do still believe your response was overly angry, but I did get a thought-out response and you did try and address my concerns. In the interest of benevolence, may I redact my statement?

I don't normally triple-post, but here it goes. After re-re-reading this thread, it turns out one *(1)* post and two *(2)* answers to that post have covered a topic very similar to the one I have raised. All of the others, to my understanding, do not dwell on the fact that *float("nan") is not float("nan")* . The mentioned post was not quite the same as mine, but it still had two replies. I will respond to them here. My response, again, is curiosity about why, *not* a suggestion to change anything. I agree that there is probably no real concern with the current state, I have never had a concern and the concern caused by change would dwarf any possible benefits. Response 1: This implies that you want to differentiate between -0.0 and +0.0. That is bad. My response: Why would I want to do that? Response 2: "There is not space on this thread to convince you otherwise." [paraphrased] My response: That comment was not directed at me and thus has little relevance to my own post. Hopefully now you should understand why I felt the need to ask the question after so much has already been said on the topic. Finally, Mike Graham says (probably referring to me): "I'm sometimes surprised at the creativity and passion behind solutions to this issue." My response: It was an immediate thought, not one dwelled upon. The fact it was not answered in the thread prompted my curiosity. It is *honestly* nothing more.

On 11/10/12 09:05, Joshua Landau wrote:
That's no different from any other float.

py> float('nan') is float('nan')
False
py> float('1.5') is float('1.5')
False

Floats are not interned or cached, although of course interning is implementation dependent and this is subject to change without notice. For that matter, it's true of *nearly all builtins* in Python. The exceptions being bool(obj) which returns one of two fixed instances, and int() and str(), where *some* but not all instances are cached.
If you are doing numeric work, you *should* differentiate between -0.0 and 0.0. That's why the IEEE 754 standard mandates a -0.0. Both -0.0 and 0.0 compare equal, but they can be distinguished (although doing so is tricky in Python). The reason for distinguishing them is to distinguish between underflow to zero from positive or negative values. E.g. log(x) should return -infinity if x underflows from a positive value, and a NaN if x underflows from a negative. -- Steven
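One common way to tell the two zeros apart in Python is math.copysign(), which transfers the sign bit onto a known magnitude. A minimal sketch, with names chosen for the example:

```python
import math

neg_zero = float("-0.0")
pos_zero = 0.0

compare_equal = (neg_zero == pos_zero)   # == cannot see the difference

# copysign(1.0, x) yields -1.0 or 1.0 depending on the sign bit of x.
sign_of_neg = math.copysign(1.0, neg_zero)
sign_of_pos = math.copysign(1.0, pos_zero)

# The repr also preserves the sign.
text = repr(neg_zero)

print(compare_equal, sign_of_neg, sign_of_pos, text)
```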

On Thu, Oct 11, 2012 at 2:20 AM, Steven D'Aprano <steve@pearwood.info> wrote:
E.g. log(x) should return -infinity if x underflows from a positive value, and a NaN if x underflows from a negative.
IEEE 754 disagrees. :-) Both log(-0.0) and log(0.0) are required to return -infinity (and/or signal the divideByZero exception). And as for sqrt(-0.0) returning -0.0... Grr. I've never understood the motivation for that one, especially as it disagrees with the usual recommendations for complex square root (where the real part of the result *always* has its sign bit cleared). Mark

[Mark Dickinson]
The only rationale I've seen for this is in Kahan's obscure paper "Branch Cuts for Complex Elementary Functions or Much Ado About Nothing's Sign Bit". Hard to find. Here's a mostly readable scan: http://port70.net/~nsz/articles/float/kahan_branch_cuts_complex_elementary_f... In part it's to preserve various identities, such as that sqrt(conjugate(z)) is the same as conjugate(sqrt(z)). When z is +0, that becomes

    sqrt(conjugate(+0)) same_as conjugate(sqrt(+0))

which is

    sqrt(-0) same_as conjugate(+0)

which is

    sqrt(-0) same_as -0

Convinced? LOL. There are others in the paper ;-)

On Fri, Oct 12, 2012 at 8:42 PM, Tim Peters <tim.peters@gmail.com> wrote:
Not really. :-) In fact, it's exactly that paper that made me think sqrt(-0.0) -> -0.0 is suspect. The way I read it, the argument from the paper implies that cmath.sqrt(complex(0.0, -0.0)) should be complex(0.0, -0.0), which I have no problem with---it makes things nice and neat: quadrants 1 and 2 in the complex plane map to quadrant 1, and quadrants 3 and 4 to quadrant 4, with the signs of the zeros making it clear what 'quadrant' means in all (non-nan) cases. But I don't see how to get from there to math.sqrt(-0.0) being -0.0. It's exactly the mismatch between the real and complex math that makes no sense to me: math.sqrt(-0.0) should resemble cmath.sqrt(complex(-0.0, +/-0.0)). But the latter, quite reasonably, is complex(0.0, +/-0.0) (at least according to both Kahan and C99 Annex G), while the former is specified to be -0.0 in IEEE 754. -- Mark

On 11 October 2012 02:20, Steven D'Aprano <steve@pearwood.info> wrote:
Confusing re-use of identity strikes again. Would anyone care to explain what causes this? I understand float(1.5) is likely to return the inputted float, but that's as far as I can reason. What I was saying, though, is that all other posts assumed equality between two different NaNs should be the same as identity between a NaN and itself. This is what I'm really asking about, I guess.
Interesting. Can you give me a more explicit example? When would you not *want* f(-0.0) to always return the result of f(0.0)? [aka, for -0.0 to warp into 0.0 on creation]

On 2012-10-12 19:42, Joshua Landau wrote:
It re-uses an immutable literal:
and 'float' returns its argument if it's already a float:
>>> float(1.5) is 1.5
True
Therefore:
>>> float(1.5) is float(1.5)
True
But apart from that, when a new object is created, it doesn't check whether it's identical to another, except in certain cases such as ints in a limited range:
And it's an implementation-specific behaviour.
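Some of the interpreter sessions in this message were lost. The sketch below shows the same points using variables instead of literals (recent Python versions warn about using "is" with a literal). Everything here is a CPython implementation detail and may differ on other implementations or versions.

```python
x = 1.5

# float() hands back its argument unchanged when it is already an exact float.
same_object = float(x) is x

# Parsing a string builds a new float object each time.
a = float("1.5")
b = float("1.5")
distinct_objects = a is not b

# CPython caches a limited range of small ints; this is an implementation
# detail, so no particular result is asserted for it here.
small_ints_shared = int("100") is int("100")

print(same_object, distinct_objects, small_ints_shared)
```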

On Fri, Oct 12, 2012 at 7:42 PM, Joshua Landau <joshua.landau.ws@gmail.com> wrote:
A few examples: (1) In the absence of exceptions, 1 / 0.0 is +inf, while 1 / -0.0 is -inf. So e.g. the function exp(-exp(1/x)) has different values at -0.0 and 0.0:
(2) For the atan2 function, we have e.g.,
This gives atan2 a couple of nice invariants: the sign of the result always matches the sign of the first argument, and atan2(-y, x) == -atan2(y, x) for any (non-nan) x and y. (3) Similarly, for complex math functions (which aren't covered by IEEE 754, but are standardised in various other languages), it's sometimes convenient to be able to depend on invariants like e.g. asin(z.conj()) == asin(z).conj(). Those are only possible if -0.0 and 0.0 are distinguished; the effect is most visible if you pick values lying on a branch cut.
You can't take that too far, though: e.g., it would be nice if complex multiplication had the property that (z * w).conjugate() was always the same as z.conjugate() * w.conjugate(), but it's impossible to keep both that invariant and the commutativity of multiplication. (E.g., consider the result of complex(1, 1) * complex(1, -1).) -- Mark
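The example sessions for this message are not preserved. Here is a sketch of points (2) and the closing remark, using only the standard math module. Point (1) is not reproduced because Python raises ZeroDivisionError for 1 / 0.0 instead of returning an infinity.

```python
import math

neg_zero = float("-0.0")

# atan2: the sign of the result follows the sign of the first argument,
# even when that argument is a zero.
angle_pos = math.atan2(0.0, -1.0)
angle_neg = math.atan2(neg_zero, -1.0)

# Complex multiplication cannot keep the conjugation identity
# once the signs of zeros are taken into account.
z, w = complex(1, 1), complex(1, -1)
lhs = (z * w).conjugate()
rhs = z.conjugate() * w.conjugate()

values_equal = (lhs == rhs)                 # == ignores the sign of zero
lhs_sign = math.copysign(1.0, lhs.imag)     # sign bit of the imaginary zero
rhs_sign = math.copysign(1.0, rhs.imag)

print(angle_pos, angle_neg, values_equal, lhs_sign, rhs_sign)
```

Both products have a zero imaginary part, so they compare equal, but the sign bits of those zeros differ.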

Thank you all for being so thorough. I think I'm sated for tonight. ^^ With all due respect, Joshua Landau

Ethan Furman writes:
Or, to look at it another way, surely somewhere out in the Real World (tm) it is the case that two NaNs are indeed equal.
Sure, but according to Kahan's Uncertainty principle, you'll never be able to detect it. Really-there's no-alternative-to-backward-compatibility-or-IEEE754-ly y'rs

On Mon, Oct 8, 2012 at 9:39 PM, Ned Batchelder <ned@nedbatchelder.com> wrote:
Looks fine, but I'd suggest leaving out the philosophy ('there is no way to know ...') and sticking to the statement that Python follows the IEEE 754 standard in this respect. The justification isn't particularly convincing and (IMO) only serves to invite arguments. -- Mark

On Sun, Oct 07, 2012 at 10:35:17PM -0400, Ned Batchelder wrote:
NANs don't quite mean "unknown result". If they did they would probably be called "MISSING" or "UNKNOWN" or "NA" (Not Available). NANs represent a calculation result which is Not A Number. Hence the name :-) Since we're talking about the mathematical domain here, a numeric calculation that doesn't return a numeric result could be said to have no result at all: there is no real-valued x for which x**2 == -1, hence sqrt(-1) can return a NAN. It certainly doesn't mean "well, there is an answer, but I don't know what it is". It means "I know that there is no answer". Since neither sqrt(-1) nor sqrt(-2) exist in the reals, we cannot say that they are equal. If we did, we could prove anything: sqrt(-1) = sqrt(-2) Square both sides: -1 = -2 I was not on the IEEE committee, so I can't speak for them, but my guess is that they reasoned that since there are an infinite number of "no result" not-a-number calculations, but only a finite number of NAN bit patterns available to be used for them, it isn't even safe to presume that two NANs with the same bit pattern are equal since they may have come from completely different calculations. Of course this was before object identity was a relevant factor. As I've stated before, I think that having collections choose to optimize away equality tests using object identity is fine. If I need a tuple that honours NAN semantics, I can subclass tuple to get one. I shouldn't expect the default tuple behaviour to carry that cost. By the way, NANs are awesome and don't get anywhere near enough respect. Here's a great idea from the D language: http://www.drdobbs.com/cpp/nans-just-dont-get-no-respect/240005723 -- Steven
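A minimal sketch of the tuple subclass idea mentioned above. The class name StrictTuple and its exact behaviour are hypothetical; it simply compares element by element with == and skips the identity shortcut, which costs a full scan on every comparison.

```python
class StrictTuple(tuple):
    """Tuple whose equality always calls == on elements (no identity shortcut)."""

    def __eq__(self, other):
        if not isinstance(other, tuple):
            return NotImplemented
        return len(self) == len(other) and all(a == b for a, b in zip(self, other))

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Defining __eq__ disables the inherited hash, so restore it explicitly.
    __hash__ = tuple.__hash__


nan = float("nan")
plain = (nan,)
strict = StrictTuple((nan,))

print(plain == plain, strict == strict)
```

Note that keeping tuple's hash means a StrictTuple containing a NaN can be stored in a set or dict but may not be found again by equality, which is the kind of cost the default tuple avoids.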

On Tue, Oct 9, 2012 at 12:16 AM, Steven D'Aprano <steve@pearwood.info> wrote:
This is quite true, but in Python "Not A Number" is spelled None. In many aspects, None is like signaling NaN - any numerical operation on it results in a type error, but None == None is True. ..
This is a typical mathematical fallacy where a progression of seemingly equivalent equations contains an invalid operation. See http://en.wikipedia.org/wiki/Mathematical_fallacy#All_numbers_equal_all_othe... This is not an argument to make nan == nan false.

The IEEE 754 argument goes as follows: in the domain of 2**64 bit patterns most patterns represent real numbers, some represent infinities and some do not represent either infinities or numbers. Boolean comparison operations are defined on the entire domain, but <, =, or > outcomes are not exhaustive if NaNs are present. The fourth outcome is "unordered." In other words, for any two patterns x and y one and only one of the following is true: x < y or x = y or x > y or x and y are unordered. If x is NaN, it compares as unordered to any other pattern including itself.

This explains why compareQuietEqual(x, x) is false when x is NaN. In this case, x is unordered with itself, unordered is different from equal, so compareQuietEqual(x, x) cannot be true. It cannot raise an exception either because it has to be quiet. Thus the only correct result is to return false.

The problem that we have in Python is that float.__eq__ is used for too many different things and compareQuietEqual is not always appropriate. Here is a partial list:

1. x == y
2. x in [y]
3. {y:1}[x]
4. x in {y}
5. [y].index(x)

In Python 3, we already took a step away from using the same notion of equality in all these cases. Thus in #2, we use x is y or x == y instead of plain x == y. But that leads to some strange results:
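A small sketch of how the five operations listed above treat a NaN in CPython 3. The names nan and other are illustrative and not from the thread; the point is only that the container operations check identity before equality, so the answer depends on whether the very same NaN object is used.

```python
nan = float("nan")    # one NaN object
other = float("nan")  # a second, distinct NaN object

results = {
    "nan == nan": nan == nan,                    # 1. plain equality
    "nan in [nan]": nan in [nan],                # 2. same object
    "other in [nan]": other in [nan],            # 2. different object
    "{nan: 1}.get(nan)": {nan: 1}.get(nan),      # 3. same object
    "{nan: 1}.get(other)": {nan: 1}.get(other),  # 3. different object
    "nan in {nan}": nan in {nan},                # 4. same object
    "other in {nan}": other in {nan},            # 4. different object
    "[nan].index(nan)": [nan].index(nan),        # 5. same object
}

for expr, value in results.items():
    print(f"{expr:22} -> {value}")
```

With a distinct NaN object, [nan].index(other) raises ValueError rather than returning a position, which is why it is left out of the table.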
An alternative would be to define x in l as any(isnan(x) and isnan(y) or x == y for y in l) when x and all elements of l are floats. Again, I am not making a change proposal - just mentioning a possibility.
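The alternative membership rule above can be sketched as a standalone function. The name float_contains is hypothetical, and the function assumes, as the message does, that x and every element of the list are floats.

```python
from math import isnan


def float_contains(x, items):
    """Membership test where any NaN matches any other NaN.

    Assumes x and all elements of items are floats; isnan() would
    raise TypeError for non-numeric arguments.
    """
    return any(isnan(x) and isnan(y) or x == y for y in items)


nan_a = float("nan")
nan_b = float("nan")

print(float_contains(nan_a, [1.0, nan_b]))  # distinct NaN objects still match
print(nan_a in [1.0, nan_b])                # built-in rule: identity or ==
print(float_contains(2.0, [1.0, nan_b]))    # ordinary values compare with ==
```

Unlike the built-in rule, the result here does not depend on which NaN object happens to be stored in the list.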

On Sun, Oct 7, 2012 at 8:35 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Java implements IEEE 754 to some extent, but preserves reflexivity of object equality.
I don't actually know Java, but if I run

    class HelloNaN {
        public static void main(String[] args) {
            double nan1 = 0.0 / 0.0;
            double nan2 = 0.0 / 0.0;
            System.out.println(nan1 == nan2);
        }
    }

I get the output "false".

Mike

participants (25)
- alex23
- Alexander Belopolsky
- Antoine Pitrou
- Case Van Horsen
- Chris Angelico
- Ethan Furman
- Greg Ewing
- Guido van Rossum
- Joshua Landau
- Mark Dickinson
- Mathias Panzenböck
- Max Moroz
- Mike Graham
- MRAB
- Ned Batchelder
- Nick Coghlan
- Oscar Benjamin
- Rob Cliffe
- Robert Kern
- Stephen J. Turnbull
- Steven D'Aprano
- Sven Marnach
- Terry Reedy
- Tim Peters
- Victor Stinner