[issue11945] Adopt and document consistent semantics for handling NaN values in containers
New submission from Nick Coghlan <ncoghlan@gmail.com>:

The question of the way Python handles NaN came up again on python-dev recently. The current semantics have been assessed as a reasonable compromise, but a poorly explained and inconsistently implemented one.

Based on a suggestion from Terry Reedy [1] I propose that a new glossary entry be added for "Reflexive Equality":

"Part of the standard mathematical definition of equality is that it is reflexive, that is ``x is y`` necessarily implies that ``x == y``. This is an essential property that is relied upon when designing and implementing container classes such as ``list`` and ``dict``.

However, the IEEE754 committee defined the float Not_a_Number (NaN) values as being unequal to all other floats, including themselves. While this design choice violates the basic mathematical definition of equality, it is still considered desirable to be able to correctly implement IEEE754 floating point semantics, and those of similar types such as ``decimal.Decimal``, directly in Python.

Accordingly, Python makes the following compromise in order to cope with types that use non-reflexive definitions of equality without breaking the invariants of container classes that rely on reflexive definitions of equality:

1. Direct equality comparisons involving ``NaN``, such as ``nan=float('NaN'); nan == nan``, follow the IEEE754 rule and return False (or True in the case of ``!=``). This rule applies to ``float`` and ``decimal.Decimal`` within the builtins and standard library.

2. Indirect comparisons conducted internally by container classes, such as ``x in someset`` or ``seq.count(x)`` or ``somedict[x]``, enforce reflexivity by using the expressions ``x is y or x == y`` and ``x is not y and x != y`` respectively rather than assuming that ``x == y`` and ``x != y`` will always respect the reflexivity requirement. This rule applies to all container types within the builtins and standard library that may contain values of arbitrary types.

Also see [1] for a more comprehensive theoretical discussion of this topic.

[1] http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civiliz..."

Specific container methods that have currently been identified as relying on the reflexivity assumption are:

- __contains__() (for x in c: assert x in c)
- __eq__() (assert [x] == [x])
- __ne__() (assert not [x] != [x])
- index() (for x in c: assert 0 <= c.index(x) < len(c))
- count() (for x in c: assert c.count(x) > 0)

collections.Sequence and array.array (with the 'f' or 'd' type indicators) have already been identified as container classes in the standard library that fail to follow the second guideline and hence fail to correctly implement the above invariants in the presence of non-reflexive definitions of equality. They will be fixed as part of implementing this patch.

Other container types that fail to correctly enforce reflexivity can be fixed as they are identified.

[1] http://mail.python.org/pipermail/python-dev/2011-April/110962.html

----------
assignee: docs@python
components: Documentation, Library (Lib)
messages: 134639
nosy: docs@python, ncoghlan
priority: normal
severity: normal
status: open
title: Adopt and document consistent semantics for handling NaN values in containers
type: behavior
versions: Python 2.7, Python 3.2, Python 3.3

_______________________________________
Python tracker <report@bugs.python.org>
<http://bugs.python.org/issue11945>
_______________________________________
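For readers who want to see the two rules side by side, here is a minimal sketch. It assumes CPython's builtin list, set and dict, and the variable names are illustrative only; it is not part of the proposed patch.

```python
nan = float("nan")

# Rule 1: direct comparisons follow the IEEE754 rule.
direct_eq = nan == nan          # False
direct_ne = nan != nan          # True

# Rule 2: builtin containers check identity before equality,
# so the same NaN object is always "found" again.
c = [nan]
in_list = nan in c              # True
in_set = nan in {nan}           # True
dict_value = {nan: "v"}[nan]    # "v"

# The invariants listed in the submission, checked for a list.
for x in c:
    assert x in c
    assert [x] == [x]
    assert not [x] != [x]
    assert 0 <= c.index(x) < len(c)
    assert c.count(x) > 0
```

The loop only exercises ``list``; the same checks could be repeated for other containers to look for ones that do not follow the second rule.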
Nick Coghlan <ncoghlan@gmail.com> added the comment: Actually, based on the NumPy precedent [1], array.array should be fine as is. Since it uses raw C floats and doubles internally, rather than Python objects, there is no clear concept of "object identity" to use to enforce reflexivity. [1] http://mail.python.org/pipermail/python-dev/2011-April/110987.html ----------
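A small sketch of why identity cannot help here, assuming CPython's ``array`` module: the array stores a raw C double, and each access builds a fresh Python float, so the original NaN object is never seen again.

```python
from array import array

nan = float("nan")
lst = [nan]                    # stores a reference to the Python object
arr = array("d", [nan])        # stores only the raw C double

in_list = nan in lst           # True: the identity check succeeds
in_array = nan in arr          # False: a new float is built for each item
array_count = arr.count(nan)   # 0
same_object = arr[0] is nan    # False: arr[0] is a freshly created float
```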
Glenn Linderman <v+python@g.nevcal.com> added the comment:

Bertrand Meyer's exposition is flowery, and he is a learned man, but the basic argument he makes is:

Reflexivity of equality is something that we expect for any data type, and it seems hard to justify that a value is not equal to itself. As to assignment, what good can it be if it does not make the target equal to the source value?

The argument is flawed: now that NaN exists, and is not equal to itself in value, there should be, and need be, no expectation that assignment elsewhere should make the target equal to the source in value. It can, and in Python, should, make them match in identity (is) but not in value (==, equality).

I laud the idea of adding a definition of reflexive equality to the glossary. However, I think it is presently a bug that a list containing a NaN value compares equal to itself. Yes, such a list should have the same identity (is), but should not be equal.

----------
nosy: +v+python
Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment:
> I think it is presently a bug that a list containing a NaN value compares equal to itself.
Moreover, it also compares equal to another list containing the same NaN:
>>> [nan] is [nan]
False
>>> [nan] == [nan]
True
Here is another case of the "is implies ==" optimization breaking the NaN property in the stdlib:
>>> import ctypes
>>> x = ctypes.c_double(nan)
>>> x == x
True
---------- nosy: +belopolsky
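The ctypes case can be contrasted with comparing the wrapped values directly. This sketch assumes that ``c_double`` defines no equality of its own and so falls back to identity-based comparison, which is what the session above suggests.

```python
import ctypes

nan = float("nan")
x = ctypes.c_double(nan)
y = ctypes.c_double(nan)

same_wrapper = x == x              # True: the wrapper is compared by identity
other_wrapper = x == y             # False: two distinct wrapper objects
same_value = x.value == x.value    # False: plain float comparison of NaNs
```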
Nick Coghlan <ncoghlan@gmail.com> added the comment: The status quo works. Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals to simply document it clearly. ----------
Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment: On Thu, Apr 28, 2011 at 3:01 AM, Nick Coghlan <report@bugs.python.org> wrote: ..
> The status quo works.
No, it does not. I have yet to see a Python program that uses the non-reflexivity of NaN in a meaningful way. What I have seen is programmers either ignoring it and writing slightly buggy programs ("slightly" because it is actually hard to produce a NaN in Python code) or adding extra code to filter out NaN values before numbers are compared.
> Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals to simply document it clearly.
Documenting the status quo is necessary for any proposal to change. ----------
Changes by Raymond Hettinger <raymond.hettinger@gmail.com>: ---------- assignee: docs@python -> rhettinger nosy: +rhettinger
Nick Coghlan <ncoghlan@gmail.com> added the comment:

By "works" I merely meant that you can currently achieve both of the following:

1. Write fully conformant implementations of IEEE754 floating point types, including the non-reflexive NaN comparisons (keeping in mind that, as a value-based specification, "same payload" is the closest IEEE754 can get to "same object")

2. Explicitly force reflexivity when you need it, either by filtering out nonconformant values, or by checking identity before checking equality.

The "pure" equality-tests-are-always-reflexive approach advocated by Meyer rules out option 1. Given that one of the use cases for Python is to prototype algorithms that are later translated to C or C++, formally disallowing that use case would be problematic.

----------
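The two ways of forcing reflexivity mentioned in option 2 might look roughly like this. Both helper names (``reflexive_eq`` and ``drop_nans``) are hypothetical and exist only for this sketch.

```python
import math

def reflexive_eq(x, y):
    """Hypothetical helper: check identity before checking equality."""
    return x is y or x == y

def drop_nans(values):
    """Hypothetical helper: filter out float NaNs before any comparison."""
    return [v for v in values if not (isinstance(v, float) and math.isnan(v))]

nan = float("nan")

plain = nan == nan                       # False: IEEE754 rule
forced = reflexive_eq(nan, nan)          # True: same object
distinct = reflexive_eq(nan, float("nan"))  # False: two different NaN objects
cleaned = drop_nans([1.0, nan, 2.0])     # [1.0, 2.0]
```

Note that the identity check only restores reflexivity for the same object; two separately created NaNs still compare unequal.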
Alexander Belopolsky <belopolsky@users.sourceforge.net> added the comment: On Thu, Apr 28, 2011 at 3:26 AM, Nick Coghlan <report@bugs.python.org> wrote: ..
> 1. Write fully conformant implementations of IEEE754 floating point types, including the non-reflexive NaN comparisons (keeping in mind that, as a value-based specification, "same payload" is the closest IEEE754 can get to "same object")
If being "fully conformant" with various IEEE standards was a design goal for Python, we would have leap seconds in the datetime module. :-)

Python builtin float equality being reflexive does not in any way inhibit anyone's ability to *write* a fully conforming implementation. In fact, if we ever get arithmetic operations implemented for ctypes types, I would argue that comparison of c_double values would need to be changed to match C behavior. (I am +0 on changing that even without implementing arithmetic.)

I realize, however, that by "status quo" you mean container operations not calling __eq__ on identical objects. I agree that this should not change. Making float comparison reflexive will actually make this feature less controversial.

----------
Glenn Linderman <v+python@g.nevcal.com> added the comment:

Nick says (and later explains better what he meant):

> The status quo works. Proposals to change it on theoretical grounds have a significantly higher bar to meet than proposals to simply document it clearly.

I say: What the status quo doesn't provide is containers that "work". In this case what I mean by "work" is that equality of containers is based on value and value comparisons, and that containers accept and embrace non-reflexive equality. It might be possible to implement alternate containers with these characteristics, but that requires significantly more effort than simply filtering values.

Nonetheless, I totally agree with msg134654, and agree that properly documenting the present implementation would be a great service to users of the present implementation.

----------
Changes by Mark Dickinson <dickinsm@gmail.com>: ---------- nosy: +mark.dickinson
Terry J. Reedy <tjreedy@udel.edu> added the comment:

To repeat concisely what I said on the pydev list, I think Reference 5.9, Comparisons, which says

"Tuples and lists are compared lexicographically using comparison of corresponding elements. This means that to compare equal, each element must compare equal and the two sequences must be of the same type and have the same length."

needs 'be identical or ' added before 'compare equal and ...'.

"Mappings (dictionaries) compare equal if and only if they have the same (key, value) pairs." may be OK, depending on how one interprets 'same (key, value) pairs'.

Alexander has opened a separate issue to change behavior in 3.3.

----------
nosy: +terry.reedy
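The difference that the suggested 'be identical or ' wording describes can be seen in a short sketch, assuming CPython's builtin tuple and dict. The names ``nan`` and ``other_nan`` are illustrative.

```python
nan = float("nan")
other_nan = float("nan")   # a second, distinct NaN object

# Sequences: corresponding elements must be identical or compare equal.
same_tuple = (1, nan) == (1, nan)          # True: elements are identical
diff_tuple = (1, nan) == (1, other_nan)    # False: not identical, not equal

# Mappings: "same (key, value) pairs" behaves the same way for values.
same_dict = {"k": nan} == {"k": nan}         # True
diff_dict = {"k": nan} == {"k": other_nan}   # False
```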
Nick Coghlan <ncoghlan@gmail.com> added the comment: After further discussion on python-dev, it is clear that this identity checking behaviour handles more than just NaNs - it also allows containers to cope more gracefully with objects like NumPy arrays that make use of rich comparisons to return something other than simple True/False values for equality checks. Also, since I neglected to mention it in the initial post, merely *adding* the glossary entry is just the first step. It then needs to be referenced from the appropriate points in the language and library reference. ----------
Nick Coghlan <ncoghlan@gmail.com> added the comment: Scratch the first half of that last comment - Guido pointed out that false positives rear their ugly head almost immediately if you try to store rich comparison objects in other containers. ----------
Changes by Daniel Urban <urban.dani+py@gmail.com>: ---------- nosy: +durban
Changes by Terry J. Reedy <tjreedy@udel.edu>: ---------- versions: +Python 3.4 -Python 3.2
Raymond Hettinger added the comment:

I think this should be closed. AFAICT it is of interest to a very tiny subset of the human species and as near as I can tell that subset doesn't include people in the numeric and statistics community (the ones who actually use NaNs as placeholders for missing values).

So much code (and human reasoning) assumes that identity-implies-equality, that it would be easier to document the exception to expectation than to try to find every place in every module where the assumption is present (implicitly or explicitly).

Instead, it would be better to document that distinct float('NaN') objects are never equal to one another and that identical float('NaN') objects may or may not compare equal in various implementation-dependent circumstances.

----------
Changes by Andy Maier <andreas.r.maier@gmx.de>: ---------- nosy: +andymaier
Raymond Hettinger added the comment: Closing for the reasons listed and also because there is another pair of tracker items 22000 and 22001 pursuing related documentation updates. ---------- resolution: -> not a bug status: open -> closed
participants (8)
- Alexander Belopolsky
- Andy Maier
- Daniel Urban
- Glenn Linderman
- Mark Dickinson
- Nick Coghlan
- Raymond Hettinger
- Terry J. Reedy