decimal.py: == and != comparisons involving NaNs
Hi,
in a (misguided) bugreport (http://bugs.python.org/issue7279) I was questioning the reasons for allowing NaN comparisons with == and != rather than raising InvalidOperation.
I think two main issues emerge from the brief discussion:
1. Should the comparison operators follow the 'compare' function from the standard?
The standard says: "An implementation may use this operation [compare] under the covers to implement a closed set of comparison operations (greater than, equal, etc.) if desired. It need not, in this case, expose the compare operation itself."
So, I'd say that this supports following 'compare' as closely as possible. In fact the operators <, <=, >, >= already follow 'compare-signal' in their behavior (they raise for any NaN operand).
2. What is the use of == and != outside the decimal scope?
Mark mentions that Python uses == to test for set and dict memberships, but that you cannot put decimal NaNs into sets:
'TypeError: Cannot hash a NaN value'
I want to add that Decimal('NaN') == Decimal('NaN') gives False, which should somewhat limit the uses of == for NaNs outside the decimal realm anyway.
Are there cases where == and != are actually needed to give a result for NaNs?
Stefan Krah
[Stefan Krah]
in a (misguided) bugreport (http://bugs.python.org/issue7279) I was questioning the reasons for allowing NaN comparisons with == and != rather than raising InvalidOperation.
Do you have any actual use case issues or are these theoretical musings? I ask only because a good use case might suggest the best way to adapt the standard to the regular Python API for equality/inequality operators.
NaNs are odd ducks. They are unique in violating our basic notions of equality (any relation that is reflexive, symmetric, and transitive). Once you use them in a context that goes beyond the decimal spec, it is no surprise that you run into difficulties where NaNs don't fit very well (because they violate basic assumptions and invariants in other code).
Are there cases where == and != are actually needed to give a result for NaNs?
I would say that anywhere someone needs the full behaviors specified by the standard, they need to use the actual compare() method which allows for a decimal context to be specified and allows for more than just a true/false return value (i.e. a NaN is a possible result).
Raymond
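To make the distinction above concrete, here is a minimal sketch of how == and compare() differ for a quiet NaN. It assumes a current Python 3 decimal module and the default context; the variable names are only illustrative.

```python
from decimal import Decimal, localcontext

nan = Decimal("NaN")
one = Decimal(1)

# The operators collapse the answer to a plain bool for a quiet NaN.
eq_result = (nan == one)
ne_result = (nan != one)
print(eq_result, ne_result)

# compare() returns a Decimal, so "unordered" can be reported as NaN.
cmp_result = nan.compare(one)
print(cmp_result.is_nan())

# compare() also accepts an explicit context, unlike the operators.
with localcontext() as ctx:
    ordered = Decimal(2).compare(one, context=ctx)
print(ordered)
```

The operators have no way to report a third outcome, which is why the method is the place to look when the full behaviour of the standard is needed.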
Raymond Hettinger <python@rcn.com> wrote:
[Stefan Krah]
in a (misguided) bugreport (http://bugs.python.org/issue7279) I was questioning the reasons for allowing NaN comparisons with == and != rather than raising InvalidOperation.
Do you have any actual use case issues or are these theoretical musings? I ask only because a good use case might suggest the best way to adapt the standard to the regular python api for equality/inequality operators.
I think my reasoning goes the opposite way: The current behavior (raising InvalidOperation) of <, <=, >=, > is sensible and as close to the standard as one can get. This behavior was not chosen for the equality/inequality operators because they _might_ be used for other purposes. But since Decimal("NaN") == Decimal("NaN") gives False, these non-decimal use cases don't work:
>>> d = {0:Decimal("NaN")}
>>> Decimal("NaN") in d.values()
False
So, since non-decimal use cases are limited at best, the equality/inequality operators might as well have the behavior of the other comparison operators, which is safer for the user.
I can also give a decimal use case where the current behavior is problematic:
A variable initialized to a signaling NaN should always cause an exception. But this doesn't:
    salary = Decimal("sNaN")
    minimum_wage = 1000
    if (salary == minimum_wage):
        print "do stuff"
    else:
        print "do other stuff"
Stefan Krah
Stefan Krah <stefan-usenet <at> bytereef.org> writes:
>>> d = {0:Decimal("NaN")}
>>> Decimal("NaN") in d.values()
False
So, since non-decimal use cases are limited at best, the equality/inequality operators might as well have the behavior of the other comparison operators, which is safer for the user.
The problem is when searching for /another/ object which hashes the same as Decimal("NaN"). Here is a made-up situation to show you the problem:
>>> class H(object):
...     def __hash__(self): return hash(1)
...     def __eq__(self, other): raise ValueError
...
>>> h = H()
>>> d = {h: ""}
>>> d[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __eq__
ValueError
>>> d[1] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __eq__
ValueError
Antoine Pitrou <solipsis@pitrou.net> wrote:
Stefan Krah <stefan-usenet <at> bytereef.org> writes:
>>> d = {0:Decimal("NaN")}
>>> Decimal("NaN") in d.values()
False
So, since non-decimal use cases are limited at best, the equality/inequality operators might as well have the behavior of the other comparison operators, which is safer for the user.
The problem is when searching for /another/ object which hashes the same as Decimal("NaN"). Here is a made-up situation to show you the problem:
>>> class H(object):
...     def __hash__(self): return hash(1)
...     def __eq__(self, other): raise ValueError
...
>>> h = H()
>>> d = {h: ""}
>>> d[1]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __eq__
ValueError
>>> d[1] = 2
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 3, in __eq__
ValueError
I see the point, but Decimal("NaN") does not hash:
>>> hash(Decimal("NaN"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/decimal.py", line 937, in __hash__
    raise TypeError('Cannot hash a NaN value.')
TypeError: Cannot hash a NaN value.
Also, NaNs cause problems in non-decimal comparisons virtually everywhere:
>>> L = [1, 2, Decimal("NaN")]
>>> L.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/decimal.py", line 877, in __lt__
    ans = self._compare_check_nans(other, context)
  File "/usr/lib/python2.7/decimal.py", line 782, in _compare_check_nans
    self)
  File "/usr/lib/python2.7/decimal.py", line 3755, in _raise_error
    raise error(explanation)
decimal.InvalidOperation: comparison involving NaN
I think problems like these would best be avoided by having a separate __totalorder__ or __lexorder__ method instead of using __eq__, __lt__, etc., but this is of course outside the scope of this discussion (and I have no idea how difficult it would be to implement that).
Stefan Krah
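As a rough illustration of the total-order idea, the decimal module already has a compare_total() method, and it can be passed to sorted() through functools.cmp_to_key. This is only a sketch of ordering via an explicit method; it is not the __totalorder__ hook suggested above, and the helper name total_order is made up for the example.

```python
from decimal import Decimal
from functools import cmp_to_key

values = [Decimal("NaN"), Decimal(2), Decimal(1)]

def total_order(a, b):
    # compare_total() returns Decimal('-1'), Decimal('0') or Decimal('1'),
    # and never signals, even when an operand is a NaN.
    return int(a.compare_total(b))

# sorted() only calls total_order here, so Decimal.__lt__ is never used.
ordered = sorted(values, key=cmp_to_key(total_order))
print(ordered)
```

In the total ordering a positive quiet NaN sorts above every finite number, so it ends up last rather than aborting the sort.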
Stefan Krah <stefan-usenet <at> bytereef.org> writes:
I see the point, but Decimal("NaN") does not hash:
Ok but witness again:
>>> L = [1, 2, Decimal("NaN"), 3]
>>> 3 in L
True
>>> class H(object):
...     def __eq__(self, other): raise ValueError
...
>>> L = [1, 2, H(), 3]
>>> 3 in L
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in __eq__
ValueError
(NB: interestingly, float("nan") does hash)
Regards
Antoine.
Antoine Pitrou <solipsis@pitrou.net> wrote:
I see the point, but Decimal("NaN") does not hash:
Ok but witness again:
>>> L = [1, 2, Decimal("NaN"), 3]
>>> 3 in L
True
>>> class H(object):
...     def __eq__(self, other): raise ValueError
...
>>> L = [1, 2, H(), 3]
>>> 3 in L
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "<stdin>", line 2, in __eq__
ValueError
Yes, but the list is already broken in two ways:
>>> L.sort()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/decimal.py", line 877, in __lt__
    ans = self._compare_check_nans(other, context)
  File "/usr/lib/python2.7/decimal.py", line 782, in _compare_check_nans
    self)
  File "/usr/lib/python2.7/decimal.py", line 3755, in _raise_error
    raise error(explanation)
decimal.InvalidOperation: comparison involving NaN
>>> Decimal("NaN") in L
False
(NB: interestingly, float("nan") does hash)
I wonder if it should:
>>> d = {float('nan'): 10, 0: 20}
>>> 0 in d
True
>>> float('nan') in d
False
>>> d[float('nan')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: nan
I guess my point is that NaNs in lists and dicts are broken in so many ways that it might be good to discourage this use. (And get the added benefit of safer mathematical behavior for == and !=.)
Stefan Krah
On Mon, Nov 9, 2009 at 1:21 PM, Stefan Krah <stefan-usenet@bytereef.org> wrote:
Antoine Pitrou <solipsis@pitrou.net> wrote:
(NB: interestingly, float("nan") does hash)
I wonder if it should:
>>> d = {float('nan'): 10, 0: 20}
>>> 0 in d
True
>>> float('nan') in d
False
>>> d[float('nan')]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: nan
That's because you're creating two different float nans. Compare with:
Python 3.2a0 (py3k:76132M, Nov 6 2009, 14:47:39)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nan = float('nan')
>>> d = {nan: 10, 0: 20}
>>> nan in d
True
>>> d[nan]
10
Mark
Mark Dickinson wrote:
That's because you're creating two different float nans. Compare with:
Python 3.2a0 (py3k:76132M, Nov 6 2009, 14:47:39)
[GCC 4.2.1 (SUSE Linux)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> nan = float('nan')
>>> d = {nan: 10, 0: 20}
>>> nan in d
True
>>> d[nan]
10
This also suggests to me that nan should be a singleton, or at least that the doc should recommend that programs should make it be such for the program.
tjr
On Mon, Nov 9, 2009 at 2:51 PM, Terry Reedy <tjreedy@udel.edu> wrote:
This also suggests to me that nan should be a singleton, or at least that the doc should recommend that programs should make it be such for the program.
The IEEE std disagreed -- there's extra info hidden in the mantissa bits. And the Python float implementation makes it pretty impractical to do this at the application level since x+y will generate a new NaN-valued float object each time it is called (if the outcome is NaN).
--
--Guido van Rossum (python.org/~guido)
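A small sketch of the point about arithmetic producing fresh NaN objects, and why that defeats a singleton convention at the application level. It assumes CPython, where dict lookup tries object identity before falling back to ==.

```python
import math

inf = float("inf")

# Each arithmetic operation that yields NaN builds its own float object.
a = inf - inf
b = inf - inf
print(math.isnan(a), math.isnan(b))

d = {a: 10}

# The lookup succeeds for the very same object, via the identity shortcut...
print(a in d)

# ...but fails for an equally-NaN object, since NaN != NaN.
print(b in d)
```

So even if a program kept one canonical nan around, any NaN produced by a computation would still be a separate object that the container cannot find.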
Stefan Krah <stefan-usenet <at> bytereef.org> writes:
I guess my point is that NaNs in lists and dicts are broken in so many ways that it might be good to discourage this use. (And get the added benefit of safer mathematical behavior for == and !=.)
Giving users seemingly random and unexplainable exceptions would not be a good way to discourage it, though.
On Mon, Nov 9, 2009 at 12:21 PM, Stefan Krah <stefan-usenet@bytereef.org> wrote:
I see the point, but Decimal("NaN") does not hash:
>>> hash(Decimal("NaN"))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/decimal.py", line 937, in __hash__
    raise TypeError('Cannot hash a NaN value.')
TypeError: Cannot hash a NaN value.
I think that may represent an excess of caution. float nans do hash quite happily, and I can't see a good reason for preventing Decimal nans from having a hash. Mark
On Mon, Nov 9, 2009 at 10:42 AM, Stefan Krah <stefan-usenet@bytereef.org> wrote:
I can also give a decimal use case where the current behavior is problematic:
A variable initialized to a signaling NaN should always cause an exception.
But this doesn't:
    salary = Decimal("sNaN")
    minimum_wage = 1000
    if (salary == minimum_wage):
        print "do stuff"
    else:
        print "do other stuff"
Hmm. This does look suspicious. It's possible we should be raising for signalling nans here. For most of what I wrote above I was thinking of quiet nans. Mark
Stefan Krah <stefan-usenet <at> bytereef.org> writes:
Are there cases where == and != are actually needed to give a result for NaNs?
It is a common expectation that == and != always succeed. They return True or False, but don't raise an exception even on unrelated operands:
>>> b"a" == "a"
False
>>> "5" == 5
False
>>> {} == 0.0
False
>>> None == (lambda x: 1)
False
>>> int == max
False
The only place I know of where this expectation isn't met is when comparing "naive" and "timezone-aware" datetime objects, which raises a TypeError (IIRC).
Antoine Pitrou wrote:
Stefan Krah <stefan-usenet <at> bytereef.org> writes:
Are there cases where == and != are actually needed to give a result for NaNs?
It is a common expectation that == and != always succeed. They return True or False, but don't raise an exception even on unrelated operands:
It is a common expectation, but a false one. __eq__ and __ne__ are explicitly allowed to return anything, not just bools.
http://www.python.org/dev/peps/pep-0207/
--
Robert Kern
"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
-- Umberto Eco
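For readers unfamiliar with rich comparisons, a minimal sketch of an __eq__ that returns something other than a bool. The class name Elementwise is invented for this illustration and does not come from any library.

```python
class Elementwise:
    """Toy container whose == compares element by element."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        # Rich comparison may return any object; here it is a list of bools.
        return [x == y for x, y in zip(self.items, other.items)]


result = Elementwise([1, 2, 3]) == Elementwise([1, 0, 3])
print(result)
```

Code that puts such a result straight into an `if` gets the truth value of the list, not an element-wise answer, which is one reason the "always a bool" expectation is unreliable.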
On Sun, Nov 8, 2009 at 4:26 PM, Stefan Krah <stefan-usenet@bytereef.org> wrote:
Hi,
in a (misguided) bugreport (http://bugs.python.org/issue7279) I was questioning the reasons for allowing NaN comparisons with == and != rather than raising InvalidOperation.
Some quick recent history:
For reference, the current behaviour dates from r60630. Before this, comparisons involving nans behaved even less coherently. See http://bugs.python.org/issue1979 for details.
Apart from Python's use of __eq__, the other motivation for the current behaviour comes from the IEEE 854 standard; given the absence of helpful information in the Decimal standard, IEEE 854 is an obvious next place to look. There's an unofficial copy of the standard available at:
http://754r.ucbtest.org/standards/854.pdf
Section 5.7 describes twenty-six(!) distinct comparison operators. So it's not immediately clear which of those twenty-six comparison operators each of Python's six comparison operators should map to. However, in the standard, the first six operations in the table are somewhat distinguished: they're the ones that are marked as corresponding to the usual mathematical operations, and to Fortran's usual comparison operators (.EQ., etc.). Given this, and given that this behaviour seemed to fit well with Python's needs for __eq__, it seemed to make sense at the time to map Python's six operators to the first six operators in table 3.
I think two main issues emerge from the brief discussion:
1. Should the comparison operators follow the 'compare' function from the standard?
That's a possibility. But see below.
2. What is the use of == and != outside the decimal scope?
Mark mentions that Python uses == to test for set and dict memberships, but that you cannot put decimal NaNs into sets:
'TypeError: Cannot hash a NaN value'
I want to add that Decimal('NaN') == Decimal('NaN') gives False, which should somewhat limit the uses of == for NaNs outside the decimal realm anyway.
Are there cases where == and != are actually needed to give a result for NaNs?
Well, when running in some form of 'non-stop' mode, where (quiet) NaN results are supposed to be propagated to the end of a computation, you certainly want equality comparisons with nan just to silently return false. E.g., in code like:
    if x == 0:
        <deal with zero special case>
    else:
        <usual algorithm>
nans should just end up in the second branch, without the programmer having had to think about it too hard.
So I think comparisons with nans should always return either True or False when InvalidOperation is not trapped. The question is whether comparisons should always signal when InvalidOperation is trapped (which is what happens with the default context).
I'm -0.5 on changing the current behaviour: it may not be exactly right, and if I were implementing Decimal from scratch I might well do things differently, but I don't think it's terribly wrong either. While not based on the Decimal standard itself, it's based on the next most closely-related standard. It works with Python's needs for __eq__ and __ne__. And it's already out there in Python 2.6; making minute adjustments to existing behaviour without a good reason seems like asking for trouble.
Mark
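The 'non-stop' scenario can be sketched as follows, assuming a current Python 3 decimal module. The trap for InvalidOperation is switched off in a local context so that a quiet NaN propagates; the function name classify is only a stand-in for the two branches in the pseudocode.

```python
from decimal import Decimal, InvalidOperation, localcontext

def classify(x):
    # Mirrors the if/else above; a quiet NaN should take the second branch.
    if x == 0:
        return "zero special case"
    return "usual algorithm"

with localcontext() as ctx:
    ctx.traps[InvalidOperation] = False        # non-stop mode: no exceptions
    x = Decimal("Infinity") - Decimal("Infinity")   # yields a quiet NaN

    ctx.clear_flags()
    branch = classify(x)
    flag_after_eq = bool(ctx.flags[InvalidOperation])

    less = x < 1                               # untrapped: plain False
    flag_after_lt = bool(ctx.flags[InvalidOperation])

print(branch, flag_after_eq, less, flag_after_lt)
```

With the trap disabled both kinds of comparison return a bool, but only the ordering comparison records the InvalidOperation flag; the equality test on a quiet NaN passes silently.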
On Mon, Nov 9, 2009 at 1:01 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
current behaviour comes from the IEEE 854 standard; given the absence of helpful information in the Decimal standard, IEEE 854 is an obvious next place to look. There's an unofficial copy of the standard available at:
http://754r.ucbtest.org/standards/854.pdf
Section 5.7 describes twenty-six(!) distinct comparison operators.
It's interesting to note that out of the 32 possible different comparison behaviours (two choices of result for each of {equal, lessthan, greaterthan, unordered}, together with a choice of whether to signal or not for unordered in each case), the two interesting operators that are missing from that IEEE 854 table are precisely the ones that we're discussing: signalling __eq__ (i.e., return False for lessthan, greaterthan, unordered, True for equal, and signal on unordered), and signalling __ne__ (the reverse of the above, but still signalling on unordered). (The other four missing operators are the uninteresting ones that always return True or always return False.) Mark
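The counting argument can be checked mechanically. The sketch below enumerates every predicate as described (a truth value for each of the four relations plus a signal-on-unordered choice) and sets aside the six said to be missing; it does not consult the IEEE 854 table itself, so the "26" is derived from the description above, not verified against the standard.

```python
from itertools import product

RELATIONS = ("equal", "lessthan", "greaterthan", "unordered")

# A predicate is (truth value per relation, signal on unordered?).
predicates = [
    (truth, signal)
    for truth in product((False, True), repeat=len(RELATIONS))
    for signal in (False, True)
]

# The two "interesting" missing operators, in RELATIONS order.
signalling_eq = ((True, False, False, False), True)
signalling_ne = ((False, True, True, True), True)

def is_constant(truth):
    # Always-True or always-False predicates are the uninteresting ones.
    return len(set(truth)) == 1

missing = [
    p for p in predicates
    if is_constant(p[0]) or p in (signalling_eq, signalling_ne)
]

print(len(predicates), len(missing), len(predicates) - len(missing))
```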
On Mon, Nov 9, 2009 at 06:01, Mark Dickinson <dickinsm@gmail.com> wrote:
Well, when running in some form of 'non-stop' mode, where (quiet) NaN results are supposed to be propagated to the end of a computation, you certainly want equality comparisons with nan just to silently return false. E.g., in code like:
    if x == 0:
        <deal with zero special case>
    else:
        <usual algorithm>
nans should just end up in the second branch, without the programmer having had to think about it too hard.
    if x != 0:
        <usual algorithm>
    else:
        <deal with zero special case>
nans should just end up in the first branch, without the programmer having had to think about it too hard.
There is a more consistent alternative: have all comparisons involving NaN also return NaN, signifying they're unordered. Let bool coercion raise the exception. Thus, both examples would raise an exception, but a programmer who wants to handle NaN could do so explicitly:
    temp = x == 0
    if temp.isnan() or temp:
        <usual algorithm>
    else:
        <deal with zero special case>
IEEE 754 is intended for a very different context. I don't think it makes sense to attempt literal conformance to it.
--
Adam Olsen, aka Rhamphoryncus
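One way to picture the proposal is a three-valued comparison result whose bool coercion raises when the operands are unordered. Everything in this sketch is hypothetical: TriState, eq and handle are invented names, floats stand in for Decimals, and the choice of ValueError is an assumption. It is not how decimal.py behaves.

```python
import math

class TriState:
    """Hypothetical comparison result: True, False, or unordered."""

    def __init__(self, value):
        self._value = value          # None stands for "unordered"

    def isnan(self):
        return self._value is None

    def __bool__(self):
        # Implicit use in an `if` fails loudly for an unordered result.
        if self._value is None:
            raise ValueError("comparison result is unordered (NaN)")
        return self._value

def eq(x, y):
    # Stand-in for a NaN-returning ==.
    if math.isnan(x) or math.isnan(y):
        return TriState(None)
    return TriState(x == y)

def handle(x):
    temp = eq(x, 0)
    # Explicit NaN handling, checked before any bool coercion happens.
    if temp.isnan() or not temp:
        return "usual algorithm"
    return "zero special case"

print(handle(float("nan")), handle(0.0), handle(2.5))
```

The cost of this design is that every plain `if x == 0:` on a NaN would raise, which is the behaviour the rest of the thread is debating.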
On Fri, Nov 13, 2009 at 6:18 PM, Adam Olsen <rhamph@gmail.com> wrote:
On Mon, Nov 9, 2009 at 06:01, Mark Dickinson <dickinsm@gmail.com> wrote:
Well, when running in some form of 'non-stop' mode, where (quiet) NaN results are supposed to be propagated to the end of a computation, you certainly want equality comparisons with nan just to silently return false. E.g., in code like:
    if x == 0:
        <deal with zero special case>
    else:
        <usual algorithm>
nans should just end up in the second branch, without the programmer having had to think about it too hard.
    if x != 0:
        <usual algorithm>
    else:
        <deal with zero special case>
nans should just end up in the first branch, without the programmer having had to think about it too hard.
And they do: nan != 0 returns False. Maybe I'm missing your point here?
IEEE 754 is intended for a very different context. I don't think it makes sense to attempt literal conformance to it.
I disagree. The decimal specification is tied closely to IEEE 754 (the 2008 version), to the extent that even minor changes made by the 754r working group were mirrored in the decimal specification as it evolved; it only stopped evolving after IEEE 754-2008 was complete. IEEE 754-2008 also makes a point of targeting languages, rather than just floating-point hardware; to me it seems very much applicable to decimal.py. Mark
On Fri, Nov 13, 2009 at 14:52, Mark Dickinson <dickinsm@gmail.com> wrote:
On Fri, Nov 13, 2009 at 9:50 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
And they do: nan != 0 returns False. Maybe I'm missing your point here?
Aargh! True! I meant to say True!
Huh. Somewhere along the line I lost track of how python handled NaN. I thought "comparisons always evaluate to false" was the rule. -- Adam Olsen, aka Rhamphoryncus
participants (9)
Adam Olsen
Antoine Pitrou
Greg Ewing
Guido van Rossum
Mark Dickinson
Raymond Hettinger
Robert Kern
Stefan Krah
Terry Reedy