Mixing float and Decimal -- thread reboot

I'd like to reboot this thread. I've been spinning this topic in my head for most of the morning, and I think we should seriously reconsider allowing mixed arithmetic involving Decimal, not just mixed comparisons. [Quick summary: embed Decimal in the numeric tower but add a context flag to disallow implicit mixing of float and Decimal.]

I tried to find the argumentation against it in PEP 327 (Decimal Data Type) and found that it didn't make much of an argument against mixed arithmetic beyond "it's not needed" and "it's not urgent". (It even states that initially Decimal.from_float() was omitted for simplicity -- but it got added later.)

We now also have PEP 3141 (A Type Hierarchy for Numbers), which proposes a numeric tower. It has an explicit exclusion for Decimal, but that exclusion is provisional: "After consultation with its authors it has been decided that the ``Decimal`` type should not at this time be made part of the numeric tower." That was a compromise because at the time some contributors to Decimal were fiercely opposed to including Decimal in the numeric tower, and I didn't want an endless discussion at the time (there were many more pressing issues to be resolved). However, now the subject is coming up again, and my gut keeps telling me that Decimal ought to be properly embedded in Python's numeric tower.

Decimal is already *touching* the numeric tower by allowing mixed arithmetic with ints. This causes the anomaly that Mark mentioned earlier: the three values 1, 1.0 and Decimal(1) do not satisfy the rule "if x == y and y == z then it follows that x == z". We have 1.0 == 1 == Decimal(1) but 1 == 1.0 != Decimal(1). This also causes problems with hashing, where {Decimal(1), 1, 1.0} != {Decimal(1), 1.0, 1}.

I'd like to look at the issue by comparing the benefits and drawbacks of properly embedding Decimal into the numeric tower. As advantages, I see consistent behavior in situations like the above and more intuitive behavior for beginners.
Also, this would be a possible road towards eventually supporting a language extension where floating point literals produce Decimal values instead of binary floats. (A possible syntax could be "from __options__ import decimal_float", which would work similarly to "from __future__ import ..." except it's a permanent part of the language rather than a forward compatibility feature.)

As a downside, there is the worry that inadvertent mixing of Decimal and float can compromise the correctness of programs in a way that is hard to detect. But the anomalies above indicate that not fixing the situation can *also* compromise correctness in a similar way. Maybe a way out would be to add a new flag to the decimal Context class indicating whether to disallow mixing Decimal and float -- that way programs that care about this can force the issue, while the default behavior can be more intuitive. Possibly the flag should not affect comparisons.

There is one choice which I'm not sure about. Should a mixed float/Decimal operation return a float or a Decimal? I note that Fraction (which *is* properly embedded in the numeric tower) supports this and returns a float result in this case. While I earlier proposed to return the most "complicated" type of the two, i.e. Decimal, I now think it may also make sense to return a float, it being the most "fuzzy" type in the numeric tower. This would also make checking for accidental floats easier, since floats would then propagate throughout the computation (like NaN) and a simple assertion that the result is a Decimal instance suffices to check that no floats were implicitly mixed into the computation.

The implementation of __hash__ will be complicated, and it may make sense to tweak the hash functions of float, Fraction and Decimal to make it easier to ensure that, for values that can be represented in either type, the hash matches the equality. But this sounds like a worthwhile price to pay for proper embedding in the numeric tower.
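[For reference, the unified numeric hash described here is in fact what later CPython versions adopted: equal values hash equal across int, float, Fraction and Decimal, which makes the set anomaly above disappear. A quick check against a modern interpreter:]

```python
from decimal import Decimal
from fractions import Fraction

# Equal numeric values must hash equal across types,
# or sets and dicts containing them misbehave.
values = [1, 1.0, Fraction(1), Decimal(1)]
assert len({hash(v) for v in values}) == 1

# With a unified hash, the two set spellings from the example agree:
assert {Decimal(1), 1, 1.0} == {Decimal(1), 1.0, 1} == {1}
```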
-- --Guido van Rossum (python.org/~guido)

On Mar 19, 2010, at 2:50 PM, Guido van Rossum wrote:
I'd like to reboot this thread. I've been spinning this topic in my head for most of the morning, and I think we should seriously reconsider allowing mixed arithmetic involving Decimal, not just mixed comparisons. [Quick summary: embed Decimal in the numeric tower but add a context flag to disallow implicit mixing of float and Decimal.]
Making decimals first-class citizens would sure help eliminate some special cases and anomalies. If a context flag were added, I'm wondering whether it should simply provide a warning rather than flat-out disallowing the operation. The whole point is to highlight accidental mixing. If mixed arithmetic were allowed, then the decimal constructor should be changed to match (i.e. accept float inputs instead of requiring Decimal.from_float()). Raymond
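[Historical note: the constructor change Raymond asks for did eventually land, in Python 3.2. The conversion is exact, just like Decimal.from_float():]

```python
from decimal import Decimal

# 0.25 is a power of two, so it round-trips exactly through binary float.
assert Decimal(0.25) == Decimal('0.25')

# 0.1 is not representable in binary; the constructor exposes the exact
# value of the nearest double rather than the literal '0.1'.
assert Decimal(0.1) != Decimal('0.1')
assert Decimal(0.1) == Decimal.from_float(0.1)
```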

Just a couple of quick side comments on this; I haven't got my head around the whole mixed-operations idea yet. On Fri, Mar 19, 2010 at 9:50 PM, Guido van Rossum <guido@python.org> wrote:
There is one choice which I'm not sure about. Should a mixed float/Decimal operation return a float or a Decimal?
I'll just say that it's much easier to return a Decimal if you want to be able to make guarantees about rounding behaviour, basically because floats can be converted losslessly to Decimals. I also like the fact that the decimal module offers more control (rounding mode, precision, flags, wider exponent range) than float. In general, the correct semantics for an arithmetic operation are to produce a result that's equivalent to what would have been obtained by performing the operation to infinite precision and then doing a single round to fit that result into the output type format. This is trivially easy to do if mixed float and Decimal operations produce Decimal, and much much harder if they produce floats; if mixed-type operations produced floats we'd probably have to go with algorithms that involve two rounds (i.e., first coerce the Decimal to a float, then do the operation as usual on the two floats), and there would likely be some (small) numeric surprises as a result.
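[The lossless float-to-Decimal conversion Mark relies on is easy to demonstrate: Decimal.from_float recovers the exact binary value of a double, which Fraction can confirm independently.]

```python
from decimal import Decimal
from fractions import Fraction

x = 0.1
d = Decimal.from_float(x)          # exact: no rounding occurs here
assert Fraction(d) == Fraction(x)  # identical mathematical value

# The exact value of the double nearest to 0.1 is 3602879701896397 / 2**55:
assert Fraction(x) == Fraction(3602879701896397, 36028797018963968)
```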
The implementation of __hash__ will be complicated, and it may make sense to tweak the hash function of float, Fraction and Decimal to make it easier to ensure that for values that can be represented in either type the hash matches the equality. But this sounds a worthwhile price to pay for proper embedding in the numeric tower.
I don't think this is going to be a problem. I've implemented most of the scheme I outlined earlier (it's working for ints, floats and Decimals; still need to implement it for Fractions and complex numbers) and it seems to work just fine, with essentially no more code than was there before. I'll post a proof-of-concept patch when I've filled in the missing bits. Mark

Mark Dickinson <dickinsm <at> gmail.com> writes:
On Fri, Mar 19, 2010 at 9:50 PM, Guido van Rossum <guido <at> python.org>
wrote:
There is one choice which I'm not sure about. Should a mixed float/Decimal operation return a float or a Decimal?
I'll just say that it's much easier to return a Decimal if you want to be able to make guarantees about rounding behaviour, basically because floats can be converted losslessly to Decimals. I also like the fact that the decimal module offers more control (rounding mode, precision, flags, wider exponent range) than float.
A problem, though, is that decimals are much slower than floats. If you have a decimal creeping into some part of a calculation it could degrade performance quite a bit. Regards, Antoine.

On Sat, Mar 20, 2010 at 09:11, Antoine Pitrou <solipsis@pitrou.net> wrote:
Mark Dickinson <dickinsm <at> gmail.com> writes:
On Fri, Mar 19, 2010 at 9:50 PM, Guido van Rossum <guido <at> python.org>
wrote:
There is one choice which I'm not sure about. Should a mixed float/Decimal operation return a float or a Decimal?
I'll just say that it's much easier to return a Decimal if you want to be able to make guarantees about rounding behaviour, basically because floats can be converted losslessly to Decimals. I also like the fact that the decimal module offers more control (rounding mode, precision, flags, wider exponent range) than float.
A problem, though, is that decimals are much slower than floats. If you have a decimal creeping in some part of a calculation it could degrade performance quite a bit.
For a little context, we have this numeric tower:

int -> Fraction -> float -> complex

And coincidentally (or not), we have an unspoken rule that you can go right, but never left (int/int -> float, int/Fraction -> Fraction, Fraction/float -> float, Fraction/complex -> complex, etc.). This gives us a preference for fast, inexact results. Decimal is more precise, and pays a performance cost for it. It also seems odd to stick it between float and complex (nobody's planning a ComplexDecimal, right?). That suggests it should go between Fraction and float. Decimal/float -> float. -- Adam Olsen, aka Rhamphoryncus
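[The "go right, never left" rule is directly observable in the result types of mixed operations with today's stdlib types:]

```python
from fractions import Fraction

assert type(1 / 2) is float                  # int / int        -> float
assert type(1 + Fraction(1, 2)) is Fraction  # int + Fraction   -> Fraction
assert type(Fraction(1, 2) + 0.5) is float   # Fraction + float -> float
assert type(0.5 + 1j) is complex             # float + complex  -> complex
```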

Adam Olsen wrote:
For a little context, we have this numeric tower:
int -> Fraction -> float -> complex
Decimal is more precise, and pays a performance cost for it. It also seems odd to stick it between float and complex (nobody's planning a ComplexDecimal, right?) That suggests it should go between Fraction and float. Decimal/float -> float.
There are two ways in which that linear tower is overly simplistic:

* It conflates the notions of exactness and width. They're really orthogonal concepts, and to reflect this you would need two parallel towers, with exact and inexact versions of each type.

* Decimal and float really belong side-by-side in the tower, rather than one above the other. Neither of them is inherently any more precise or exact than the other.

There doesn't seem to be any good solution here. For every use case in which Decimal+float->float appears better, there seems to be another one for which Decimal+float->Decimal appears better. -- Greg

On Sat, Mar 20, 2010 at 4:20 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
Adam Olsen wrote:
For a little context, we have this numeric tower:
int -> Fraction -> float -> complex
Decimal is more precise, and pays a performance cost for it. It also seems odd to stick it between float and complex (nobody's planning a ComplexDecimal, right?) That suggests it should go between Fraction and float. Decimal/float -> float.
There are two ways in which that linear tower is overly simplistic:
* It conflates the notions of exactness and width. They're really orthogonal concepts, and to reflect this you would need two parallel towers, with exact and inexact versions of each type.
It's representing the mathematical concepts Integral -> Rational -> Real -> Complex When designing it, I tried to include a notion of exact/inexact types, but we couldn't find anything practical to do with them, so we took them out. It's reasonably easy to design inexact Integral and Rational types, but pretty hard to design a useful, exact Real type (things like '==' get uncomputable quickly), so we probably couldn't actually implement two whole parallel towers.
* Decimal and float really belong side-by-side in the tower, rather than one above the other.
Yep.

On Mar 20, 2010, at 4:27 PM, Jeffrey Yasskin wrote:
When designing it, I tried to include a notion of exact/inexact types, but we couldn't find anything practical to do with them, so we took them out.
There were also other reasons that they were taken out. The notion of inexactness is a taint, not a property of a type. The design documents for the decimal spec aimed for the ability to do either inexact or exact calculations (and to encompass integer and fixed-point arithmetic). That is in part why decimal is suitable for accounting work. A person can forbid any inexactness by setting a context flag that would raise an exception if any inexact calculation occurs. Another design principle for decimal is the notion that numbers are always exact; it is the results of operations that are subject to rounding. That is why the decimal constructor is not context-sensitive. So, as Jeffrey says, the notion of exactness and inexactness is not built in to the numeric tower. Instead, it is about which operations or methods can be expected to apply to a given type (i.e. both float and Decimal support the necessary methods to register as a Real). In terms of interoperability of concrete types, we're in the fortunate position that any float can be exactly converted to a Decimal and any Decimal can be exactly converted to a Fraction. Raymond
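[The "forbid any inexactness" capability Raymond describes is the Inexact trap in the decimal context; a minimal demonstration:]

```python
from decimal import Decimal, Inexact, localcontext

inexact_raised = False
with localcontext() as ctx:
    ctx.traps[Inexact] = True
    # Exact division is allowed to proceed:
    assert Decimal(1) / Decimal(4) == Decimal('0.25')
    # Division that cannot be represented exactly raises instead of rounding:
    try:
        Decimal(1) / Decimal(3)
    except Inexact:
        inexact_raised = True
assert inexact_raised
```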

On Sat, Mar 20, 2010 at 11:20 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
* Decimal and float really belong side-by-side in the tower, rather than one above the other. Neither of them is inherently any more precise or exact than the other.
Except that float is fixed-width (typically 53 bits of precision), while Decimal allows a user-specified, arbitrarily large, precision; so in that sense the two floating-point types aren't on an equal footing. Mark
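[Concretely: float's precision is fixed by the platform double, while decimal's precision is a context knob (a sketch, assuming a typical IEEE 754 platform):]

```python
import sys
from decimal import Decimal, localcontext

# A typical IEEE 754 double carries 53 bits of significand.
assert sys.float_info.mant_dig == 53

# Decimal precision is whatever the context says it is.
with localcontext() as ctx:
    ctx.prec = 50
    q = Decimal(1) / Decimal(7)
assert len(q.as_tuple().digits) == 50   # 50 significant decimal digits
```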

On Sat, Mar 20, 2010 at 4:46 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Sat, Mar 20, 2010 at 11:20 PM, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
* Decimal and float really belong side-by-side in the tower, rather than one above the other. Neither of them is inherently any more precise or exact than the other.
Except that float is fixed-width (typically 53 bits of precision), while Decimal allows a user-specified, arbitrarily large, precision; so in that sense the two floating-point types aren't on an equal footing.
But this doesn't really help in deciding which should be positioned at the end, does it? Both Fraction and Decimal can represent every float value (barring NaN/Inf). I wonder if we need to look at use cases or see what other languages do for guidance. -- --Guido van Rossum (python.org/~guido)

Mark Dickinson wrote:
Except that float is fixed-width (typically 53 bits of precision), while Decimal allows a user-specified, arbitrarily large, precision;
Yes, but it still has *some* fixed limit at any given moment, so the result of an operation on Decimals always has the potential to produce an inexact result. It's not like an int or Fraction where the result can expand to whatever size is needed. -- Greg

On Mar 20, 2010, at 9:40 PM, Greg Ewing wrote:
Mark Dickinson wrote:
Except that float is fixed-width (typically 53 bits of precision), while Decimal allows a user-specified, arbitrarily large, precision;
Yes, but it still has *some* fixed limit at any given moment, so the result of an operation on Decimals always has the potential to produce an inexact result. It's not like an int or Fraction where the result can expand to whatever size is needed.
I'm thinking that I need to do more work on the Decimal documentation. There still seems to be a profound misunderstanding of its capabilities (i.e. that an Inexact trap can be set to preclude any inexact operations, that the precision can be automatically extended as needed during a calculation, or that many types of calculations can be done exactly, especially if floor division is used). If rounding is needed, it can be controlled explicitly. Raymond
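[For example, with the Inexact trap armed, an all-exact computation runs to completion with no exception, and floor division of decimals is exact by construction (a sketch):]

```python
from decimal import Decimal, Inexact, localcontext

with localcontext() as ctx:
    ctx.traps[Inexact] = True   # any rounding anywhere would raise
    ctx.prec = 60               # give the calculation room to stay exact
    total = sum(Decimal('0.10') for _ in range(1000))
    assert total == Decimal('100.00')                # no rounding occurred
    assert Decimal(7) // Decimal(3) == Decimal(2)    # floor division: exact
```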

On Sat, Mar 20, 2010 at 17:20, Greg Ewing <greg.ewing@canterbury.ac.nz> wrote:
There are two ways in which that linear tower is overly simplistic:
* It conflates the notions of exactness and width. They're really orthogonal concepts, and to reflect this you would need two parallel towers, with exact and inexact versions of each type.
* Decimal and float really belong side-by-side in the tower, rather than one above the other. Neither of them is inherently any more precise or exact than the other.
There doesn't seem to be any good solution here. For every use case in which Decimal+float->float appears better, there seems to be another one for which Decimal+float->Decimal appears better.
Sure, from a purist point of view my post is completely wrong. It doesn't correspond to the mathematical reality. What it does correspond to is the code. Only going rightward through the types is what we have today. A linear progression is a lot simpler to understand than any sort of cycle; parallel progressions aren't even on the table. float has been the king of inexact types for a long time. All other things being equal, that's good enough for me. -- Adam Olsen, aka Rhamphoryncus

On Fri, Mar 19, 2010 at 5:50 PM, Guido van Rossum <guido@python.org> wrote:
As a downside, there is the worry that inadvertent mixing of Decimal and float can compromise the correctness of programs in a way that is hard to detect. But the anomalies above indicate that not fixing the
Decimal already has something that we can use in this case, and it fits very nicely here: Signals. Signals represent conditions that arise during computation. Each corresponds to one context flag and one context trap enabler. So, if we add a signal like "MixedWithFloats", users will have a flag in the context that they can check to see if a float was mixed into the operations executed (and if the user set the trap accordingly, an exception will be raised when the signal happens). OTOH, returning a float the first time both are mixed is easy to check... but if that has downsides, and we prefer to return a Decimal in that case, note that we have a mechanism in Decimal we can use. Furthermore, in case we want to ease the transition we can do the following:

- add this signal
- set *by default* the trap to raise an exception when float and Decimal are mixed

So, the behaviour will be the same as we have now, but users can easily change it. Regards, -- . Facundo Blog: http://www.taniquetil.com.ar/plog/ PyAr: http://www.python.org/ar/
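[As it happens, essentially this proposal later landed in the stdlib as the FloatOperation signal (Python 3.3): by default it quietly sets a context flag when floats and Decimals meet in the constructor or in ordering comparisons, and trapping it turns the mix into an exception:]

```python
from decimal import Decimal, FloatOperation, localcontext

trapped = False
with localcontext() as ctx:
    ctx.traps[FloatOperation] = True
    try:
        Decimal('1.1') < 2.2      # mixed ordering comparison: trapped
    except FloatOperation:
        trapped = True
assert trapped
```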

Facundo Batista <facundobatista@gmail.com> wrote:
On Fri, Mar 19, 2010 at 5:50 PM, Guido van Rossum <guido@python.org> wrote:
As a downside, there is the worry that inadvertent mixing of Decimal and float can compromise the correctness of programs in a way that is hard to detect. But the anomalies above indicate that not fixing the
Decimal already has something that we can use in this case, and fits very nice here: Signals.
I like the simplicity of having a single signal (e.g. CoercionError), but a strictness context flag could offer greater control for people who only want pure decimal/integer operations. For example:

strictness 0: completely promiscuous behaviour
strictness 1: current py3k behaviour
strictness 2: current py3k behaviour + pure equality comparisons
strictness 3: current py3k behaviour + pure equality comparisons + disallow NaN equality comparisons [1]

Just as an illustration, here is a quick and dirty diff using the DefaultContext for simplicity:

Index: Lib/decimal.py
===================================================================
--- Lib/decimal.py      (revision 78352)
+++ Lib/decimal.py      (working copy)
@@ -3765,8 +3765,8 @@
     def __init__(self, prec=None, rounding=None,
                  traps=None, flags=None,
                  Emin=None, Emax=None,
-                 capitals=None, _clamp=0,
-                 _ignored_flags=None):
+                 capitals=None, strictness=1,
+                 _clamp=0, _ignored_flags=None):
         if flags is None:
             flags = []
         if _ignored_flags is None:
@@ -5785,7 +5785,9 @@
         return other
     if isinstance(other, int):
         return Decimal(other)
-    if raiseit:
+    if isinstance(other, float) and DefaultContext.strictness == 0:
+        return Decimal.from_float(other)
+    if raiseit or DefaultContext.strictness > 1:
         raise TypeError("Unable to convert %s to Decimal" % other)
     return NotImplemented
@@ -5800,7 +5802,8 @@
         flags=[],
         Emax=999999999,
         Emin=-999999999,
-        capitals=1
+        capitals=1,
+        strictness=1
 )

Stefan Krah

[1] See:
http://mail.python.org/pipermail/python-dev/2009-November/093910.html
http://mail.python.org/pipermail/python-dev/2009-November/093952.html

On Tue, Mar 23, 2010 at 12:09 PM, Stefan Krah <stefan@bytereef.org> wrote:
Facundo Batista <facundobatista@gmail.com> wrote:
On Fri, Mar 19, 2010 at 5:50 PM, Guido van Rossum <guido@python.org> wrote:
As a downside, there is the worry that inadvertent mixing of Decimal and float can compromise the correctness of programs in a way that is hard to detect. But the anomalies above indicate that not fixing the
Decimal already has something that we can use in this case, and fits very nice here: Signals.
I like the simplicity of having a single signal (e.g. CoercionError), but a strictness context flag could offer greater control for people who only want pure decimal/integer operations.
Sounds worth considering.
For example:
strictness 0: completely promiscuous behaviour
strictness 1: current py3k behaviour
strictness 2: current py3k behaviour + pure equality comparisons
Can you explain what you mean by "+ pure equality comparisons" here? If I'm understanding correctly, this is a mode that's *more* strict than the current py3k behaviour; what's it disallowing that the current py3k behaviour allows?
strictness 3: current py3k behaviour + pure equality comparisons + disallow NaN equality comparisons [1]
Sorry, no. I think there are good reasons for the current NaN equality behaviour: 2.0 really *isn't* a NaN, and Decimal(2) == Decimal('nan') should return False rather than raising an exception. And the decimal module provides compare and compare_signal for those who want complete standards-backed control here. Besides, this seems to me to be an orthogonal issue to the issue of mixing Decimal with other numeric types. Mark

Mark Dickinson <dickinsm@gmail.com> wrote:
I like the simplicity of having a single signal (e.g. CoercionError), but a strictness context flag could offer greater control for people who only want pure decimal/integer operations.
Sounds worth considering.
For example:
strictness 0: completely promiscuous behaviour
strictness 1: current py3k behaviour
strictness 2: current py3k behaviour + pure equality comparisons
Can you explain what you mean by "+ pure equality comparisons" here? If I'm understanding correctly, this is a mode that's *more* strict than the current py3k behaviour; what's it disallowing that the current py3k behaviour allows?
It's disallowing all comparisons between e.g. float and decimal. The idea is that the context can provide a cheap way of enforcing types for people who like it:
>>> from decimal import *
>>> DefaultContext.strictness = 1
>>> Decimal(9) == 9.0
False
>>> Decimal(9) in [1, 4.0, 9]
True
>>> DefaultContext.strictness = 2
>>> Decimal(9) == 9.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/svn/py3k/Lib/decimal.py", line 858, in __eq__
    other = _convert_other(other)
  File "/home/stefan/svn/py3k/Lib/decimal.py", line 5791, in _convert_other
    raise TypeError("Unable to convert %s to Decimal" % other)
TypeError: Unable to convert 9.0 to Decimal
>>> Decimal(9) in [1, 4.0, 9]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/svn/py3k/Lib/decimal.py", line 858, in __eq__
    other = _convert_other(other)
  File "/home/stefan/svn/py3k/Lib/decimal.py", line 5791, in _convert_other
    raise TypeError("Unable to convert %s to Decimal" % other)
TypeError: Unable to convert 4.0 to Decimal
>>> Decimal(9) in [1, 4, 9]
True
This mode could help catch bugs like:

n = 7 / 3           # Programmer thinks this is an integer
x = Decimal(100)
while x != n:
    pass            # do something
    x -= 1
strictness 3: current py3k behaviour + pure equality comparisons + disallow NaN equality comparisons [1]
Sorry, no. I think there are good reasons for the current NaN equality behaviour: 2.0 really *isn't* a NaN, and Decimal(2) == Decimal('nan') should return False rather than raising an exception. And the decimal module provides compare and compare_signal for those who want complete standards-backed control here.
I'd like to make it an option for people who don't want to write:

    while x.compare_signal(7) != 0

And I think that an sNaN should really signal by default.
Besides, this seems to me to be an orthogonal issue to the issue of mixing Decimal with other numeric types.
Yes, it would kind of overload the strictness parameter. I see it as another type of strictness, so I brought it up here. Level 3 would be a bit like the highest warning level of a compiler. But of course there's no need to discuss NaNs further in this thread other than to show a possible use of the flag. Stefan Krah

On Tue, Mar 23, 2010 at 3:09 PM, Stefan Krah <stefan@bytereef.org> wrote:
Mark Dickinson <dickinsm@gmail.com> wrote:
[Stefan]
strictness 2: current py3k behaviour + pure equality comparisons
Can you explain what you mean by "+ pure equality comparisons" here? If I'm understanding correctly, this is a mode that's *more* strict than the current py3k behaviour; what's it disallowing that the current py3k behaviour allows?
It's disallowing all comparisons between e.g. float and decimal. The idea is that the context can provide a cheap way of enforcing types for people who like it:
>>> DefaultContext.strictness = 2
>>> Decimal(9) == 9.0
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/stefan/svn/py3k/Lib/decimal.py", line 858, in __eq__
    other = _convert_other(other)
  File "/home/stefan/svn/py3k/Lib/decimal.py", line 5791, in _convert_other
    raise TypeError("Unable to convert %s to Decimal" % other)
TypeError: Unable to convert 9.0 to Decimal
Hmm. It seems to me that deliberately making an __eq__ method between hashable types raise an exception isn't something that should be done lightly, since it can *really* screw up sets and dicts. For example, with your proposal, {9.0, Decimal(x)} would either raise or not, depending on whether Decimal(x) happened to hash equal to 9.0 (if they don't hash equal, then __eq__ will never be called). If the hash is regarded as essentially a black box (which is what it should be for most users) then you can easily end up with code that almost always works, but *very* occasionally and unpredictably raises an exception.
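[Mark's point can be made concrete with a toy class whose __eq__ always raises; the StrictNum name and its bucket parameter are purely illustrative. The exception only surfaces when the hashes happen to collide:]

```python
class StrictNum:
    def __init__(self, bucket):
        self._bucket = bucket
    def __hash__(self):
        return self._bucket          # contrived, so we can control collisions
    def __eq__(self, other):
        raise TypeError("no mixed comparisons")

# Different hash values: __eq__ is never called, set construction succeeds.
s = {9.0, StrictNum(hash(9.0) + 1)}
assert len(s) == 2

# Equal hash values: the same set display now raises, "unpredictably".
raised = False
try:
    {9.0, StrictNum(hash(9.0))}
except TypeError:
    raised = True
assert raised
```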
And I think that an sNaN should really signal by default.
Agreed, notwithstanding the above comments. Though to avoid the problems described above, I think the only way to make this acceptable would be to prevent hashing of signaling nans. (Which the decimal module currently does; it also prevents hashing of quiet NaNs, but I can't see any good rationale for that.) Mark

On Tue, Mar 23, 2010 at 11:31, Mark Dickinson <dickinsm@gmail.com> wrote:
Agreed, notwithstanding the above comments. Though to avoid the problems described above, I think the only way to make this acceptable would be to prevent hashing of signaling nans. (Which the decimal module current does; it also prevents hashing of quiet NaNs, but I can't see any good rationale for that.)
>>> a = Decimal('nan')
>>> a != a
True

They don't follow the behaviour required for being hashable. float NaN should stop being hashable as well. -- Adam Olsen, aka Rhamphoryncus

On Tue, Mar 23, 2010 at 5:48 PM, Adam Olsen <rhamph@gmail.com> wrote:
>>> a = Decimal('nan')
>>> a != a
They don't follow the behaviour required for being hashable.
What's this required behaviour? The only rule I'm aware of is that if a == b then hash(a) == hash(b). That's not violated here. Note that containment tests check identity before equality, so there's no problem with putting (float) nans in sets or dicts:
>>> x = float('nan')
>>> s = {x}
>>> x in s
True
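[The identity-before-equality shortcut Mark describes is easy to verify: containment succeeds for the *same* NaN object, but a freshly created NaN is never found.]

```python
nan = float('nan')

assert nan != nan                   # IEEE 754: NaN compares unequal to itself
assert nan in {nan}                 # identity check makes containment work
assert nan in [nan]                 # same for lists
assert float('nan') not in [nan]    # a *different* NaN object is not found
```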
Mark

On Tue, Mar 23, 2010 at 12:04, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Mar 23, 2010 at 5:48 PM, Adam Olsen <rhamph@gmail.com> wrote:
>>> a = Decimal('nan')
>>> a != a
They don't follow the behaviour required for being hashable.
What's this required behaviour? The only rule I'm aware of is that if a == b then hash(a) == hash(b). That's not violated here.
Note that containment tests check identity before equality, so there's no problem with putting (float) nans in sets or dicts:
>>> x = float('nan')
>>> s = {x}
>>> x in s
True
Ergh, I thought that got changed. Nevermind then. -- Adam Olsen, aka Rhamphoryncus

On Tue, Mar 23, 2010 at 6:07 PM, Adam Olsen <rhamph@gmail.com> wrote:
On Tue, Mar 23, 2010 at 12:04, Mark Dickinson <dickinsm@gmail.com> wrote:
Note that containment tests check identity before equality, so there's no problem with putting (float) nans in sets or dicts:
>>> x = float('nan')
>>> s = {x}
>>> x in s
True
Ergh, I thought that got changed. Nevermind then.
Hmm. I think you're right: it did get changed at some point early in py3k's history; I seem to recall that the identity-checking behaviour got restored before 3.1 was released, though. There was an issue about this somewhere, but I'm failing to find it. Mark

On Tue, Mar 23, 2010 at 10:55 AM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Tue, Mar 23, 2010 at 6:07 PM, Adam Olsen <rhamph@gmail.com> wrote:
On Tue, Mar 23, 2010 at 12:04, Mark Dickinson <dickinsm@gmail.com> wrote:
Note that containment tests check identity before equality, so there's no problem with putting (float) nans in sets or dicts:
>>> x = float('nan')
>>> s = {x}
>>> x in s
True
Ergh, I thought that got changed. Nevermind then.
Hmm. I think you're right: it did get changed at some point early in py3k's history; I seem to recall that the identity-checking behaviour got restored before 3.1 was released, though. There was an issue about this somewhere, but I'm failing to find it.
Raymond and I don't see this the same way. It looks like he won. :-) -- --Guido van Rossum (python.org/~guido)

On Wed, 24 Mar 2010 05:04:37 am Mark Dickinson wrote:
On Tue, Mar 23, 2010 at 5:48 PM, Adam Olsen <rhamph@gmail.com> wrote:
>>> a = Decimal('nan')
>>> a != a
They don't follow the behaviour required for being hashable.
What's this required behaviour? The only rule I'm aware of is that if a == b then hash(a) == hash(b). That's not violated here.
Note that containment tests check identity before equality, so there's no problem with putting (float) nans in sets or dicts:
>>> x = float('nan')
>>> s = {x}
>>> x in s
True
As usual though, NANs are unintuitive:
>>> d = {float('nan'): 1}
>>> d[float('nan')] = 2
>>> d
{nan: 1, nan: 2}
I suspect that's a feature, not a bug. -- Steven D'Aprano

Steven D'Aprano writes:
As usual though, NANs are unintuitive:
>>> d = {float('nan'): 1}
>>> d[float('nan')] = 2
>>> d
{nan: 1, nan: 2}
I suspect that's a feature, not a bug.
I don't see how it can be so. Aren't all of those entries garbage? To compute a histogram of results for computations on a series of cases would you not have to test each result for NaN-hood, then hash on a proxy such as the string "Nan"?

On Wed, Mar 24, 2010 at 5:36 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Steven D'Aprano writes:
> As usual though, NANs are unintuitive:
>
> >>> d = {float('nan'): 1}
> >>> d[float('nan')] = 2
> >>> d
> {nan: 1, nan: 2}
>
> I suspect that's a feature, not a bug.
Right: distinct nans (i.e., those with different id()) are treated as distinct set elements or dict keys.
I don't see how it can be so. Aren't all of those entries garbage? To compute a histogram of results for computations on a series of cases would you not have to test each result for NaN-hood, then hash on a proxy such as the string "Nan"?
So what alternative behaviour would you suggest, and how would you implement it? I agree that many aspects of the current treatment of nans aren't ideal, but as far as I can see that's unavoidable. For sane containment testing, Python's == operator needs to give an equivalence relation. Meanwhile IEEE 754 requires that nans compare unequal to themselves, breaking reflexivity. So there have to be some compromises somewhere. The current compromise at least has the virtue that it doesn't require special-casing nans anywhere in the general containment-testing and hashing machinery. One alternative would be to prohibit putting nans into sets and dicts by making them unhashable; I'm not sure what that would gain, though. And there would still be some unintuitive behaviour for containment testing of nans in lists. Mark

On Wed, 24 Mar 2010 08:51:36 pm Mark Dickinson wrote:
On Wed, Mar 24, 2010 at 5:36 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Steven D'Aprano writes:
> As usual though, NANs are unintuitive:
>
> >>> d = {float('nan'): 1}
> >>> d[float('nan')] = 2
> >>> d
> {nan: 1, nan: 2}
>
> I suspect that's a feature, not a bug.
Right: distinct nans (i.e., those with different id()) are treated as distinct set elements or dict keys.
I don't see how it can be so. Aren't all of those entries garbage? To compute a histogram of results for computations on a series of cases would you not have to test each result for NaN-hood, then hash on a proxy such as the string "Nan"?
Not necessarily -- you could merely ignore any key which is a NaN, or you could pass each key through this first:

    def intern_nan(x, nan=float('nan')):
        if math.isnan(x):
            return nan
        return x

thus ensuring that all NaN keys were the same NaN.
So what alternative behaviour would you suggest, and how would you implement it? [...] One alternative would be to prohibit putting nans into sets and dicts by making them unhashable; I'm not sure what that would gain, though. And there would still be some unintuitive behaviour for containment testing of nans in lists.
I think that would be worse than the current situation. That would mean that dict[some_float] would *nearly always* succeed, but occasionally would fail. I can't see that being a good thing. -- Steven D'Aprano
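[Editor's note: Steven's intern_nan idea is easy to exercise. Once every NaN key is mapped to one canonical NaN object, dict updates behave as a reader would expect. A runnable sketch:]

```python
import math

_canonical_nan = float("nan")

def intern_nan(x):
    """Map every float NaN to one canonical NaN object; pass other values through."""
    if isinstance(x, float) and math.isnan(x):
        return _canonical_nan
    return x

d = {}
d[intern_nan(float("nan"))] = 1
d[intern_nan(float("nan"))] = 2  # same canonical key object: overwrites, no duplicate

assert len(d) == 1
assert list(d.values()) == [2]
```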

Steven D'Aprano wrote:
On Wed, 24 Mar 2010 08:51:36 pm Mark Dickinson wrote:
I don't see how it can be so. Aren't all of those entries garbage? To compute a histogram of results for computations on a series of cases would you not have to test each result for NaN-hood, then hash on a proxy such as the string "Nan"?
Not necessarily -- you could merely ignore any key which is a NaN, or you could pass each key through this first:
    def intern_nan(x, nan=float('nan')):
        if math.isnan(x):
            return nan
        return x
thus ensuring that all NaN keys were the same NaN.
Interning NaN certainly seems like it should be sufficient to eliminate the set/dict membership weirdness. That is, make it so that the first two lines of the following return True, while the latter two lines continue to return False:
>>> float("nan") is float("nan")
False
>>> dec("nan") is dec("nan")
False
>>> float("nan") == float("nan")
False
>>> dec("nan") == dec("nan")
False
Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

On Wed, 24 Mar 2010 10:47:26 pm Nick Coghlan wrote:
Steven D'Aprano wrote:
On Wed, 24 Mar 2010 08:51:36 pm Mark Dickinson wrote:
I don't see how it can be so. Aren't all of those entries garbage? To compute a histogram of results for computations on a series of cases would you not have to test each result for NaN-hood, then hash on a proxy such as the string "Nan"?
Not necessarily -- you could merely ignore any key which is a NaN, or you could pass each key through this first:
    def intern_nan(x, nan=float('nan')):
        if math.isnan(x):
            return nan
        return x
thus ensuring that all NaN keys were the same NaN.
Interning NaN certainly seems like it should be sufficient to eliminate the set/dict membership weirdness.
I didn't mean to suggest that Python should do that automatically! I meant that the developer could easily intern NaNs if needed. I wouldn't want Python to automatically intern NaNs, the reason being that this would throw away information (at least potentially, depending on the C library). According to the relevant IEEE standard, NaNs should (may?) carry a payload. For example, Apple's SANE math library back in the 1980s exposed this payload: NaNs created from different failures would have a consistent payload, allowing the programmer to tell how the NaN appeared in the calculation. E.g. INF-INF would give you a payload of 123 (or whatever it was), while log(-1) would give you a payload of 456. (I've made up the numbers, it's been far too many years for me to remember what they were.) The point is, whether Python currently exposes these payloads or not, we shouldn't prohibit it. If programmers want to explicitly fold all NaNs into one, it is easy to do so themselves. -- Steven D'Aprano
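[Editor's note: the payload Steven describes lives in the low-order mantissa bits of an IEEE 754 double, and can be inspected from Python with struct. The sketch below assumes the usual binary64 layout (1 sign bit, 11 exponent bits, 52 mantissa bits, with the top mantissa bit marking a quiet NaN); whether arithmetic actually preserves the payload is platform- and C-library-dependent.]

```python
import math
import struct

def nan_payload(x):
    """Return the low 51 payload bits of a double NaN (quiet bit excluded).

    Assumes the standard IEEE 754 binary64 layout.
    """
    if not math.isnan(x):
        raise ValueError("not a NaN")
    bits = struct.unpack("<Q", struct.pack("<d", x))[0]
    return bits & ((1 << 51) - 1)

# Build a quiet NaN carrying payload 123 directly from its bit pattern:
nan123 = struct.unpack("<d", struct.pack("<Q", 0x7FF8000000000000 | 123))[0]
assert math.isnan(nan123)
assert nan_payload(nan123) == 123
```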

On Wed, Mar 24, 2010 at 11:47 AM, Nick Coghlan
Interning NaN certainly seems like it should be sufficient to eliminate the set/dict membership weirdness.
That is, make it so that the first two lines of the following return True, while the latter two lines continue to return False:
>>> float("nan") is float("nan")
False
>>> dec("nan") is dec("nan")
False
>>> float("nan") == float("nan")
False
>>> dec("nan") == dec("nan")
False
Yes; that could be done. Though as Steven points out, not all NaNs are equivalent (possibility of different payloads and different signs), so float nans with different underlying bit patterns, and Decimal nans with different string representations, would ideally be interned separately. For floats it might be possible to get away with pretending that there's only one nan. For decimal, I don't think that's true, since the payload and sign are part of the standard, and are very visible (e.g. in the repr of the nan). The obvious way to do this nan interning for floats would be to put the interning code into PyFloat_FromDouble. I'm not sure whether this would be worth the cost in terms of added code (and possibly reduced performance, since the nan check would be done every time a float was returned), but I'd be willing to review a patch. Mark
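[Editor's note: Mark's observation that Decimal makes the sign and payload visible is easy to check. The diagnostic digits survive construction, appear in the string form, and (per the decimal specification) propagate through arithmetic with a quiet NaN operand:]

```python
from decimal import Decimal

# Sign and payload are part of the value and visible in the string form:
assert str(Decimal("NaN123")) == "NaN123"
assert str(Decimal("-NaN7")) == "-NaN7"

# A quiet NaN operand propagates, payload intact:
assert str(Decimal("NaN123") + 1) == "NaN123"
```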

On Mar 24, 2010, at 9:22 AM, Mark Dickinson wrote:
The obvious way to do this nan interning for floats would be to put the interning code into PyFloat_FromDouble. I'm not sure whether this would be worth the cost in terms of added code (and possibly reduced performance, since the nan check would be done every time a float was returned), but I'd be willing to review a patch.
-1 Propagating support for NaNs has already consumed an enormous amount of your time. The code for the math module doubled in complexity as a result of adding Nan/Inf support. At each stage, some new weirdness emerges (such as complex(Nan, Nan), etc). And each change introduces the risk of new bugs in code that had been stable for a long time. IIRC, the original purpose of a NaN was to serve as a placeholder value in a long series of floating point ops so that the programmer would not have to introduce edge case tests at each stage of a calculation. Yet, I look at the code for the decimal module and the C code for the math module and see that the opposite result occurred, the code is littered with is_special(x) tests and handlers. In terms of practicality, NaNs work great as a way to mark missing values and to propagate through subsequent calculations. IMO, their least useful feature is the property of being not equal to themselves -- that causes more problems than it solves because it impairs a programmer's ability to reason about their programs. Raymond
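[Editor's note: Raymond's "placeholder" description is the one property that does work smoothly in practice. A NaN flows through subsequent float arithmetic without any edge-case test at each step:]

```python
import math

x = float("nan")           # e.g. a missing measurement
y = (x * 2.0 + 1.0) / 3.0  # no special-casing needed at any intermediate step
assert math.isnan(y)       # the "missing" marker survives to the end
```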

On Wed, Mar 24, 2010 at 1:39 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote: ..
IMO, their least useful feature is the property of being not equal to themselves -- that causes more problems than it solves because it impairs a programmer's ability to reason about their programs.
I agree. An often cited rationale for IEEE 754 behavior is that it eliminates branching in some high performance numerical algorithms. While this is likely to be true, I have never seen it benefiting real life applications, certainly not those written in Python. I wonder why Python did not follow Java model where Float NaN objects unlike raw float NaNs compare equal to themselves. One reason may be that Python does not have raw floats, but if someone needs IEEE 754 NaNs, one can use numpy scalars or add arithmetics to ctypes numerical types. Mark, I wonder if you could describe an algorithm off the top of your head that relies on NaN == NaN being false.

On Wed, Mar 24, 2010 at 11:26 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
I wonder why Python did not follow Java model where Float NaN objects unlike raw float NaNs compare equal to themselves. One reason may be that Python does not have raw floats, but if someone needs IEEE 754 NaNs, one can use numpy scalars or add arithmetics to ctypes numerical types.
Probably because we were blindly following the IEEE standard without understanding it in every detail. -- --Guido van Rossum (python.org/~guido)

On Wed, Mar 24, 2010 at 2:36 PM, Guido van Rossum <guido@python.org> wrote: ..
Probably because we were blindly following the IEEE standard without understanding it in every detail.
Are you talking about "accidental" support for NaNs in older versions of Python or about recent efforts to support them properly in math and decimal modules? I feel you are too harsh on the developers that worked in this area. I dare to suggest that the current situation is not due to lack of understanding of the standard, but due to pragmatic decisions made in early development and desire for backward compatibility in the recent versions. Is this an area where design changes are feasible? IIRC, NaN support was never "officially" in the language, but it may have changed with the decimal module.

On Wed, Mar 24, 2010 at 11:55 AM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
On Wed, Mar 24, 2010 at 2:36 PM, Guido van Rossum <guido@python.org> wrote: ..
Probably because we were blindly following the IEEE standard without understanding it in every detail.
Are you talking about "accidental" support for NaNs in older versions of Python or about recent efforts to support them properly in math and decimal modules?
My recollection includes recent efforts, such as the math.isnan() function. I don't recall ever hearing an argument for the peculiar behavior of NaN in comparisons beyond "this is what the IEEE standard prescribes."
I feel you are too harsh on the developers that worked in this area.
Maybe. I didn't mean to put down individuals complicit in the decision -- in fact I would say that I am to blame myself for not probing deeper.
I dare to suggest that the current situation is not due to lack of understanding of the standard, but due to pragmatic decisions made in early development and desire for backward compatibility in the recent versions.
I think that originally NaN (and Inf) behavior was so platform-dependent that it really wouldn't have mattered if we had changed it.
Is this an area where design changes are feasible? IIRC, NaN support was never "officially" in the language, but it may have changed with the decimal module.
It has been at least since 2.6 introduced math.isnan(), and ISTR there was a proposal to add a separate module to deal with NaN and Inf properly in a pure-python library module. -- --Guido van Rossum (python.org/~guido)

On Wed, Mar 24, 2010 at 6:26 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Mark, I wonder if you could describe an algorithm off the top of your head that relies on NaN == NaN being false.
No, I certainly couldn't! And I often wonder if the original IEEE 754 committee, given 20/20 foresight, would have made the same decisions regarding comparisons of nans. It's certainly not one of my favourite features of IEEE 754. (Though sqrt(-0.) -> -0. ranks lower for me. Grr.) A bogus application that I've often seen mentioned is that it allows checking whether a float 'x' is a nan by doing `x == x'; but the proper way to do this is to have an 'isnan' function or method, so this isn't particularly convincing. Slightly more convincing is history: this is the way that nan comparisons behave in other languages (Fortran, C) used for numerics. If Python were to do something different then a naively translated algorithm from another language would fail. It's the behaviour that numerically-aware people expect, and I'd expect to get complaints from those people if it changed. Mark
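[Editor's note: the two idioms Mark contrasts, side by side. As he says, the explicit isnan test is the proper spelling:]

```python
import math

x = float("nan")
assert x != x            # the historical idiom: only a NaN compares unequal to itself
assert math.isnan(x)     # the explicit, readable test

assert 1.0 == 1.0        # ordinary floats are reflexive as expected
assert not math.isnan(1.0)
```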

On Wed, Mar 24, 2010 at 2:50 PM, Mark Dickinson <dickinsm@gmail.com> wrote: ..
If Python were to do something different then a naively translated algorithm from another language would fail. It's the behaviour that numerically-aware people expect, and I'd expect to get complaints from those people if it changed.
Numerically-aware people are likely to be aware of the differences in languages that they use. I think in this day and age you are more likely to hear from confused Java programmers than from Fortran or even C folks.

On Wed, Mar 24, 2010 at 2:50 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
On Wed, Mar 24, 2010 at 6:26 PM, Alexander Belopolsky <alexander.belopolsky@gmail.com> wrote:
Mark, I wonder if you could describe an algorithm off the top of your head that relies on NaN == NaN being false.
No, I certainly couldn't! And I often wonder if the original IEEE 754 committee, given 20/20 foresight, would have made the same decisions regarding comparisons of nans. It's certainly not one of my favourite features of IEEE 754.
I tried to google the rationale for the IEEE 754 decision, but came up with nothing. Here are a few representative results: """ So without fear let me not stop at the arguments that “the committee must have voted on this point and they obviously knew what they were doing” and “it is the standard and implemented on zillions of machines, you cannot change it now”. """ - "Reflexivity, and other pillars of civilization" by Bertrand Meyer http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civiliz... """ I suppose this simplifies numerical computations in some way, but I couldn't find an explicitly stated reason, not even in the Lecture Notes on the Status of IEEE 754 by Kahan which discusses other design decisions in detail. """ - "What is the rationale for all comparisons returning false for IEEE754 NaN values?" http://stackoverflow.com/questions/1565164/what-is-the-rationale-for-all-com...

On Mar 24, 2010, at 12:51 PM, Alexander Belopolsky wrote:
- "Reflexivity, and other pillars of civilization" by Bertrand Meyer http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civiliz...
Excellent link. Thanks for the research. Raymond

Thanks. Same link reported concurrently by Mark in the "Why is nan != nan?" thread. On Wed, Mar 24, 2010 at 4:26 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Mar 24, 2010, at 12:51 PM, Alexander Belopolsky wrote:
- "Reflexivity, and other pillars of civilization" by Bertrand Meyer http://bertrandmeyer.com/2010/02/06/reflexivity-and-other-pillars-of-civiliz...
Excellent link. Thanks for the research.
Raymond

On Thu, 25 Mar 2010 05:26:12 am Alexander Belopolsky wrote:
Mark, I wonder if you could describe an algorithm off the top of your head that relies on NaN == NaN being false.
I don't know whether "relies on" is appropriate, but consider:

    def myfunc(x, y):
        if x == y:
            return 1.0
        else:
            return something_complicated**(x-y)

Optimising floating point code is fraught with dangers (the above fails for x=y=INF as well as NAN) but anything that makes Not A Numbers pretend to be numbers is a bad thing. I'd like to turn the question around ... what algorithms are there that rely on NaN == NaN being True? -- Steven D'Aprano
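[Editor's note: Steven's failure modes can be made concrete. Here `base` is a hypothetical stand-in for his `something_complicated`; the x=y=INF case shows the shortcut returning 1.0 where the unoptimised expression would give NaN:]

```python
import math

def myfunc(x, y, base=2.0):  # base: hypothetical stand-in for something_complicated
    if x == y:
        return 1.0           # shortcut: assumes x - y == 0
    return base ** (x - y)

inf = float("inf")
nan = float("nan")

assert myfunc(3.0, 3.0) == 1.0
assert myfunc(inf, inf) == 1.0         # wrong: inf - inf is nan, and 2.0**nan is nan
assert math.isnan(2.0 ** (inf - inf))  # what the unoptimised path would compute
assert math.isnan(myfunc(nan, 1.0))    # NaN inputs propagate, as desired
```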

Steven D'Aprano wrote:
I'd like to turn the question around ... what algorithms are there that rely on NaN == NaN being True?
Absolutely anything that expects "x is y" to imply that "x == y". The builtin containers enforce this by checking identity before they check equality, but there are plenty of algorithms that won't. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

On Thu, 25 Mar 2010 10:25:35 pm Nick Coghlan wrote:
Steven D'Aprano wrote:
I'd like to turn the question around ... what algorithms are there that rely on NaN == NaN being True?
Absolutely anything that expects "x is y" to imply that "x == y". The builtin containers enforce this by checking identity before they check equality, but there are plenty of algorithms that won't.
Fair point, but I was actually thinking about mathematical algorithms. Builtin containers may also fail with any object that violates the expectation that identity implies equality, e.g.:

    class AlwaysDifferent:
        def __eq__(self, other):
            return False
        def __ne__(self, other):
            return True

I don't see this as a problem -- if you choose to use such objects, you're responsible for understanding what is going on. If you choose to use floats, then you need to understand that NANs are weird. Personally, I'm less concerned about sets of floats ending up with strange combinations of NANs than I am about the possibility of disastrous maths errors caused by allowing NANs to test as equal. Here's a simplistic example:

    def func(a, b):
        if a == b:
            return 1.0
        return math.sin(a*b)**(a-b)

(I say "simplistic" because it currently fails for a=b=INF.) Currently this function will do the right thing for a, b both NANs:
>>> func(float('nan'), float('nan'))
nan
but making NANs test as equal will cause it to give the wrong answer. I fear that kind of error far more than the current funny behaviour of builtin containers with NANs. -- Steven D'Aprano

Steven D'Aprano <steve <at> pearwood.info> writes:
Personally, I'm less concerned about sets of floats ending up with strange combinations of NANs than I am about the possibility of disastrous maths errors caused by allowing NANs to test as equal. Here's a simplistic example:
You just said "if you choose to use floats, then you need to understand that NANs are weird". I wonder why this saying shouldn't apply to your "simplistic example" of NAN usage. (is your example even from real life?)

On Thu, Mar 25, 2010 at 04:18, Steven D'Aprano <steve@pearwood.info> wrote:
    def myfunc(x, y):
        if x == y:
            return 1.0
        else:
            return something_complicated**(x-y)
Optimising floating point code is fraught with dangers (the above fails for x=y=INF as well as NAN) but anything that make Not A Numbers pretend to be numbers is a bad thing.
What about this:

    def myfunc(x):
        if x >= THRESHOLD:
            return 1.0
        else:
            return something_complicated(x)

If one behaves right it's more likely a fluke, not a designed-in feature. It's certainly not obvious without covering every comparison with comments. Maybe that's the solution. Signal by default on comparison, but add a collection of naneq/naneg/etc functions (math module, methods, whatever) that use a particular quiet mapping, making the whole thing explicit? -- Adam Olsen, aka Rhamphoryncus

Steven D'Aprano wrote:
I'd like to turn the question around ... what algorithms are there that rely on NaN == NaN being True?
That seems to be a straw question, since AFAIK nobody has suggested that there are any such algorithms. On the other hand, it has been claimed that some algorithms exist that benefit from Nan == NaN being false, so it's fair to ask for examples to back that up. -- Greg

On Thu, 25 Mar 2010 03:22:29 am Mark Dickinson wrote:
The obvious way to do this nan interning for floats would be to put the interning code into PyFloat_FromDouble. I'm not sure whether this would be worth the cost in terms of added code (and possibly reduced performance, since the nan check would be done every time a float was returned), but I'd be willing to review a patch.
I hope that it's obvious from my previous post that I do NOT want such interning done, but since I put the idea in people's heads, I wish to reiterate that I'm against the idea: -1 on interning NaNs. For the rare application where it might be useful, it is easy to do in the application code. -- Steven D'Aprano

Steven D'Aprano wrote:
On Thu, 25 Mar 2010 03:22:29 am Mark Dickinson wrote:
The obvious way to do this nan interning for floats would be to put the interning code into PyFloat_FromDouble. I'm not sure whether this would be worth the cost in terms of added code (and possibly reduced performance, since the nan check would be done every time a float was returned), but I'd be willing to review a patch.
I hope that it's obvious from my previous post that I do NOT want such interning done, but since I put the idea in people's heads, I wish to reiterate that I'm against the idea: -1 on interning NaNs. For the rare application where it might be useful, it is easy to do in the application code.
Yep, and I'll freely admit I didn't know about the potential additional state on NaN values, or I wouldn't have suggested interning automatically. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Mark Dickinson writes:
On Wed, Mar 24, 2010 at 5:36 AM, Stephen J. Turnbull <stephen@xemacs.org> wrote:
Steven D'Aprano writes:
> I suspect that's a feature, not a bug.
Right: distinct nans (i.e., those with different id()) are treated as distinct set elements or dict keys.
I don't see how it can be so. Aren't all of those entries garbage? To compute a histogram of results for computations on a series of cases would you not have to test each result for NaN-hood, then hash on a proxy such as the string "Nan"?
So what alternative behaviour would you suggest, and how would you implement it?
I don't have an alternative behavior to suggest. I'm not suggesting that it's a bug, I'm suggesting that it's a wart: useless, ugly, and in some presumably rare/buggy cases, it could lead to nasty behavior. The example I have in mind is computing a histogram of function values for a very large sample of inputs. (This is a pathological example, of course: things where NaNs are representable generally won't be used directly as keys in a dictionary used to represent a histogram. Rather, they would be mapped to a representative value as the key.) If there are a lot of NaN's, the dictionary could get unexpectedly large. That's not Python's fault, of course:
Meanwhile IEEE 754 requires that nans compare unequal to themselves, breaking reflexivity. So there have to be some compromises somewhere.
Indeed. IEEE 754 compatibility *is* a feature.
One alternative would be to prohibit putting nans into sets and dicts by making them unhashable; I'm not sure what that would gain, though.
I would find that more intuitive. While NaNs aren't mutable, they're similar to mutable values in that their value is not deterministic in a certain sense. OTOH, since the only example I can think of where I would personally want to check whether a NaN is in a container is pathological, my intuition is hardly reliable.
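[Editor's note: Stephen's histogram scenario is easy to reproduce. Because each NaN object is a distinct dict key, a tally grows by one entry per NaN result instead of aggregating them:]

```python
# Three NaN results among five values:
results = [1.0, float("nan"), 2.0, float("nan"), float("nan")]

hist = {}
for r in results:
    hist[r] = hist.get(r, 0) + 1

# Each NaN object becomes its own key, so the histogram has five entries,
# not three (1.0: 1, 2.0: 1, nan: 3) as one might expect:
assert len(hist) == 5
```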

On Mar 23, 2010, at 5:09 AM, Stefan Krah wrote:
I like the simplicity of having a single signal (e.g. CoercionError), but a strictness context flag could offer greater control for people who only want pure decimal/integer operations.
For example:
strictness 0: completely promiscuous behaviour
strictness 1: current py3k behaviour
strictness 2: current py3k behaviour + pure equality comparisons
strictness 3: current py3k behaviour + pure equality comparisons + disallow NaN equality comparisons [1]
The decimal module is already drowning in complexity, so it would be best to keep it simple: one boolean flag that if set would warn about any implicit decimal/float interaction. Raymond

Raymond Hettinger wrote:
The decimal module is already drowning in complexity, so it would be best to keep it simple: one boolean flag that if set would warn about any implicit decimal/float interaction.
Agreed - those that want exceptions instead can use the usual warnings module mechanisms to trigger them. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

Nick Coghlan <ncoghlan@gmail.com> wrote:
Raymond Hettinger wrote:
The decimal module is already drowning in complexity, so it would be best to keep it simple: one boolean flag that if set would warn about any implicit decimal/float interaction.
Agreed - those that want exceptions instead can use the usual warnings module mechanisms to trigger them.
I'm not sure about the warnings module. If lower complexity is a goal, I would prefer Facundo's original proposal of just adding a single new signal. Users who just want to know if a NonIntegerConversion has occurred can check the flags, users who want an exception set the trap. With the warnings module, users have to know (and deal with) two exception handling/suppressing mechanisms. Stefan Krah
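[Editor's note: the flag/trap pattern Stefan describes is the decimal module's existing signal machinery. The proposed NonIntegerConversion signal does not exist, but the mechanics can be seen with a signal that does, such as Inexact:]

```python
from decimal import Decimal, Inexact, localcontext

with localcontext() as ctx:
    ctx.prec = 3
    ctx.clear_flags()

    Decimal(1) / Decimal(3)    # 0.333...: the result is inexact
    assert ctx.flags[Inexact]  # flag set; no exception raised

    ctx.traps[Inexact] = True  # users who want an exception set the trap
    try:
        Decimal(1) / Decimal(3)
        raised = False
    except Inexact:
        raised = True
    assert raised
```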

FWIW, my viewpoint on this is softening over time and I no longer feel a need to push for a new context flag. It is probably simplest for users if implicit coercions didn't come with control knobs. We already have Fraction+float-->float occurring without any exceptions or warnings, and nothing bad has happened as a result. Also, I'm reminded of Tim Peters's admonition to resist extending the decimal spec. I used to worry that any decimal/float interactions were most likely errors and shouldn't pass silently. Now, I've just stopped worrying and I feel better already ;-) Adding a FAQ entry is simpler than building out Context object circuitry and documenting it in an understandable way. Raymond On Mar 24, 2010, at 12:36 PM, Stefan Krah wrote:
Nick Coghlan <ncoghlan@gmail.com> wrote:
Raymond Hettinger wrote:
The decimal module is already drowning in complexity, so it would be best to keep it simple: one boolean flag that if set would warn about any implicit decimal/float interaction.
Agreed - those that want exceptions instead can use the usual warnings module mechanisms to trigger them.
I'm not sure about the warnings module. If lower complexity is a goal, I would prefer Facundo's original proposal of just adding a single new signal. Users who just want to know if a NonIntegerConversion has occurred can check the flags, users who want an exception set the trap.
With the warnings module, users have to know (and deal with) two exception handling/suppressing mechanisms.
Stefan Krah

Slight change of topic. I've been implementing the extra comparisons required for the Decimal type and found an anomaly while testing. Currently in py3k, order comparisons (but not ==, !=) between a complex number and another complex, float or int raise TypeError:
>>> z = complex(0, 0)
>>> z < int()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < int()
>>> z < float()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < float()
>>> z < complex()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < complex()
But Fraction is the odd man out: a comparison between a Fraction and a complex raises a TypeError for complex numbers with nonzero imaginary component, but returns a boolean value if the complex number has zero imaginary component:
>>> z < Fraction()
False
>>> complex(0, 1) < Fraction()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < Fraction()
I'm tempted to call this Fraction behaviour a bug, but maybe it arises from the numeric integration themes of PEP 3141. Any ideas? Mark

On Mar 24, 2010, at 2:09 PM, Mark Dickinson wrote:
Slight change of topic. I've been implementing the extra comparisons required for the Decimal type and found an anomaly while testing. Currently in py3k, order comparisons (but not ==, !=) between a complex number and another complex, float or int raise TypeError:
>>> z = complex(0, 0)
>>> z < int()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < int()
>>> z < float()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < float()
>>> z < complex()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < complex()
But Fraction is the odd man out: a comparison between a Fraction and a complex raises a TypeError for complex numbers with nonzero imaginary component, but returns a boolean value if the complex number has zero imaginary component:
>>> z < Fraction()
False
>>> complex(0, 1) < Fraction()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < Fraction()
I'm tempted to call this Fraction behaviour a bug, but maybe it arises from the numeric integration themes of PEP 3141. Any ideas?
Conceptually, it's a bug. The numeric tower treats non-complex numbers as special cases of complex where the imaginary component is zero (that's why the non-complex types all support real/imag), and since complex numbers are not allowed to compare to themselves, they shouldn't compare to anything else either. To confirm, we should ask Jeffrey Y to opine. Raymond

On Wed, Mar 24, 2010 at 2:29 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Mar 24, 2010, at 2:09 PM, Mark Dickinson wrote:
Slight change of topic. I've been implementing the extra comparisons required for the Decimal type and found an anomaly while testing. Currently in py3k, order comparisons (but not ==, !=) between a complex number and another complex, float or int raise TypeError:
>>> z = complex(0, 0)
>>> z < int()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < int()
>>> z < float()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < float()
>>> z < complex()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < complex()
But Fraction is the odd man out: a comparison between a Fraction and a complex raises a TypeError for complex numbers with nonzero imaginary component, but returns a boolean value if the complex number has zero imaginary component:
>>> z < Fraction()
False
>>> complex(0, 1) < Fraction()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < Fraction()
I'm tempted to call this Fraction behaviour a bug, but maybe it arises from the numeric integration themes of PEP 3141. Any ideas?
Conceptually, it's a bug. The numeric tower treats non-complex numbers as special cases of complex where the imaginary component is zero (that's why the non-complex types all support real/imag), and since complex numbers are not allowed to compare to themselves, they shouldn't compare to anything else either.
That's how I read the PEP too. PEP 3141 doesn't define any ordering operations on Complex, they only show up on Real.
To confirm, we should ask Jeffrey Y to opine.
CC'ed him. After all looks like it was he who added it to Fraction. :-) -- --Guido van Rossum (python.org/~guido)

Raymond Hettinger wrote:
Conceptually, it's a bug. The numeric tower treats non-complex numbers as special cases of complex where the imaginary component is zero (that's why the non-complex types all support real/imag), and since complex numbers are not allowed to compare to themselves, they shouldn't compare to anything else either.
There's a contradiction in there somewhere. If you believe that a non-complex is equivalent to a complex with zero imaginary part, then you *should* be able to compare two complexes provided that their imaginary parts are both zero. (I don't think that should be the case, BTW -- complex numbers live on a two-dimensional plane, and from a geometrical point of view there's no reason to single out the x-axis and give it special treatment.) -- Greg

On Wed, Mar 24, 2010 at 2:09 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
Slight change of topic. I've been implementing the extra comparisons required for the Decimal type and found an anomaly while testing. Currently in py3k, order comparisons (but not ==, !=) between a complex number and another complex, float or int raise TypeError:
>>> z = complex(0, 0)
>>> z < int()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < int()
>>> z < float()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < float()
>>> z < complex()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < complex()
But Fraction is the odd man out: a comparison between a Fraction and a complex raises a TypeError for complex numbers with nonzero imaginary component, but returns a boolean value if the complex number has zero imaginary component:
>>> z < Fraction()
False
>>> complex(0, 1) < Fraction()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < Fraction()
I'm tempted to call this Fraction behaviour a bug, but maybe it arises from the numeric integration themes of PEP 3141. Any ideas?
I'd call it a bug.

On Thu, Mar 25, 2010 at 1:15 AM, Jeffrey Yasskin <jyasskin@gmail.com> wrote:
On Wed, Mar 24, 2010 at 2:09 PM, Mark Dickinson <dickinsm@gmail.com> wrote:
Slight change of topic. I've been implementing the extra comparisons required for the Decimal type and found an anomaly while testing. Currently in py3k, order comparisons (but not ==, !=) between a complex number and another complex, float or int raise TypeError:
>>> z = complex(0, 0)
>>> z < int()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < int()
>>> z < float()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < float()
>>> z < complex()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < complex()
But Fraction is the odd man out: a comparison between a Fraction and a complex raises a TypeError for complex numbers with nonzero imaginary component, but returns a boolean value if the complex number has zero imaginary component:
>>> z < Fraction()
False
>>> complex(0, 1) < Fraction()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unorderable types: complex() < Fraction()
I'm tempted to call this Fraction behaviour a bug, but maybe it arises from the numeric integration themes of PEP 3141. Any ideas?
I'd call it a bug.
Thanks, Jeffrey (and everyone else who answered). Fixed in r79456 (py3k) and r79455 (trunk). Mark

W00t! On Wed, Mar 24, 2010 at 1:56 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
FWIW, my viewpoint on this is softening over time and I no longer feel a need to push for a new context flag.
It is probably simplest for users if implicit coercions didn't come with control knobs. We already have Fraction+float-->float occurring without any exceptions or warnings, and nothing bad has happened as a result.
Also, I'm reminded of Tim Peters' admonition to resist extending the decimal spec.
I used to worry that any decimal/float interactions were most likely errors and shouldn't pass silently. Now, I've just stopped worrying and I feel better already ;-) Adding a FAQ entry is simpler than building out Context object circuitry and documenting it in an understandable way.
Raymond
On Mar 24, 2010, at 12:36 PM, Stefan Krah wrote:
Nick Coghlan <ncoghlan@gmail.com> wrote:
Raymond Hettinger wrote:
The decimal module is already drowning in complexity, so it would be best to keep it simple: one boolean flag that if set would warn about any implicit decimal/float interaction.
Agreed - those that want exceptions instead can use the usual warnings module mechanisms to trigger them.
I'm not sure about the warnings module. If lower complexity is a goal, I would prefer Facundo's original proposal of just adding a single new signal. Users who just want to know if a NonIntegerConversion has occurred can check the flags; users who want an exception set the trap.
With the warnings module, users have to know (and deal with) two exception handling/suppressing mechanisms.
Stefan Krah
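For context, the flag/trap machinery Stefan refers to already exists for signals such as Inexact; a new NonIntegerConversion signal would presumably plug into the same mechanism. A sketch of how the two tiers behave today (using the existing Inexact signal, not the proposed one):

```python
from decimal import Decimal, localcontext, Inexact

# Passive mode: the flag is raised but no exception occurs.
with localcontext() as ctx:
    ctx.clear_flags()
    Decimal(1) / Decimal(3)            # result must be rounded
    flagged = bool(ctx.flags[Inexact]) # flag set, no exception

# Trap mode: the user opts in to an exception for the same event.
with localcontext() as ctx:
    ctx.traps[Inexact] = True
    try:
        Decimal(1) / Decimal(3)
        trapped = False
    except Inexact:
        trapped = True
```

A NonIntegerConversion flag/trap would give users the same two-tier choice without involving the warnings module at all.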
-- --Guido van Rossum (python.org/~guido)

On Wed, Mar 24, 2010 at 8:56 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
FWIW, my viewpoint on this is softening over time and I no longer feel a need to push for a new context flag.
It is probably simplest for users if implicit coercions didn't come with control knobs. We already have Fraction+float-->float occurring without any exceptions or warnings, and nothing bad has happened as a result.
I agree with this; I'd be happy to avoid the control knobs. Mark

On 3/24/2010 1:56 PM, Raymond Hettinger wrote:
FWIW, my viewpoint on this is softening over time and I no longer feel a need to push for a new context flag.
To make Decimal useful for people that want to control its numerical quality, there must be a way to exclude accidental operations, and preferably an option for producing exceptions. Otherwise the status quo in 3.x is preferable to adding it to the numeric tower. I agree that not worrying can make you feel better, though. :)
IIRC, the original purpose of a NaN was to serve as a placeholder value in a long series of floating point ops so that the programmer would not have to introduce edge case tests at each stage of a calculation. Yet, I look at the code for the decimal module and the C code for the math module and see that the opposite result occurred: the code is littered with is_special(x) tests and handlers.
You are looking at the wrong code. The point is that the code _using_ the math module and decimal module doesn't have to be littered with edge case tests; not that the implementation, which the good folks at IEEE figured would be in custom silicon that could do the edge case checking in parallel with other operations and with no net slowdown, would not have to be littered with edge cases. Glenn
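Glenn's distinction is visible from the calling side, where NaN really does act as a propagating placeholder:

```python
import math

# A long pipeline needs no per-step edge-case tests; NaN simply
# propagates through the arithmetic, and only the final result
# needs to be inspected.
x = float('nan')
result = ((x * 2.0) + 1.0) / 3.0
assert math.isnan(result)
```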

On 3/19/2010 2:50 PM, Guido van Rossum wrote:
I'd like to reboot this thread.
I'll go along with that idea!
I've been spinning this topic in my head for most of the morning, and I think we should seriously reconsider allowing mixed arithmetic involving Decimal, not just mixed comparisons. [Quick summary: embed Decimal in the numeric tower but add a context flag to disallow implicit mixing of float and Decimal.]
As long as there is a way to avoid implicit mixing of float and Decimal, or to easily detect (preferably with an exception) the implicit mixing, then I think that solves the concerns of people trying to write numerically correct code using Decimal. And if Mark (or someone) can solve the hashing anomaly problem without huge expense, then it could be a winner.
I tried to find the argumentation against it in PEP 327
Should Aahz be consulted, as some of the objections in PEP 327 are attributed to him, but he is pretty scarce around here these days?
Also, this would be a possible road towards eventually supporting a language extension where floating point literals produce Decimal values instead of binary floats. (A possible syntax could be "from __options__ import decimal_float", which would work similar to "from __future__ import ..." except it's a permanent part of the language rather than a forward compatibility feature.)
Nice touch here... rather than being forced to quote Decimal values as strings and convert from string, or use a tuple to represent the parts, both of which are warts. Not sure what context would be used, though. Glenn

On Sat, 20 Mar 2010 08:50:04 am Guido van Rossum wrote:
I'd like to reboot this thread. I've been spinning this topic in my head for most of the morning, and I think we should seriously reconsider allowing mixed arithmetic involving Decimal, not just mixed comparisons. [Quick summary: embed Decimal in the numeric tower but add a context flag to disallow implicit mixing of float and Decimal.]
The last few days, I've argued against changing the prohibition against mixed arithmetic operations. But you've inspired me to go take a look at a module I've been neglecting, fractions, and I've learned that the Fraction type already fully supports arithmetic and comparisons with floats and ints. I'm extremely impressed -- I had no idea the numeric tower in 2.6 was this advanced. (I still do most of my work with 2.5.) Decimal appears to be the odd one:
>>> f = Fraction(0)
>>> d = Decimal(0)
>>> 0 == 0.0 == 0j == f
True
>>> 0 == 0.0 == 0j == f == d
False
Not just odd in the sense of "different", but also odd in the sense of "weird":
>>> d == 0 == 0.0 == 0j == f
True
[...]
I'd like to look at the issue by comparing the benefits and drawbacks of properly embedding Decimal into the numeric tower. As advantages, I see consistent behavior in situations like the above and more intuitive behavior for beginners. Also, this would be a possible road towards eventually supporting a language extension where floating point literals produce Decimal values instead of binary floats. (A possible syntax could be "from __options__ import decimal_float", which would work similar to "from __future__ import ..." except it's a permanent part of the language rather than a forward compatibility feature.)
That's far more ambitious than I was willing to even imagine, but now that you've suggested it, I like it. -- Steven D'Aprano
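Concretely, the Fraction integration Steven is praising looks like this:

```python
from fractions import Fraction

assert Fraction(1, 2) + 1 == Fraction(3, 2)   # ints mix in, result stays exact
assert Fraction(1, 2) < 0.75                  # mixed comparisons with float work
mixed = Fraction(1, 2) + 0.5                  # float "wins" in mixed arithmetic
assert isinstance(mixed, float) and mixed == 1.0
```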

Guido van Rossum wrote:
I think we should seriously reconsider allowing mixed arithmetic involving Decimal, not just mixed comparisons.
I'm glad I'm not the only one that started wondering that. I wasn't quite game enough to actually suggest it though :)
There is one choice which I'm not sure about. Should a mixed float/Decimal operation return a float or a Decimal? I note that Fraction (which *is* properly embedded in the numeric tower) supports this and returns a float result in this case. While I earlier proposed to return the most "complicated" type of the two, i.e. Decimal, I now think it may also make sense to return a float, being the most "fuzzy" type in the numeric tower. This would also make checking for accidental floats easier, since floats now propagate throughout the computation (like NaN) and a simple assertion that the result is a Decimal instance suffices to check that no floats were implicitly mixed into the computation.
To blend a couple of different ideas from the thread reboot together: I suggest a 'linearised' numeric tower that looks like:

int -> Decimal -> Fraction -> float -> complex

As Adam stated, this is a pragmatic tower stating which implicit coercions are defined by the code, not a formal mathematical relationship. Note that this would involve adding mixed Fraction/Decimal arithmetic as well as Decimal/float arithmetic. I placed Decimal to the left of Fraction to keep Decimal's dependencies clear and because Decimal -> Fraction conversions appear straightforward (using power of 10 denominators) without introducing the precision issues that would arise in Fraction -> Decimal conversions.

I also like the idea of adding the decimal context signal that Facundo suggests (e.g. under the name ImplicitCoercionToBinaryFloat). So "quick and dirty don't need a perfect answer" operations end up with ordinary binary floats, while more precise code can enable the signal trap to ensure the calculation fails noisily if binary floats are introduced.

Allowing implicit operations would also finally allow Decimal to be registered with numbers.Real. Whatever we decide to do will need to be codified in a PEP though. "Cleaning Up Python's Numeric Tower" or something along those lines.
The implementation of __hash__ will be complicated, and it may make sense to tweak the hash function of float, Fraction and Decimal to make it easier to ensure that for values that can be represented in either type the hash matches the equality. But this sounds a worthwhile price to pay for proper embedding in the numeric tower.
And Mark appears to already have a good answer to that problem. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

On Mar 20, 2010, at 6:54 PM, Nick Coghlan wrote:
I suggest a 'linearised' numeric tower that looks like:
int -> Decimal -> Fraction -> float -> complex
Is that a typo? Shouldn't Decimal and float go between Fraction and complex? The abstract numeric tower is: Number Complex Real Rational Integral where both Decimal and float have operations associated with reals. Raymond
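The abstract tower Raymond sketches corresponds to the numbers ABCs, and isinstance checks show where each concrete type registers. Note that Decimal is (as of this writing) registered only as a Number, not a Real, which is exactly the exclusion this thread is debating:

```python
import numbers
from decimal import Decimal
from fractions import Fraction

assert isinstance(1, numbers.Integral)
assert isinstance(Fraction(1, 2), numbers.Rational)
assert isinstance(1.0, numbers.Real)
assert isinstance(1j, numbers.Complex)

# Decimal sits outside the Real branch of the tower for now.
assert isinstance(Decimal(1), numbers.Number)
assert not isinstance(Decimal(1), numbers.Real)
```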

Raymond Hettinger wrote:
On Mar 20, 2010, at 6:54 PM, Nick Coghlan wrote:
I suggest a 'linearised' numeric tower that looks like:
int -> Decimal -> Fraction -> float -> complex
Is that a typo? Shouldn't Decimal and float go between Fraction and complex?
The abstract numeric tower is:
Number Complex Real Rational Integral
where both Decimal and float have operations associated with reals.
I don't actually mind either way - the pragmatic tower is about coding convenience rather than numeric purity (and mixing Fractions and Decimals in the same algorithm is somewhat nonsensical - they're designed for two completely different problem domains). Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia ---------------------------------------------------------------

On Sun, Mar 21, 2010 at 00:58, Nick Coghlan <ncoghlan@gmail.com> wrote:
I don't actually mind either way - the pragmatic tower is about coding convenience rather than numeric purity (and mixing Fractions and Decimals in the same algorithm is somewhat nonsensical - they're designed for two completely different problem domains).
I think the rule I've been going on is that ideal types (int, Fraction) are on one end and pragmatic types (float, complex) are on the other. Since Decimal can be used exactly it clearly bridges both groups. *However*, there's other possible types out there, and would they fit into my system? I've just taken a look at sympy and although it's clearly an ideal type, it also allows mixing with float and complex, both producing sympy types. That puts it clearly past float and complex in the tower. I have no idea where Decimal should go. -- Adam Olsen, aka Rhamphoryncus

On Sat, Mar 20, 2010 at 6:54 PM, Nick Coghlan <ncoghlan@gmail.com> wrote:
Guido van Rossum wrote:
I think we should seriously reconsider allowing mixed arithmetic involving Decimal, not just mixed comparisons.
I'm glad I'm not the only one that started wondering that. I wasn't quite game enough to actually suggest it though :)
There is one choice which I'm not sure about. Should a mixed float/Decimal operation return a float or a Decimal? I note that Fraction (which *is* properly embedded in the numeric tower) supports this and returns a float result in this case. While I earlier proposed to return the most "complicated" type of the two, i.e. Decimal, I now think it may also make sense to return a float, being the most "fuzzy" type in the numeric tower. This would also make checking for accidental floats easier, since floats now propagate throughout the computation (like NaN) and a simple assertion that the result is a Decimal instance suffices to check that no floats were implicitly mixed into the computation.
To blend a couple of different ideas from the thread reboot together:
I suggest a 'linearised' numeric tower that looks like:
int -> Decimal -> Fraction -> float -> complex
As Adam stated, this is a pragmatic tower stating which implicit coercions are defined by the code, not a formal mathematical relationship.
Note that this would involve adding mixed Fraction/Decimal arithmetic as well as Decimal/float arithmetic.
Yes, that was my intention too.
I placed Decimal to the left of Fraction to keep Decimal's dependencies clear and because Decimal -> Fraction conversions appear straightforward (using power of 10 denominators) without introducing the precision issues that would arise in Fraction -> Decimal conversions.
This is just wrong. Decimal is much more like float than like int or Fraction, due to its rounding behavior. (You cannot produce exact results for 1/3 using either Decimal or float.) Clearly the difficult decision is whether Decimal is between Fraction and float, or between float and complex (I prefer not to say "to the left/right of" since PEP 3141 orders the types in the opposite way as people have been doing here). Both are floating point types that sometimes produce rounded results (e.g. 1/3); the difference is that they use different bases and that Decimal has configurable precision when rounding. I would make float the last station before complex, whereas Mark would prefer Decimal to go there. I defer to Mark, who has thought a lot more about Decimal than I have.
I also like the idea of adding the decimal context signal that Facundo suggests (e.g. under the name ImplicitCoercionToBinaryFloat).
Yes, this is what I was trying to say (but didn't find the right words for).
So "quick and dirty don't need a perfect answer" operations end up with ordinary binary floats, while more precise code can enable the signal trap to ensure the calculation fails noisily if binary floats are introduced.
But does this break a tie for the relative ordering of Decimal and float in the tower?
Allowing implicit operations would also finally allow Decimal to be registered with numbers.Real.
Right, that's one aspect of what I meant by "embedded in the numeric tower".
Whatever we decide to do will need to be codified in a PEP though. "Cleaning Up Python's Numeric Tower" or something along those lines.
The cleanup is really just specific to Decimal -- int, Fraction and float are already properly embedded in the tower (PEP 3141 doesn't advertise Fraction enough, since it predates it). That we may have to change the __hash__ implementation for the other types is merely a compromise towards efficiency.
The implementation of __hash__ will be complicated, and it may make sense to tweak the hash function of float, Fraction and Decimal to make it easier to ensure that for values that can be represented in either type the hash matches the equality. But this sounds a worthwhile price to pay for proper embedding in the numeric tower.
And Mark appears to already have a good answer to that problem.
Which I still have to review. (Mark, if you're there, could you make a brief post here on the mathematical definition of the new hash you're proposing, and why it is both efficient to compute and good (enough) as a hash function?) -- --Guido van Rossum (python.org/~guido)

On Sun, Mar 21, 2010 at 9:53 AM, Guido van Rossum <guido@python.org> wrote:
Which I still have to review. (Mark, if you're there, could you make a brief post here on the mathematical definition of the new hash you're proposing, and why it is both efficient to compute and good (enough) as a hash function?)
Never mind, Mark. I found your explanation here: http://codereview.appspot.com/660042/diff/19001/11009?column_width=80 -- --Guido van Rossum (python.org/~guido)
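For readers of the archive, the scheme at that link boils down to hashing a rational m/n as m * n**-1 modulo a fixed prime P, so that equal values hash equal no matter which numeric type represents them. A minimal sketch for positive values, using the modulus later exposed as sys.hash_info.modulus (CPython's real implementation additionally handles signs, special values, and the reserved hash -1):

```python
import sys
from fractions import Fraction

P = sys.hash_info.modulus      # e.g. 2**61 - 1 on 64-bit builds

def rational_hash(m, n):
    # hash(m/n) = m * inverse(n) mod P; since P is prime, the inverse
    # is n**(P-2) mod P by Fermat's little theorem -- cheap via pow().
    return (m * pow(n, P - 2, P)) % P

# Agrees with CPython's unified numeric hash for positive rationals:
assert rational_hash(1, 2) == hash(Fraction(1, 2)) == hash(0.5)
```

The inverse is what makes it efficient: for any of int, float, Fraction or Decimal, the value can be written as m/n (or m * 10**e) and hashed without converting to a common exact type first.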

On Mar 21, 2010, at 10:02 AM, Guido van Rossum wrote:
On Sun, Mar 21, 2010 at 9:53 AM, Guido van Rossum <guido@python.org> wrote:
Which I still have to review. (Mark, if you're there, could you make a brief post here on the mathematical definition of the new hash you're proposing, and why it is both efficient to compute and good (enough) as a hash function?)
Never mind, Mark. I found your explanation here: http://codereview.appspot.com/660042/diff/19001/11009?column_width=80
I'm often dazzled by Mark's brilliance, but this is an especially nice bit of reasoning. Raymond

Note that this would involve adding mixed Fraction/Decimal arithmetic as well as Decimal/float arithmetic.
Yes, that was my intention too.
+1
I placed Decimal to the left of Fraction to keep Decimal's dependencies clear and because Decimal -> Fraction conversions appear straightforward (using power of 10 denominators) without introducing the precision issues that would arise in Fraction -> Decimal conversions.
This is just wrong. Decimal is much more like float than like int or Fraction, due to its rounding behavior. (You cannot produce exact results for 1/3 using either Decimal or float.)
Right. We should be guided by: fractions are a superset of decimals which are a superset of binary floats. And by: binary floats and decimal floats both implement all of the operations for the Real abstract base class.
Yes, this is what I was trying to say (but didn't find the right words for).
So "quick and dirty don't need a perfect answer" operations end up with ordinary binary floats, while more precise code can enable the signal trap to ensure the calculation fails noisily if binary floats are introduced.
But does this break a tie for the relative ordering of Decimal and float in the tower?
It seems to me that Decimals and floats should be considered at the same level (i.e. both implement Real). Mixed Decimal and float should coerce to Decimal because it can be done losslessly. There is no need to embed a notion of "imperfect answer". Numbers themselves are exact and many mixed operations can be exact if the coercions go the right way. Some folks who have had bad experiences with representation error (i.e. 1.1 cannot be exactly represented as a binary float) or with round-off error (i.e. 1.0 / 7.0 must be rounded) tend to think of both binary and decimal floats as necessarily inexact. But that is not the case; exact accounting work is perfectly feasible with decimals. Remember, the notion of inexactness is a taint, not an intrinsic property of a type. Even the Scheme numeric tower recognizes this. Likewise, the decimal specification also spells out this notion as basic to its design.
Whatever we decide to do will need to be codified in a PEP though. "Cleaning Up Python's Numeric Tower" or something along those lines.
The cleanup is really just specific to Decimal -- int, Fraction and float are already properly embedded in the tower (PEP 3141 doesn't advertise Fraction enough, since it predates it). That we may have to change the __hash__ implementation for the other types is merely a compromise towards efficiency.
I believe that no "clean-up" is necessary. Decimal already implements the Real ABC. All that is necessary is the common __hash__ algorithm and removing the restriction between decimal/float interaction so that any two instances of Real can interoperate with one another. Raymond

On Sun, 21 Mar 2010 11:25:34 -0700, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
It seems to me that Decimals and floats should be considered at the same level (i.e. both implement Real).
Mixed Decimal and float should coerce to Decimal because it can be done losslessly.
There is no need to embed a notion of "imperfect answer". Numbers themselves are exact and many mixed operations can be exact if the coercions go the right way.
I think the concern here is rather about operations such as: 1.1 + Decimal('1.1') The calculation may produce an "exact" result, but it won't be the exact result expected, because the conversion from string (at the program text file level) to float was lossy. Thus the desire for some mechanism to know that floats and decimals have been mixed anywhere in the calculations that led up to whatever result number you are looking at. And to have doing so trigger an exception if requested by the programmer. -- R. David Murray www.bitdance.com
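The loss David describes is easy to exhibit with Decimal.from_float(), which shows the exact value the float actually holds:

```python
from decimal import Decimal

# The source text "1.1" and the float 1.1 are not the same number:
assert Decimal('1.1') != Decimal.from_float(1.1)
assert Decimal.from_float(1.1) == Decimal(
    '1.100000000000000088817841970012523233890533447265625')
```

So 1.1 + Decimal('1.1') would silently compute with the long value above, not with eleven tenths.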

On Mar 21, 2010, at 11:50 AM, R. David Murray wrote:
On Sun, 21 Mar 2010 11:25:34 -0700, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
It seems to me that Decimals and floats should be considered at the same level (i.e. both implement Real).
Mixed Decimal and float should coerce to Decimal because it can be done losslessly.
There is no need to embed a notion of "imperfect answer". Numbers themselves are exact and many mixed operations can be exact if the coercions go the right way.
I think the concern here is rather about operations such as:
1.1 + Decimal('1.1')
The calculation may produce an "exact" result, but it won't be the exact result expected, because the conversion from string (at the program text file level) to float was lossy. Thus the desire for some mechanism to know that floats and decimals have been mixed anywhere in the calculations that led up to whatever result number you are looking at. And to have doing so trigger an exception if requested by the programmer.
That makes sense. That's why Guido proposed a context flag in decimal to issue a warning for implicit mixing of decimals and floats. What I was talking about was a different issue. The question of where to stack decimals in the hierarchy was erroneously being steered by the concept that both decimal and binary floats are intrinsically inexact. But that would be incorrect, inexactness is a taint, the numbers themselves are always exact. I really like Guido's idea of a context flag to control whether mixing of decimal and binary floats will issue a warning. The default should be to issue the warning (because unless you know what you're doing, it is most likely an error). Raymond

On Mon, 22 Mar 2010 06:31:57 am Raymond Hettinger wrote:
I really like Guido's idea of a context flag to control whether mixing of decimal and binary floats will issue a warning. The default should be to issue the warning (because unless you know what you're doing, it is most likely an error).
When you say "warning", do you mean warnings.warn(), or an exception? I'd like to put in a vote for allowing naive users with low requirements for accuracy and precision to be able to type something like this in the interactive interpreter:
Decimal(1) + 1.0
and get two (in whatever type is decided on) without having to change the context or deal with an exception. Yes, this means that they may be surprised if they perform an operation which suffers from rounding errors, but that's no worse than what happens with floats. If naive users are going to use the interpreter as a calculator, they're going to start off using floats and ints simply because they require less typing. My idea is to allow a gentle learning curve with Decimal (and Fraction) without scaring them off with exceptions or excessive warnings: a single warning per session would be okay, a warning after every operation would be excessive in my opinion, and exceptions by default would be right out. -- Steven D'Aprano

On Mar 21, 2010, at 3:59 PM, Steven D'Aprano wrote:
On Mon, 22 Mar 2010 06:31:57 am Raymond Hettinger wrote:
I really like Guido's idea of a context flag to control whether mixing of decimal and binary floats will issue a warning. The default should be to issue the warning (because unless you know what you're doing, it is most likely an error).
When you say "warning", do you mean warnings.warn(), or an exception?
I'm not sure I understand your question. I did mean warnings.warn(). But that does raise a catchable exception or it can be suppressed through the warnings module. It should probably be set to warn no more than once. Raymond

On Sun, Mar 21, 2010 at 4:16 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Mar 21, 2010, at 3:59 PM, Steven D'Aprano wrote:
On Mon, 22 Mar 2010 06:31:57 am Raymond Hettinger wrote:
I really like Guido's idea of a context flag to control whether mixing of decimal and binary floats will issue a warning. The default should be to issue the warning (because unless you know what you're doing, it is most likely an error).
When you say "warning", do you mean warnings.warn(), or an exception?
I'm not sure I understand your question. I did mean warnings.warn(). But that does raise a catchable exception or it can be suppressed through the warnings module. It should probably be set to warn no more than once.
I would hope it could use whatever mechanism is already used for other conditions in the decimal module such as Underflow, Inexact, Rounded etc. But I have to admit I don't know exactly what those do. It appears they can either raise an exception or call a handle() method on the given exception. Are you thinking of putting the warn() call inside that handle() method? -- --Guido van Rossum (python.org/~guido)

On Mar 21, 2010, at 6:24 PM, Guido van Rossum wrote:
On Sun, Mar 21, 2010 at 4:16 PM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
On Mar 21, 2010, at 3:59 PM, Steven D'Aprano wrote:
On Mon, 22 Mar 2010 06:31:57 am Raymond Hettinger wrote:
I really like Guido's idea of a context flag to control whether mixing of decimal and binary floats will issue a warning. The default should be to issue the warning (because unless you know what you're doing, it is most likely an error).
When you say "warning", do you mean warnings.warn(), or an exception?
I'm not sure I understand your question. I did mean warnings.warn(). But that does raise a catchable exception or it can be suppressed through the warnings module. It should probably be set to warn no more than once.
I would hope it could use whatever mechanism is already used for other conditions in the decimal module such as Underflow, Inexact, Rounded etc. But I have to admit I don't know exactly what those do. It appears they can either raise an exception or call a handle() method on the given exception. Are you thinking of putting the warn() call inside that handle() method?
Yes. Raymond

On Sun, Mar 21, 2010 at 16:59, Steven D'Aprano <steve@pearwood.info> wrote:
If naive users are going to use the interpreter as a calculator, they're going to start off using floats and ints simply because they require less typing. My idea is to allow a gentle learning curve with Decimal (and Fraction) without scaring them off with exceptions or excessive warnings: a single warning per session would be okay, a warning after every operation would be excessive in my opinion, and exceptions by default would be right out.
That strikes me as a passive-aggressive way of saying we tolerate it for interactive use, but don't you dare mix them for real programs. A warning should be regarded as a bug in real programs — unless it's a transitional measure — so it might as well be an exception. Don't guess and all that. -- Adam Olsen, aka Rhamphoryncus

Raymond Hettinger wrote:
The question of where to stack decimals in the hierarchy was erroneously being steered by the concept that both decimal and binary floats are intrinsically inexact. But that would be incorrect, inexactness is a taint, the numbers themselves are always exact.
I don't think that's correct. "Numbers are always exact" is a simplification due to choosing not to attach an inexactness flag to every value. Without such a flag, we don't really know whether any given value is exact or not, we can only guess. The reason for regarding certain types as "implicitly inexact" is something like this: If you start with exact ints, and do only int operations with them, you must end up with exact ints. But the same is not true of float or Decimal: even if you start with exact values, you can end up with inexact ones.
I really like Guido's idea of a context flag to control whether mixing of decimal and binary floats will issue a warning.
Personally I feel that far too much stuff concerning decimals is controlled by implicit context parameters. It gives me the uneasy feeling that I don't know what the heck any given decimal operation is going to do. It's probably justified in this case, though. -- Greg
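Greg's unease is easy to demonstrate: the same source expression produces different results under different ambient contexts:

```python
from decimal import Decimal, localcontext

# Identical expression, two implicit contexts, two different results.
with localcontext() as ctx:
    ctx.prec = 6
    low = Decimal(1) / Decimal(7)
with localcontext() as ctx:
    ctx.prec = 28
    high = Decimal(1) / Decimal(7)

assert low == Decimal('0.142857')
assert low != high
```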

On Sun, Mar 21, 2010 at 11:25 AM, Raymond Hettinger <raymond.hettinger@gmail.com> wrote:
Right. We should be guided by: fractions are a superset of decimals, which are a superset of binary floats.
But mixed Fraction-float operations return floats, not Fractions.
And by: binary floats and decimal floats both implement all of the operations for the Real abstract base class.
Sure, but that doesn't help us decide what mixed Decimal-float operations should return.
It seems to me that Decimals and floats should be considered at the same level (i.e. both implement Real).
Agreed, but doesn't help. (Except against the idea that Decimal goes on the "integer" side of Fraction, which is just wrong.)
Mixed Decimal and float should coerce to Decimal because it can be done losslessly.
But mixed Fraction-float returns float even though returning Fraction could be done losslessly. The real criterion should be what's more useful, not what can be done losslessly.
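Guido's claim here is easy to check at the interpreter: mixed Fraction/float arithmetic in the fractions module coerces to float even though the lossless Fraction result is available.

```python
from fractions import Fraction

# Mixed Fraction/float arithmetic coerces to float, discarding exactness,
# even though every float is exactly representable as a Fraction.
mixed = Fraction(1, 2) + 0.5
print(type(mixed).__name__)      # float

# The lossless direction exists, but only if requested explicitly:
exact = Fraction(1, 2) + Fraction(0.5)
print(type(exact).__name__)      # Fraction
```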
There is no need to embed a notion of "imperfect answer". Numbers themselves are exact and many mixed operations can be exact if the coercions go the right way.
Division cannot, in general (I consider floor division a bastard child of the integers). And for multiplication it seems that rounding at some point becomes necessary since the alternative would be to use infinite precision.
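Guido's rounding point can be made concrete: under any finite decimal context, division must round, and even multiplication rounds once the exact product outgrows the precision. A small sketch with a deliberately reduced precision to make the rounding visible:

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 6                                      # 6 significant digits
    print(Decimal(1) / Decimal(7))                    # 0.142857 (rounded)
    # The exact product 1.23456 * 6.54321 needs 11 digits,
    # so the result is rounded to fit the context:
    print(Decimal('1.23456') * Decimal('6.54321'))
```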
Some folks who have had bad experiences with representation error (i.e. 1.1 cannot be exactly represented as a binary float) or with round-off error (i.e. 1.0 / 7.0 must be rounded) tend to think of both binary and decimal floats as necessarily inexact. But that is not the case; exact accounting work is perfectly feasible with decimals. Remember, the notion of inexactness is a taint, not an intrinsic property of a type. Even the Scheme numeric tower recognizes this. Likewise, the decimal specification also spells out this notion as basic to its design.
I really don't think advertising Decimal as having exact operations is the right thing to do. Sure, it is the right data type for all accounting operations -- but that is a very specialized use case, where certain round-off errors are desirable (since nobody wants fractional pennies in their bill).
I believe that no "clean-up" is necessary. Decimal already implements the Real ABC. All that is necessary is the common __hash__ algorithm and removing the restriction between decimal/float interaction so that any two instances of Real can interoperate with one another.
Call it by any name you want. We're looking at revising the hash function and allowing mixed operations between Decimal and float, with a signal that warns about these operations. I see two open issues: whether mixed Decimal-float operations should return Decimal or float, and whether the warning about such operations should be on or off by default. My gut tells me that the signal should be off by default. My gut doesn't say much about whether the result should be float or Decimal, but I want the reasoning used to reach a decision to be sound.

--
--Guido van Rossum (python.org/~guido)
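For reference, the common __hash__ algorithm discussed here did eventually land (in Python 3.2), and with it the invariant holds; a quick check on a modern interpreter:

```python
from decimal import Decimal
from fractions import Fraction

# With the unified numeric hash, values that compare equal across
# int/float/Fraction/Decimal also hash equal...
print(hash(1) == hash(1.0) == hash(Fraction(1)) == hash(Decimal(1)))  # True

# ...so set contents no longer depend on insertion order:
print({Decimal(1), 1, 1.0} == {1.0, 1, Decimal(1)})  # True
```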

On Mar 21, 2010, at 3:35 PM, Guido van Rossum wrote:
It seems to me that Decimals and floats should be considered at the same level (i.e. both implement Real).
Agreed, but doesn't help. (Except against the idea that Decimal goes on the "integer" side of Fraction, which is just wrong.)
Woohoo! Glad you agree. I was concerned that idea was gathering a following.
There is no need to embed a notion of "imperfect answer". Numbers themselves are exact and many mixed operations can be exact if the coercions go the right way.
Division cannot, in general (I consider floor division a bastard child of the integers). And for multiplication it seems that rounding at some point becomes necessary since the alternative would be to use infinite precision.
Perception here is probably dictated by the use case of the beholder :-) In the accounting world, such operations are very common (part of getting three apples for one dollar recorded as 0.33, 0.33, and 0.34 in the individual apple accounts). Cost allocation tools use floor division all the time. Amortization computations always have to take into account that real payments and interest charges do not have fractional pennies (or sometimes fractional dollars) and have to readjust the amortization accordingly (usually with small roundings to the interest charges and a small adjustment to the final payment).
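The penny-allocation pattern Raymond describes can be sketched with a hypothetical helper (allocate and its rounding strategy are illustrative, not a real decimal API): floor the per-unit share to whole cents, then hand out the leftover cents one at a time so the parts sum back to the total exactly.

```python
from decimal import Decimal, ROUND_DOWN

def allocate(total, n):
    # Hypothetical helper: split `total` into n parts that differ by at
    # most one cent and sum back to `total` exactly.
    cent = Decimal('0.01')
    base = (total / n).quantize(cent, rounding=ROUND_DOWN)
    parts = [base] * n
    remainder = total - base * n
    i = 0
    while remainder > 0:          # distribute the leftover cents
        parts[i] += cent
        remainder -= cent
        i += 1
    return parts

print(allocate(Decimal('1.00'), 3))   # three apples for one dollar
```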
Call it by any name you want. We're looking at revising the hash function and allowing mixed operations between Decimal and float, with a signal that warns about these operations. I see two open issues: whether mixed Decimal-float operations should return Decimal or float, and whether the warning about such operations should be on or off by default. My gut tells me that the signal should be off by default. My gut doesn't say much about whether the result should be float or Decimal, but I want the reasoning used to reach a decision to be sound.
My vote is for: decimal + float --> decimal, and to emit a warning by default.

Reasoning for coercion to decimal:

1) decimal + float can be done losslessly and meaningfully.
2) decimal/float comparisons can give a useful and non-confusing result for: 1.1 == Decimal('1.1'). This needs to return False since the two values a) have different hash values (under either proposal) and b) are in fact not equal. But if the decimal were coerced to a binary float, the relation would return True:

>>> float(decimal.Decimal('1.1')) == 1.1
True

This is bad.

Reasoning for emitting a warning by default:

1) Real actual use cases for mixed decimal/float operations are rare.
2) But accidental mixed decimal/float is an easy mistake to make.
3) Implicit coercion hides the error.
4) A warning flag gives you a chance to catch your error.
5) A warning is educational (it makes sure that you understand what your program is doing).
6) A warning is easily silenced, either through a) the warnings module, b) setting a context flag in decimal, or c) making the coercion explicit using Decimal.from_float().

Raymond
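As it turned out, the comparison semantics Raymond argues for are what later shipped (Python 3.2 compares Decimal and float by exact value); checking both directions on a modern interpreter:

```python
from decimal import Decimal

# Exact cross-type comparison: the binary float closest to 1.1 is not
# the decimal value 1.1, so equality is (usefully) False.
print(Decimal('1.1') == 1.1)           # False

# Coercing the Decimal to float first hides the representation error:
print(float(Decimal('1.1')) == 1.1)    # True
```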

Raymond Hettinger wrote:
On Mar 21, 2010, at 3:35 PM, Guido van Rossum wrote:
It seems to me that Decimals and floats should be considered at the same level (i.e. both implement Real).
Agreed, but doesn't help. (Except against the idea that Decimal goes on the "integer" side of Fraction, which is just wrong.)
Woohoo! Glad you agree. I was concerned that idea was gathering a following.
Heck no, it was just a random late night thought from me, and even I thought it was a somewhat dubious idea. I don't mind at all that it since has been knocked soundly (and deservedly) on the head :)
Reasoning for emitting a warning by default:
1) Real actual use cases for mixed decimal/float operations are rare.
2) But accidental mixed decimal/float is an easy mistake to make.
3) Implicit coercion hides the error.
4) A warning flag gives you a chance to catch your error.
5) A warning is educational (it makes sure that you understand what your program is doing).
6) A warning is easily silenced, either through a) the warnings module, b) setting a context flag in decimal, or c) making the coercion explicit using Decimal.from_float().
I'll add another one to that list:

7) For backwards-compatible changes, it is easy to go from exception -> warning -> no warning (if we later decide to take that second step). Going from exception -> no warning -> warning (if we were to change our minds the other way) is a lot less user-friendly.

(I was going to try to play devil's advocate and argue in favour of float results and/or no warning, but I got nuthin' - Raymond's points made too much sense to me.)

A warning is nice in that you can mix decimals and floats at the interpreter prompt with a single warning per session, but the warning can still act as a pointer into the weird and wonderful world of binary vs decimal floating point.

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
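What eventually landed (in Python 3.3) is close to this design: a decimal signal, FloatOperation, off by default, which when trapped turns accidental float/Decimal mixing in the constructor or in ordering comparisons into an exception, while == and != stay exact and silent. A minimal sketch:

```python
from decimal import Decimal, FloatOperation, localcontext

with localcontext() as ctx:
    ctx.traps[FloatOperation] = True   # opt in to strictness

    try:
        Decimal(1.1)                   # implicit float -> Decimal construction
    except FloatOperation:
        print("construction from float trapped")

    try:
        Decimal('1.1') < 1.1           # mixed ordering comparison
    except FloatOperation:
        print("mixed comparison trapped")

    print(Decimal('1.1') == 1.1)       # False -- equality stays permitted
```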

Raymond Hettinger wrote:
Remember, the notion of inexactness is a taint, not an intrinsic property of a type. Even the Scheme numeric tower recognizes this. Likewise, the decimal specification also spells out this notion as basic to its design.
I'm not sure it really does, otherwise every decimal value would have a flag indicating whether it was tainted with inexactness, and this flag would propagate through calculations. -- Greg

Greg Ewing wrote:
Raymond Hettinger wrote:
Remember, the notion of inexactness is a taint, not an intrinsic property of a type. Even the Scheme numeric tower recognizes this. Likewise, the decimal specification also spells out this notion as basic to its design.
I'm not sure it really does, otherwise every decimal value would have a flag indicating whether it was tainted with inexactness, and this flag would propagate through calculations.
http://docs.python.org/library/decimal.html#decimal.Inexact

(Part of the thread context rather than the individual decimal values, but if you use it properly it tells you whenever an inexact operation has occurred in the current thread.)

Cheers, Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
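Nick's point can be seen directly: the Inexact flag lives on the context, not on individual values, and is set whenever an operation had to round, so it tracks the taint across a whole computation. A small check:

```python
from decimal import Decimal, Inexact, localcontext

with localcontext() as ctx:
    ctx.clear_flags()
    Decimal(1) / Decimal(4)            # 0.25: exact in decimal
    print(ctx.flags[Inexact])          # False

    Decimal(1) / Decimal(3)            # 0.333...: must be rounded
    print(ctx.flags[Inexact])          # True
```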

Nick Coghlan wrote:
http://docs.python.org/library/decimal.html#decimal.Inexact
(Part of the thread context rather than the individual decimal values, but if you use it properly it tells you whenever an inexact operation has occurred in the current thread)
My problem was that the statement "All numbers are exact; inexactness is a taint" appears to be self-contradictory. Numbers *can* become tainted with inexactness whether you explicitly keep track of that or not. However, the excerpt by Cowlishaw posted earlier reveals where the confusion is coming from, I think. What Cowlishaw appears to mean by "numbers are exact" is that Decimals represent particular values, not *intervals*. This is not really the same thing as the notion of inexact numbers in the numeric tower. There, it means more like "this number may not quite represent the value the programmer had in mind". -- Greg
participants (15)

- Adam Olsen
- Alexander Belopolsky
- Antoine Pitrou
- Facundo Batista
- Glenn Linderman
- Greg Ewing
- Guido van Rossum
- Jeffrey Yasskin
- Mark Dickinson
- Nick Coghlan
- R. David Murray
- Raymond Hettinger
- Stefan Krah
- Stephen J. Turnbull
- Steven D'Aprano