Re: [Python-ideas] [Python-Dev] The Case Against Floating Point ==
(with apologies for the random extra level of quoting in the below...)
On Thu, Mar 13, 2008 at 11:09 AM, Imri Goldberg <lorgandon@gmail.com> wrote:
As I said earlier, I'd like static checkers (like Python-Lint) to catch
this sort of cases, whatever the decision may be.
Hmm. Isn't that tricky? How does the static checker decide whether the objects being compared are floats? I guess one could be content with catching some cases where the operands to == are clearly floats... Wouldn't you have to have run-time warnings to be really sure of catching all the cases?
It's already too late for Python 3.0. Still, I believe it is worth discussing.
Sure. I didn't mean that to come out in quite the dismissive way it did :). Apologies. Maybe a PEP aimed at Python 4.0 is in order. If you're open to the idea of just having some way to enable warnings, it could be much sooner.
While checking against a==0.0 (and other similar conditions) before dividing will indeed protect from outright division by zero, it will enlarge any error you will have in the computation. I guess it would be better to do the same check for 'a is small' for appropriate values of 'small'.
Still, a check for 0.0 is good enough in some cases: if a is tiny, the large intermediate values may appear and then disappear happily before giving a sensible final result. These are usually the sort of cases where just having division by 0.0 return an infinity would have "just worked" too (making the whole "if" redundant), but that's not (currently!) an option in Python.
It's a truism that floating-point equality tests should be avoided, but it's just not true that floating-point equality testing is *always* wrong, and I don't think that Python should make it so.
Actually, one of the reasons I thought about this subject in the first place, was dict lookup for floating point numbers. It seems to me that it's something you just shouldn't do.
So your proposal would presumably include making x in dict and x not in dict errors for any float x, regardless of the contents of the dictionary (or list, or set, or frozenset, or...) dict?
What would you do about Decimals? A Decimal is just another floating point format (albeit base 10 instead of base 2); so presumably all these warnings/errors should apply equally to Decimal instances? If not, why not?
I'm not trying to be negative here---as Aahz says, this is an interesting idea; I'm just trying to understand exactly how things might work.
Mark
Imri,
Aargh! Sorry about the multiple emails. The first one bounced because I wasn't subscribed to python-ideas, so I canceled it and sent it again, forgetting that you would still have got a copy of the first email. And now I'm sending you a third one, just to apologise for the second one (or was it the first?). Double apologies, and I'll try not to do it again.
Mark
Mark Dickinson wrote:
(with apologies for the random extra level of quoting in the below...)
On Thu, Mar 13, 2008 at 11:09 AM, Imri Goldberg <lorgandon@gmail.com> wrote:
As I said earlier, I'd like static checkers (like Python-Lint) to catch this sort of cases, whatever the decision may be.
Hmm. Isn't that tricky? How does the static checker decide whether the objects being compared are floats? I guess one could be content with catching some cases where the operands to == are clearly floats... Wouldn't you have to have run-time warnings to be really sure of catching all the cases?
Yes. Writing a static-checker for Python is tricky in any case. For the sake of this discussion, it might be useful to refer to some 'ideal' static checker. This will allow us to better define what is the desired behavior.
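To make the "clearly floats" case concrete, here is a minimal sketch of such a check built on the standard `ast` module (Python 3.8 or later for `ast.Constant`). The function name `find_float_equality` and the sample source are illustrative assumptions, not part of PyChecker or Pylint. It only flags == and != where one operand is a float literal, which is exactly the limitation raised above: a comparison between two float-valued names goes unnoticed.

```python
import ast


def find_float_equality(source):
    """Return line numbers of ==/!= comparisons that have a float literal operand."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Compare):
            continue
        # A chained comparison a == b == c is stored as left + comparators.
        operands = [node.left] + node.comparators
        for op, left, right in zip(node.ops, operands, operands[1:]):
            if not isinstance(op, (ast.Eq, ast.NotEq)):
                continue
            if any(
                isinstance(side, ast.Constant) and isinstance(side.value, float)
                for side in (left, right)
            ):
                hits.append(node.lineno)
    return hits


# Hypothetical input: line 2 compares against a literal, line 4 does not.
sample = "a = 0.1 + 0.2\nif a == 0.3:\n    pass\nif a == b:\n    pass\n"
print(find_float_equality(sample))
```

Only the comparison on line 2 of the sample is reported; catching `a == b` would need type information that a purely syntactic pass does not have.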
It's already too late for Python 3.0. Still, I believe it is worth discussing.
Sure. I didn't mean that to come out in quite the dismissive way it did :). Apologies. Maybe a PEP aimed at Python 4.0 is in order. If you're open to the idea of just having some way to enable warnings, it could be much sooner.
I think that generating a warning (by default?) is a strong enough change in the right direction, so we should add that as another option. (Was also suggested in a comment on my blog.)
While checking against a==0.0 (and other similar conditions) before dividing will indeed protect from outright division by zero, it will enlarge any error you will have in the computation. I guess it would be better to do the same check for 'a is small' for appropriate values of 'small'.
Still, a check for 0.0 is good enough in some cases: if a is tiny, the large intermediate values may appear and then disappear happily before giving a sensible final result. These are usually the sort of cases where just having division by 0.0 return an infinity would have "just worked" too (making the whole "if" redundant), but that's not (currently!) an option in Python.
It's a truism that floating-point equality tests should be avoided, but it's just not true that floating-point equality testing is *always* wrong, and I don't think that Python should make it so.
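One way to picture the "appear and then disappear" case is the textbook parallel-resistance formula. The sketch below is only an illustration under that assumption; the helper `parallel` and its inputs are hypothetical and do not come from the thread. The exact test against 0.0 exists solely to avoid ZeroDivisionError, while a tiny but nonzero input passes through a huge intermediate value and still yields a sensible result.

```python
import math


def parallel(r1, r2):
    """Combined resistance of two resistors in parallel: 1 / (1/r1 + 1/r2)."""
    # Exact comparison on purpose: only a true zero would raise ZeroDivisionError.
    if r1 == 0.0 or r2 == 0.0:
        return 0.0
    return 1.0 / (1.0 / r1 + 1.0 / r2)


tiny = parallel(1e-300, 1.0)  # the intermediate 1/r1 is about 1e300
zero = parallel(0.0, 1.0)     # handled by the exact == 0.0 guard

print(tiny, zero)
print(math.isclose(tiny, 1e-300, rel_tol=1e-12))
```

Replacing the guard with an "r1 is small" test would return 0.0 for the tiny input as well, discarding a result that the plain formula computes without trouble.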
Alright, that's why in my original suggestion, I proposed a function for 'old-style' comparison. It still seems to me that in most cases you are better off doing something other than using the current ==. A point I'm not sure of though, is what happens to other comparison operators, namely, <=, <, >, >=. If they retain their original meaning then <= and >= become at least a bit inconsistent. I'll be glad to hear more opinions about this.
Actually, one of the reasons I thought about this subject in the first place, was dict lookup for floating point numbers. It seems to me that it's something you just shouldn't do.
So your proposal would presumably include making
x in dict
and
x not in dict
errors for any float x, regardless of the contents of the dictionary (or list, or set, or frozenset, or...) dict?
What would you do about Decimals? A Decimal is just another floating point format (albeit base 10 instead of base 2); so presumably all these warnings/errors should apply equally to Decimal instances? If not, why not?
This last note gave me pause. I still need to think more about this, but here are my thoughts so far:
1. Decimal's behavior might be considered even more inconsistent - the precision applies to arithmetical operations, but not to comparisons.
2. As a result, it seems to me that decimal's behavior might also be changed. It needn't be the same change as regular floating point though - decimal behavior might follow suggestion 1, while regular floating points might follow suggestion 2. (I see no point in it being the other way around though.)
3. Usage in containers depending on __hash__ should change according to how == behaves for decimals. If == raises a warning/exception, so should "x in {..}". If == will be changed to work according to precision for decimals, then usage in containers will be (very) problematic, because of context changes. (Consider what happens when changing the precision.)
4. Right now, I would avoid using decimal or regular floating points in such containers. The results are just not predictable enough. Using the 'ideal static-checker' mentioned above, I'd say that any such use should result in a warning.
In any case, there might be a place for a way to do floating point comparisons in a 'standard' manner.
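Point 1 can be seen directly with the standard `decimal` module. The snippet below is a small sketch with arbitrarily chosen values and a precision of 3 digits: arithmetic results are rounded to the context precision, but == compares the full stored digits.

```python
from decimal import Decimal, localcontext

with localcontext() as ctx:
    ctx.prec = 3                   # precision used for arithmetic results
    a = Decimal("1.2345")          # construction from a string keeps every digit
    b = Decimal("1.23")
    exact_equal = (a == b)         # compares stored digits, ignores the precision
    rounded_equal = (+a == b)      # unary plus is an operation, so it rounds first
    total = a + Decimal(0)         # arithmetic result is rounded to 3 digits

print(exact_equal, rounded_equal, total)
```

So `a` and `b` are unequal as stored, yet any arithmetic on `a` under this context produces a value equal to `b`.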
I'm not trying to be negative here---as Aahz says, this is an interesting idea; I'm just trying to understand exactly how things might work.
Mark
Sure, so do I.
Cheers,
Imri.
-------------------------
Imri Goldberg
www.algorithm.co.il/blogs
www.imri.co.il
Imri Goldberg wrote:
what happens to other comparison operators, namely, <=, <, >, >=. If they retain their original meaning then <= and >= become at least a bit inconsistent.
Also, if you have <= and >= then you can cheat by doing 'x <= y and x >= y'. :-) -- Greg
Greg Ewing wrote:
Imri Goldberg wrote:
what happens to other comparison operators, namely, <=, <, >, >=. If they retain their original meaning then <= and >= become at least a bit inconsistent.
Also, if you have <= and >= then you can cheat by doing 'x <= y and x >= y'. :-)
That's part of what I meant. There's also the problem that if x>y, then you want x!=y. This means that there are implications for all comparison operators. This makes changing == behavior to an epsilon comparison more involved. I still think it is feasible, but will require much more consideration. In any case, emitting a warning for == is still 'cheap', and the original arguments stand.
Imri Goldberg
On Thu, Mar 13, 2008 at 6:18 PM, Imri Goldberg <lorgandon@gmail.com> wrote:
This makes changing == behavior to an epsilon comparison more involved. I still think it is feasible, but will require much more consideration.
Okay, now I am going to be negative. :-)
I really think that there's essentially zero chance of == and != ever changing to 'fuzzy' comparisons in Python. I don't want to discourage you from working out possible details as an academic exercise, or perhaps with some other (Python-like?) language in mind, but I just don't see it ever happening in Python. Maybe I'm wrong, in which case I hope other python people will tell me so, but I think pursuing this is, in the end, going to be a waste of time.
Some reasons, and then I'll shut up:
Too much complication and magic implicit stuff going on behind the scenes. In a fuzzy a == b there are hidden choices about the fuzziness scheme and the amount of fuzz to allow, and those choices are going to confuse the hell out of newbie and expert programmers alike.
As above, you'd have to choose defaults for the fuzziness, and by Murphy's Law those defaults would be wrong for almost everybody else's particular applications, meaning that almost everybody else would have to go away and learn about how to change or turn off the fuzziness.
Fundamental and well-understood laws (trichotomy, transitivity of equality) would break.
It's really unclear how the other comparison operators would be affected. If 1.0 == 1.0+2e-16 returns True, shouldn't 1.0 >= 1.0+2e-16 also return True?
Containers would be affected in peculiar ways. I think people would be really surprised to find that 1.0+2e-16 *was* an element of the set {1.0}, or that 1.0 and 1.0+2e-16 weren't allowed to be different keys in a dict. And how on earth do you check for set or dict membership under the hood?
I don't know of any other language that has successfully done this, even though I've seen the idea floated many times for different languages. That doesn't mean much, since I only know a small handful of the many hundreds (thousands?) of languages out there. If you know a counterexample, I'd be interested to hear it.
Mark
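The transitivity objection is easy to reproduce. The sketch below assumes one particular fuzziness scheme, an absolute tolerance, wrapped in a hypothetical helper `fuzzy_eq`; the tolerance and the three values are chosen only so that every difference is exactly representable.

```python
def fuzzy_eq(x, y, tol=0.5):
    """Hypothetical 'fuzzy' equality: true when x and y differ by at most tol."""
    return abs(x - y) <= tol


a, b, c = 0.0, 0.5, 1.0

a_eq_b = fuzzy_eq(a, b)  # the difference is exactly 0.5
b_eq_c = fuzzy_eq(b, c)  # the difference is exactly 0.5
a_eq_c = fuzzy_eq(a, c)  # the difference is 1.0, outside the tolerance

print(a_eq_b, b_eq_c, a_eq_c)
```

Here a "equals" b and b "equals" c, yet a does not "equal" c. A relative tolerance changes the numbers involved but not the conclusion.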
Mark Dickinson wrote:
On Thu, Mar 13, 2008 at 6:18 PM, Imri Goldberg <lorgandon@gmail.com> wrote:
This makes changing == behavior to an epsilon comparison more involved. I still think it is feasible, but will require much more consideration.
Okay, now I am going to be negative. :-)
I really think that there's essentially zero chance of == and != ever changing to 'fuzzy' comparisons in Python. I don't want to discourage you from working out possible details as an academic exercise, or perhaps with some other (Python-like?) language in mind, but I just don't see it ever happening in Python. Maybe I'm wrong, in which case I hope other python people will tell me so, but I think pursuing this is, in the end, going to be a waste of time.
Alright, I agree it's a good idea to drop the proposal to change floating point == into an epsilon compare. What about issuing a warning though? Consider the following course of action. It is the one with the least changes:
== for regular floating point numbers now issues a warning, but still works. This warning might be turned off. All other operators are left unchanged.
Do you think this should be dropped as well? Just for my own code, I think I'd like this behavior. I still consider floating point == a potential bug, and this helps me catch it, in the absence of the 'ideal static checker'.
Containers would be affected in peculiar ways. I think people would be really surprised to find that 1.0+2e-16 *was* an element of the set {1.0}, or that 1.0 and 1.0+2e-16 weren't allowed to be different keys in a dict. And how on earth do you check for set or dict membership under the hood?
I think that right now containers behave in peculiar ways when used with FP numbers. Take set for example - you might as well just use list instead of it. When you consider dict, then doing d[x] might not return the result you actually want.
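A short illustration of the container behavior being described, using the familiar 0.1 + 0.2 example. The dict contents are made up for the sketch.

```python
x = 0.1 + 0.2                     # 0.30000000000000004, not 0.3

prices = {0.3: "thirty cents"}    # hypothetical lookup table keyed by a float
found = x in prices               # the key that "should" match is not found

seen = {0.3, x}                   # the set keeps both values as distinct elements
distinct = len(seen)

print(found, distinct)
```

Whether this counts as peculiar or as the container doing exactly what == says is the disagreement in this part of the thread.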
I don't know of any other language that has successfully done this, even though I've seen the idea floated many times for different languages. That doesn't mean much, since I only know a small handful of the many hundreds (thousands?) of languages out there. If you know a counterexample, I'd be interested to hear it.
Mark
Don't know of a good counterexample. I agree that before changing the behavior of == to fuzzy comparison, you'll want experience with that kind of change.
Cheers,
Imri
On Fri, Mar 14, 2008 at 5:01 AM, Imri Goldberg <lorgandon@gmail.com> wrote:
Alright, I agree it's a good idea to drop the proposal to change floating point == into an epsilon compare. What about issuing a warning though? Consider the following course of action. It is the one with the least changes:
== for regular floating point numbers now issues a warning, but still works. This warning might be turned off. All other operators are left unchanged.
Do you think this should be dropped as well?
To be honest, yes. There isn't currently a SmellyCodeWarning or IsThatReallyWhatYouMeanWarning in Python, and there doesn't seem to be a lot of precedent for warning on code constructs that may often be wrong but also have legitimate uses. Most of the current warnings have more to do with syntactic or semantic changes between various versions of Python.
But I think it would be entirely appropriate to warn about floating-point (in)equality checks in something like PyChecker or Pylint, if you can get past the technical difficulties of detecting floating-point comparisons statically.
Mark
I've given it more thought over the past few days. Given the discussion here, and some more reading on my part, it seems to me that there isn't much chance of my convincing anyone to raise an exception on FP ==. I'm not too sure that it's the right move anyway. While I'll probably avoid FP == in my code, it seems to me that there are some cases where it is useful (even given the inaccuracy of the results).
Regarding adding warnings to pychecker/pylint, I think it's a good idea. Probably for another mailing list though :).
Also, I considered the subject of runtime warnings as well. Adding the relevant warnings to any static checker could be really hard work while warning during runtime could be a lot easier. Therefore, it seems worthwhile to consider this option. I hadn't used the warnings module before, so I read its documentation now (also the PEP) and played with it a little.
First, if a warning is generated for floating point ==, it can be turned off globally, or on a line-by-line basis.
Second, regarding Mark's comment on SmellyCodeWarning. I thought about it a bit, and it seems no joke to me. gcc has a -Wall mode, so does Python. Why not use it in this situation? (i.e. having some warnings not displayed by default.) I think it would be interesting to consider more cases of 'SmellyCodeWarning' in general, and adding them under some warning category. If there's a need for a use case, we've already got the first one - floating point comparisons.
Cheers,
Imri.
Mark Dickinson wrote:
On Fri, Mar 14, 2008 at 5:01 AM, Imri Goldberg <lorgandon@gmail.com> wrote:
Alright, I agree it's a good idea to drop the proposal to change floating point == into an epsilon compare. What about issuing a warning though? Consider the following course of action. It is the one with the least changes:
== for regular floating point numbers now issues a warning, but still works. This warning might be turned off. All other operators are left unchanged.
Do you think this should be dropped as well?
To be honest, yes. There isn't currently a SmellyCodeWarning or IsThatReallyWhatYouMeanWarning in Python, and there doesn't seem to be a lot of precedent for warning on code constructs that may often be wrong but also have legitimate uses. Most of the current warnings have more to do with syntactic or semantic changes between various versions of Python.
But I think it would be entirely appropriate to warn about floating-point (in)equality checks in something like PyChecker or Pylint, if you can get past the technical difficulties of detecting floating-point comparisons statically.
Mark
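The runtime-warning idea in the reply above can be sketched with the standard `warnings` module. Everything named here is hypothetical: `SmellyCodeWarning` does not exist in Python, and since float.__eq__ itself cannot be patched, the warning is attached to an ordinary helper function `float_eq` instead. The sketch only shows that such a category can be silent by default and switched on for a chosen region of code.

```python
import warnings


class SmellyCodeWarning(Warning):
    """Hypothetical category for constructs that are often, but not always, bugs."""


def float_eq(x, y):
    """Exact float comparison that reports itself via the warnings machinery."""
    warnings.warn("exact floating-point == comparison", SmellyCodeWarning, stacklevel=2)
    return x == y


# Global setting: keep this category quiet unless someone asks for it.
warnings.simplefilter("ignore", SmellyCodeWarning)
result_quiet = float_eq(0.1 + 0.2, 0.3)

# Local setting: enable and record the warning for one block only.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always", SmellyCodeWarning)
    result_loud = float_eq(0.1 + 0.2, 0.3)

print(result_quiet, result_loud, len(caught))
```

The comparison result is the same in both cases; only the reporting differs. Filters can also be narrowed by module and line number, which is the line-by-line control mentioned above.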
Imri Goldberg wrote:
== for regular floating point numbers now issues a warning, but still works. This warning might be turned off.
I think I would find it annoying to have to disable a warning whenever I legitimately wanted to do a floating ==. Also, having a global warning/no warning setting for the whole program isn't really right -- whether a floating == is legitimate is something that needs to be decided on a case-by-case basis. -- Greg
On 3/14/08, Imri Goldberg <lorgandon@gmail.com> wrote:
Alright, I agree it's a good idea to drop the proposal to change floating point == into an epsilon compare. What about issuing a warning though? Consider the following course of action. It is the one with the least changes:
== for regular floating point numbers now issues a warning, but still works. This warning might be turned off. All other operators are left unchanged.
If you change ==, you should really change !=, and probably the other comparisons as well. I suspect what you really want is a warning on any usage of a floating point. And I'm only half-joking. Comparison (or arithmetic) with other floats adds error. Comparison (or arithmetic) with ints is *usually* a bug (unless one of the operands is a constant that someone was too lazy to write correctly). -jJ
Jim Jewett wrote:
Comparison (or arithmetic) with ints is *usually* a bug (unless one of the operands is a constant that someone was too lazy to write correctly).
That depends on what you regard as "correct". Python generally permits a duck-typed approach to numbers wherein using integers as a subset of floats is considered legitimate, and not lazy at all. -- Greg
On 3/13/08, Mark Dickinson <dickinsm@gmail.com> wrote:
On Thu, Mar 13, 2008 at 6:18 PM, Imri Goldberg <lorgandon@gmail.com> wrote:
I really think that there's essentially zero chance of == and != ever changing to 'fuzzy' comparisons in Python.
They sort of already did -- you can define __eq__ and __ne__ on your own class in bizarre and inconsistent ways. [Though I think you can't easily override that (x is y) ==> (x==y).] You can even do this with your own float-alike class. What you're really asking for is that the float class take advantage of this.
I don't know of any other language that has successfully done this, ...
Changing an existing class requires that the class be "open". That is the default in languages like smalltalk or ruby. It is even the default for python classes -- but it is certainly not the default for "python" classes that are actually coded in C -- which includes floats. -jJ
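As a rough illustration of the "float-alike class" route, here is a sketch of a float subclass with a tolerance-based __eq__. The class name `FuzzyFloat` and the tolerance value are assumptions made for the example, and the code targets Python 3, where defining __eq__ removes the inherited __hash__ unless it is restored explicitly.

```python
class FuzzyFloat(float):
    """Hypothetical float subclass whose == uses an absolute tolerance."""

    TOLERANCE = 1e-9  # arbitrary value chosen for the sketch

    def __eq__(self, other):
        if not isinstance(other, (int, float)):
            return NotImplemented
        return abs(float(self) - float(other)) <= self.TOLERANCE

    def __ne__(self, other):
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    # Restore hashing; note that it is still based on the exact value.
    __hash__ = float.__hash__


x = FuzzyFloat(0.1 + 0.2)
fuzzy_result = (x == 0.3)           # uses FuzzyFloat.__eq__
reflected_result = (0.3 == x)       # the subclass method is tried first here too
plain_result = (0.1 + 0.2 == 0.3)   # ordinary float comparison
in_set = x in {0.3}                 # membership goes through the hash first

print(fuzzy_result, reflected_result, plain_result, in_set)
```

The last line shows the container problem raised earlier in the thread: `x` compares equal to 0.3 but is not found in `{0.3}`, because the hash is computed from the exact value.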
On Tue, Mar 18, 2008 at 6:33 PM, Jim Jewett <jimjjewett@gmail.com> wrote:
They sort of already did -- you can define __eq__ and __ne__ on your own class in bizarre and inconsistent ways. [Though I think you can't easily override that (x is y) ==> (x==y).]
Why not? I get this with Python 2.5.1:
>>> from decimal import *
>>> Decimal.__eq__ = lambda x, y: False
>>> x = Decimal(2)
>>> x == x
False
>>> x is x
True
Or am I misunderstanding your meaning?
<unnecessary pedantry> Of course, even for floats it's not true that x is y implies x == y:
>>> x = float('nan')
>>> x is x
True
>>> x == x
False
</unnecessary pedantry>
Changing an existing class requires that the class be "open". That is the default in languages like smalltalk or ruby. It is even the default for python classes -- but it is certainly not the default for "python" classes that are actually coded in C -- which includes floats.
You mean like:
>>> float.__eq__ = lambda x, y: False
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't set attributes of built-in/extension type 'float'
?
Presumably there are good reasons for this restriction (performance? convenience? lack of round tuits?), but I've no idea what they are. I can't say that I've ever felt a need to do anything like this.
Mark
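For readers who want to rerun the two float observations as a script rather than an interactive session, a minimal sketch follows. It assumes CPython 3, where the wording of the TypeError message differs from the 2.5.1 output quoted above, so the script checks only the exception type. The list-membership line is an addition not discussed in the thread.

```python
nan = float("nan")

identity_holds = nan is nan    # same object
equality_holds = nan == nan    # NaN never compares equal, even to itself
in_list = nan in [nan]         # containers check identity before equality

# Built-in types implemented in C reject attribute assignment.
try:
    float.__eq__ = lambda x, y: False
    patched = True
except TypeError:
    patched = False

print(identity_holds, equality_holds, in_list, patched)
```

The membership result shows that containers already do not rely on == alone, which is relevant to the earlier question of what "x in {..}" should do.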
participants (5)
- Bill Janssen
- Greg Ewing
- Imri Goldberg
- Jim Jewett
- Mark Dickinson