[Python-bugs-list] [ python-Bugs-513866 ] Float/long comparison anomaly

Sun, 17 Feb 2002 08:22:10 -0800

Bugs item #513866, was opened at 2002-02-06 10:33
You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=513866&group_id=5470

Category: Python Interpreter Core
Group: None
Status: Open
Resolution: Later
Priority: 5
Submitted By: Andrew Koenig (arkoenig)
Assigned to: Nobody/Anonymous (nobody)
Summary: Float/long comparison anomaly

Initial Comment:
Comparing a float and a long appears to convert the 
long to float and then compare the two floats.  This 
strategy is a problem because the conversion might 
lose precision.  As a result, == is not an equivalence 
relation and < is not an order relation.  For example, 
it is possible to create three numbers a, b, and c 
such that a==b, b==c, and a!=c.

----------------------------------------------------------------------

Comment By: paul rubin (phr)
Date: 2002-02-17 08:22

Message:
Logged In: YES 
user_id=72053

It looks like the complication is not in finding an
algorithm but rather in fitting it into the implementation.
 I'm not at all sure this is right, but glancing at the
code, the comparison seems to happen in the function
try_3way_compare in Objects/object.c, which calls
PyNumber_CoerceEx if the types aren't equal. 
PyNumber_CoerceEx ends up calling float_coerce on x,n which
"promotes" n to float, similar to what happens when you do
mixed arithmetic (like x+n).  My guess is that a suitable
patch would go into try_3way_compare to specially notice
when you're comparing a float and a long, and avoid the
coercion.  I'm unfamiliar enough with the implementation
that I'd probably take a while to get it right, and still
possibly end up forgetting to update a refcount or
something, leading to leaked memory or mysterious crashes
later.  

Anyway, no, this isn't real important to me, at least at the
moment.  It just wasn't clear whether there was any
difficulty figuring out a useable algorithm.

----------------------------------------------------------------------

Comment By: Andrew Koenig (arkoenig)
Date: 2002-02-17 07:46

Message:
Logged In: YES 
user_id=418174

I think there's a slightly more straightforward algorithm
than the one that Paul Rubin (phr) suggested.

Again, assume that x is a float and n is a long.

We note first that the comparison is trivial unless x and n
are both nonzero and have the same sign.  We will therefore
assume in the rest of this discussion that x and n are
strictly positive; the case where they are negative is
analogous.

Every floating-point implementation has many numbers with
the property that the least significant bit in those
numbers' representations has a value of 1.  In general, if
the floating-point representation has k bits, then any
integer in the range [2**(k-1),2**k) qualifies.  Let K be
any of these numbers; it doesn't matter which one.

Precompute K and store it in both float and long form.  This
computation is exact because K is an integer that has an
exact representation in floating-point form.  It is now
possible to compare x with K and n with K exactly, without
conversion, because we already have K exactly in both forms.

If x < K and n >= K, then x < n and we're done.
If x > K and n <= K, then x > n and we're done.

Otherwise, x and n are on the same side of K (possibly being
equal to K).

If x >= K and n >= K, then the LSB of x is at least 1, so we
can convert x to long without losing information. 
Therefore, cmp(x, n) is cmp(long(x), n).

If x <= K and n <= K, then then n is small enough that it
has an exact representation as a float.  Therefore cmp(x, n)
is cmp(x, float(n)).

So I don't think there's any profound algorithmic problem
here.  Unfortunately, I don't know enough about the details
of how comparison is implemented to be willing to try my
hand at a patch.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-02-17 07:31

Message:
Logged In: YES 
user_id=31435

Paul, this isn't an intellectual challenge -- I expect any 
numerical programmer of ordinary skill could write code to 
compare a float to a long delivering the mathematically 
sensible result.  There are several ways to do it.  
Adding "me too" votes doesn't change the priority.  How 
about taking a whack at writing a patch if this is 
important to you?  It's so low on the list of PythonLabs 
priorities I doubt I'll ever get to it (which is why I 
unassigned myself:  an unassigned bug report is looking for 
someone to fix it, not a cheerleader <wink>).

----------------------------------------------------------------------

Comment By: paul rubin (phr)
Date: 2002-02-17 06:58

Message:
Logged In: YES 
user_id=72053

Oops, I got confused about the order of the two args in the
example below.  I meant cmp(x,n) in the description and
cmp(xl, n) in the code, rather than having n first.  Anyway
you get the idea.  Now I should go back to bed ;-).

----------------------------------------------------------------------

Comment By: paul rubin (phr)
Date: 2002-02-17 06:43

Message:
Logged In: YES 
user_id=72053

I hope there's a simple solution to this--it's obvious what
the right result should be mathematically if you compare
1L<<10000 with 0.0.  It should not raise an error.  If the
documented behavior leads to raising an error, then there's
a bug in the document.  I agree that it's not the highest
priority bug in the world, but it doesn't seem that complicated.

If n is a long and x is a float, both >= 0, what happens if
you do this, to implement cmp(n,x):

   xl = long(x)
   # if x has a fraction part and int part is == n, then x>n
   if float(xl)!=x and xl==n: return 1
   return cmp(n, xl)

If both are < 0, change 1 to -1 above.  If x and n are of
opposite sign, the positive one is greater.

Unless I missed something (which is possible--I'm not too
alert right now) the above should be ok in all cases.

Basically you use long as the common type to convert to; you
do lose information when converting a non-integer, but for
the comparison with an integer, you don't need the lost
information other than knowing whether it was nonzero, which
you find out by converting the long back to a float.

----------------------------------------------------------------------

Comment By: Andrew Koenig (arkoenig)
Date: 2002-02-09 07:42

Message:
Logged In: YES 
user_id=418174

I completely agree it's not a high-priority item, 
especially because it may be complicated to fix.

I think that the fundamental problem is that there is no 
common type to which both float and long can be converted 
without losing information, which complicates both the 
definition and implementation of comparison.  Accordingly, 
it might make sense to think about this issue in 
conjunction with future consideration of rational numbers.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-02-08 23:33

Message:
Logged In: YES 
user_id=31435

I reopened this, but unassigned it since I can't justify 
working on it (the benefit/cost ratio of fixing it is down 
in the noise compared to other things that should be done).

I no longer think we'd need a PEP to change the behavior, 
and agree it would be nice to change it.  Changing it may 
surprise people expecting Python to work like C (C99 says 
that when integral -> floating conversion is in range but 
can't be done exactly, either of the closest representable 
floating numbers may be returned; Python inherits the 
platform C's behavior here for Python int -> Python float 
conversion (C long -> C double); when the conversion is out 
of range, C doesn't define what happens, and Python 
inherits that too before 2.2 (Infinities and NaNs are what 
I've seen most often, varying by platform); in 2.2 it 
raises OverflowError).

I'm not sure it's possible for a<b and b<c and a==c, unless 
the platform C is inconsistent (meaning that (double)i for 
a fixed i returns the next-lowest double on some calls but 
the next-higher on others).  This brute-force searcher 
didn't turn up any examples on my box:

f = 2L**53 - 5 # straddle the representable-as-double limit

nums = [f+i for i in range(50)]
nums.extend(map(float, nums))

for a in nums:
.   for b in nums:
.       if not a < b:
.           continue
.       for c in nums:
.           if not b < c:
.               continue
.           if a >= c:
.               print `a`, `b`, `c`

----------------------------------------------------------------------

Comment By: Andrew Koenig (arkoenig)
Date: 2002-02-07 07:33

Message:
Logged In: YES 
user_id=418174

Here is yet another surprise:

x=[1L<10000]
y=[0.0]
z=x+y

Now I can execute x.sort() and y.sort() successfully, but 
z.sort blows up.

----------------------------------------------------------------------

Comment By: Andrew Koenig (arkoenig)
Date: 2002-02-07 05:28

Message:
Logged In: YES 
user_id=418174

The difficulty is that as defined, < is not an order 
relation, because there exist values a, b, c such that a<b, 
b==c, and a==c.  I believe that there also exist values 
such that a<b, b<c, and a==c.  Under such circumstances, it 
is hard to understand how sort can work properly, whicn is 
my real concern.  Do you really want to warn people that 
they shouldn't sort lists containing floats and longs?

Moreover, it is not terribly difficult to define the 
comparisons so that == is an equivalence relation and < is 
an order relation.  The idea is that for any floating-point 
system, there is a threshold T such that if x is a float 
value >=T, converting x to long will not lose information, 
and if x is a long value <=T, converting x to float will 
not lose information.  Therefore, instead of always 
converting to long, it suffices to convert in a direction 
chosen by comparing the operands to T (without conversion) 
first.

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-02-06 20:59

Message:
Logged In: YES 
user_id=31435

Oops!  I meant

"""
could lead to a different result than the explicit coercion 
in

somefloat == float(somelong)
"""

----------------------------------------------------------------------

Comment By: Tim Peters (tim_one)
Date: 2002-02-06 20:52

Message:
Logged In: YES 
user_id=31435

Since the coercion to float is documented and intended, 
it's not "a bug" (it's functioning as designed), although 
you may wish to argue for a different design, in which case 
making an incompatible change would first require a PEP and 
community debate.  Information loss in operations involving 
floats comes with the territory, and I don't see a reason 
to single this particular case out as especially 
surprising.  OTOH, I expect it would be especially 
surprising to a majority of users if the implicit coercion 
in

somefloat == somelong

could lead to a different result than the explicit coercion 
in

long(somefloat) == somelong

Note that the "long" type isn't unique here:  the same is 
true of mixing Python ints with Python floats on boxes 
where C longs have more bits of precision than C doubles 
(e.g., Linux for IA64, and Crays).

----------------------------------------------------------------------

You can respond by visiting: 
http://sourceforge.net/tracker/?func=detail&atid=105470&aid=513866&group_id=5470