[Python-Dev] nice()
Smith
smiles at worksmail.net
Wed Feb 15 12:54:44 CET 2006
I am reluctantly posting here, since this is of less intense interest than other things being discussed right now, but it is related to the areclose() proposal that was discussed here recently.
The following discussion ends with something that python-dev might want to consider: adding a function that allows something other than the default 12- and 17-significant-digit representations of numbers that str() and repr() give. Such a function (like nice(), perhaps better named trim()?) would provide a way to round fp numbers that are being used in comparisons to a precision that reflects the user's preference.
Everyone knows that fp numbers must be compared with caution, but there is a void in the relative-error department for exercising such caution; hence the proposal for something like areclose(). The problem with areclose(), however, is that it solves only part of the problem that arises when two fp's *are* going to be compared: if you want to check whether a < b you would need to do something like
not areclose(a,b) and a < b
With something like trim() (a.k.a nice()) you could do
trim(a) < trim(b)
to get the comparison at the default 12-digit precision, or at arbitrary precision with an optional argument, e.g. to 3 significant digits:
trim(a,3) < trim(b,3)
From a search of the documentation, I don't see that the name trim() is taken yet.
OK, comments responding to Greg follow.
| From: Greg Ewing greg.ewing at canterbury.ac.nz
| Smith wrote:
|
|| computing the bin boundaries for a histogram
|| where bins are a width of 0.1:
||
|| >>> for i in range(20):
|| ... if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
|| ... print i,repr(i*.1),repr(i/10.),i*.1,i/10.
|
| I don't see how that has any relevance to the way bin boundaries
| would be used in practice, which is to say something like
|
| i = int(value / 0.1)
| bin[i] += 1 # modulo appropriate range checks
This is just masking the issue by converting numbers to integers. The fact remains that two mathematically equal numbers can have two different internal representations with one being slightly larger than the exact integer value and one smaller:
>>> a=(23*.1)*10;a
23.000000000000004
>>> b=2.3/.1;b
22.999999999999996
>>> int(a/.1),int(b/.1)
(230, 229)
Part of the answer in this context is to use round() rather than int so you are getting to the closest integer.
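Continuing the session above, rounding to the nearest integer puts both representations in the same bin:

>>> int(round(a/.1)), int(round(b/.1))
(230, 230)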
|| For, say, garden variety numbers that aren't full of garbage digits
|| resulting from fp computation, the boundaries computed as 0.1*i are
|| not going to agree with such simple numbers as 1.4 and 0.7.
|
| Because the arithmetic is binary rather than decimal. But even using
| decimal, you get the same sort of problems using a bin width of
| 1.0/3.0. The solution is to use an algorithm that isn't sensitive
| to those problems, then it doesn't matter what base your arithmetic
| is done in.
Agreed.
|
|| I understand that the above really is just a patch over the problem,
|| but I'm wondering if it moves the problem far enough away that most
|| users wouldn't have to worry about it.
|
| No, it doesn't. The problems are not conveniently grouped together
| in some place you can get away from; they're scattered all over the
| place where you can stumble upon one at any time.
|
Yes, even a simple computation of the wrong type can lead to unexpected results. I agree.
|| So perhaps this brings us back to the original comment that "fp
|| issues are a learning opportunity." They are. The question I have is
|| "how
|| soon do they need to run into them?" Is decreasing the likelihood
|| that they will see the problem (but not eliminate it) a good thing
|| for the python community or not?
|
| I don't think you're doing anyone any favours by trying to protect
| them from having to know about these things, because they *need* to
| know about them if they're not to write algorithms that seem to
| work fine on tests but mysteriously start producing garbage when
| run on real data, possibly without it even being obvious that it is
| garbage.
Mostly I agree, but if you take that to the extreme, why not drop floating point comparisons altogether and force the programmer to convert everything to integers, making their own bias evident (like converting with int() rather than to the nearest int)? Or drop the fp comparison operators and introduce fp comparison functions that require tolerance terms, again making the assumptions transparent:
def lt(x, y, rel_err=1e-5, abs_err=1e-8):
    return not areclose(x, y, rel_err, abs_err) and x < y
print lt(a,b,0,1e-10) --> False (they are equal at that tolerance)
print lt(b,a,0,1e-20) --> True (b is less than a at that tolerance)
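The lt() above assumes an areclose() along the lines of the earlier proposal. One plausible sketch, modeled on the allclose() test in Numeric (the exact formula in the proposal may differ):

def areclose(x, y, rel_err=1e-5, abs_err=1e-8):
    # True when x and y agree within the given relative
    # or absolute tolerance.
    return abs(x - y) <= abs_err + rel_err * abs(y)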
The fact is, we make things easier and let programmers shoot themselves in the foot if they want to, by providing things like fp comparisons and even functions like sum() that do dumb-sums (though Raymond Hettinger's Python recipe at ASPN provides a smart-sum).
I think the biggest argument for something like nice() is that it fills the void for a simple way to round numbers to a relative error rather than an absolute error. round() handles absolute error: it rounds to a given number of decimal places. str() rounds to 12 significant digits and repr() to 17. There is nothing else except build-your-own solutions for rounding to an arbitrary number of significant figures. nice() would fill that niche, with 12 significant digits as the default.
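To make that concrete, here is a minimal sketch of what such a trim() might look like (my illustration only, leaning on the fact that '%e' formatting rounds to a requested number of significant digits):

def trim(x, digits=12):
    # Round x to `digits` significant figures (relative
    # precision), unlike round(), which rounds to a fixed
    # number of decimal places (absolute precision).
    if x == 0:
        return 0.0
    return float('%.*e' % (digits - 1, x))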
I agree that making all float comparisons default to 12-digit precision would not be smart. That would be throwing away 5 digits that someone might really want. Providing a simple way to specify the desired significance is something that is needed, especially since fp issues are such a thorny subject. A user who explicitly writes nice(x) < nice(y) is rewarded at the moment by getting the result they expect, e.g.
nice(2.3/.1)==nice((23*.1)*10)
and also getting a subtle reminder that their result is only true at the default (12th digit) precision level.
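With the trim() sketch above, for instance:

>>> 2.3/.1 == (23*.1)*10
False
>>> trim(2.3/.1) == trim((23*.1)*10)
True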
/c