[Python-Dev] nice()
Smith
smiles at worksmail.net
Wed Feb 15 12:54:44 CET 2006
I am reluctantly posting here since this is of less intense interest than other things being discussed right now, but this is related to the areclose proposal that was discussed here recently.
The following discussion ends with things that python-dev might want to consider in terms of adding a function that allows something other than the default 12- and 17-digit precision representations of numbers that str() and repr() give. Such a function (like nice(), perhaps named trim()?) would provide a way to convert fp numbers that are being used in comparisons into a precision that reflects the user's preference.
Everyone knows that fp numbers must be compared with caution, but there is a void in the relative-error department for exercising such caution, thus the proposal for something like 'areclose'. The problem with areclose(), however, is that it only solves one part of the problem that needs to be solved if two fp's *are* going to be compared: if you are going to check if a < b you would need to do something like
not areclose(a,b) and a < b
With something like trim() (a.k.a. nice()) you could do
trim(a) < trim(b)
to get the comparison to 12-digit default precision or arbitrary precision with optional arguments, e.g. to 3 digits of precision:
trim(a,3) < trim(b,3)
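As a rough illustration of what such a function might look like, here is a minimal sketch in current Python. The name trim(), its digits argument, and the use of "%e" formatting to do the rounding are all assumptions for illustration; this is not the nice() implementation discussed in the thread.

```python
def trim(x, digits=12):
    """Round x to `digits` significant digits (hypothetical sketch)."""
    # The 'e' format keeps one digit before the decimal point, so asking
    # for digits - 1 places after it yields `digits` significant digits.
    return float("%.*e" % (digits - 1, x))

a = (23 * 0.1) * 10   # mathematically 23
b = 2.3 / 0.1         # mathematically 23
print(a == b)              # the raw floats differ in their last bits
print(trim(a) == trim(b))  # they agree at 12 significant digits
print(trim(a) < trim(b))   # so neither is "less than" the other
print(trim(1234.5678, 3))  # rounding to 3 significant digits
```

Going through a string is slow but keeps the sketch short; a production version would presumably do the rounding arithmetically.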
From a search of the documentation, I don't see that the name trim() is taken yet.
OK, comments responding to Greg follow.
 From: Greg Ewing greg.ewing at canterbury.ac.nz
 Smith wrote:

 computing the bin boundaries for a histogram
 where bins are a width of 0.1:

 >>> for i in range(20):
 ...     if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
 ...         print i,repr(i*.1),repr(i/10.),i*.1,i/10.

 I don't see how that has any relevance to the way bin boundaries
 would be used in practice, which is to say something like

 i = int(value / 0.1)
 bin[i] += 1 # modulo appropriate range checks
This is just masking the issue by converting numbers to integers. The fact remains that two mathematically equal numbers can have two different internal representations with one being slightly larger than the exact integer value and one smaller:
>>> a=(23*.1)*10;a
23.000000000000004
>>> b=2.3/.1;b
22.999999999999996
>>> int(a/.1),int(b/.1)
(230, 229)
Part of the answer in this context is to use round() rather than int so you are getting to the closest integer.
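A small sketch of the difference, written for current Python (where round() with one argument returns an int). The variable name width is only for illustration. Note that round() does not remove the sensitivity, it only moves it from the integer boundaries to the halfway points.

```python
a = (23 * 0.1) * 10   # lands slightly above 23
b = 2.3 / 0.1         # lands slightly below 23
width = 0.1

# Truncation sends the two "equal" values to different integers.
print(int(a / width), int(b / width))

# Rounding to the nearest integer sends them to the same one.
print(round(a / width), round(b / width))
```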
 For, say, garden variety numbers that aren't full of garbage digits
 resulting from fp computation, the boundaries computed as 0.1*i are
 not going to agree with such simple numbers as 1.4 and 0.7.

 Because the arithmetic is binary rather than decimal. But even using
 decimal, you get the same sort of problems using a bin width of
 1.0/3.0. The solution is to use an algorithm that isn't sensitive
 to those problems, then it doesn't matter what base your arithmetic
 is done in.
Agreed.

 I understand that the above really is just a patch over the problem,
 but I'm wondering if it moves the problem far enough away that most
 users wouldn't have to worry about it.

 No, it doesn't. The problems are not conveniently grouped together
 in some place you can get away from; they're scattered all over the
 place where you can stumble upon one at any time.

Yes, even a simple computation of the wrong type can lead to unexpected results. I agree.
 So perhaps this brings us back to the original comment that "fp
 issues are a learning opportunity." They are. The question I have is
 "how soon do they need to run into them?" Is decreasing the likelihood
 that they will see the problem (but not eliminate it) a good thing
 for the python community or not?

 I don't think you're doing anyone any favours by trying to protect
 them from having to know about these things, because they *need* to
 know about them if they're not to write algorithms that seem to
 work fine on tests but mysteriously start producing garbage when
 run on real data, possibly without it even being obvious that it is
 garbage.
Mostly I agree, but if you go to the extreme then why don't we just drop floating point comparisons altogether and force the programmer to convert everything to integers and make their own bias evident (like converting to int rather than nearest int)? Or we drop the fp comparison operators and introduce fp comparison functions that require the use of tolerance terms to again make the assumptions transparent:
def lt(x, y, rel_err = 1e-5, abs_err = 1e-8):
    return not areclose(x,y,rel_err,abs_err) and int(x-y)<=0
print lt(a,b,0,1e-10) --> False (they are equal to that tolerance)
print lt(a,b,0,1e-20) --> True (a is less than b at that tolerance)
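For anyone wanting to try the idea in current Python, here is a self-contained sketch. It assumes math.isclose (Python 3.5+) as a stand-in for the proposed areclose(), which may well have had different semantics, and it uses a plain x < y for the ordering test. The values lo and hi are made up for the example.

```python
import math

def areclose(x, y, rel_err=1e-5, abs_err=1e-8):
    # Stand-in for the proposed areclose(); using math.isclose here
    # is an assumption, not the function from the proposal.
    return math.isclose(x, y, rel_tol=rel_err, abs_tol=abs_err)

def lt(x, y, rel_err=1e-5, abs_err=1e-8):
    # "Less than" holds only when the values are not close AND x < y.
    return not areclose(x, y, rel_err, abs_err) and x < y

lo, hi = 1.0, 1.0 + 1e-9      # the two values differ by about 1e-9
print(lt(lo, hi, 0, 1e-8))    # within tolerance, so not "less than"
print(lt(lo, hi, 0, 1e-10))   # outside tolerance, so lo < hi counts
print(lt(hi, lo, 0, 1e-10))   # the ordering still matters
```

The tolerances have to be passed at every call site, which is exactly the sort of transparency (and tedium) being argued about.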
The fact is, we make things easier and let the programmer shoot themselves in the foot if they want to by providing things like fp comparisons and even functions like sum that do dumb-sums (though Raymond Hettinger's Python Recipe at ASPN provides a smart-sum).
I think the biggest argument for something like nice() is that it fills the void for a simple way to round numbers to a relative error rather than an absolute error. round() handles absolute error: it rounds to a given number of decimal places. str() rounds to the 12th digit and repr() to the 17th digit. There is nothing else except build-your-own solutions for rounding to an arbitrary number of significant figures. nice() would fill that niche and provide the default 12 significant digit solution.
I agree that making all float comparisons default to 12-digit precision would not be smart. That would be throwing away 5 digits that someone might really want. Providing a simple way to specify the desired significance is something that is needed, especially since fp issues are so thorny. The user that explicitly uses nice(x)<nice(y) is being rewarded at the moment by getting a result that they expect, e.g.
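To make the absolute-versus-relative distinction concrete, here is a short sketch comparing round() with one build-your-own route, "%g" formatting. The sample values are arbitrary and chosen only to span several orders of magnitude.

```python
values = [123456.789, 1.23456789, 0.00123456789]

# Each row: (value, rounded to 2 decimal places, rounded to 3 sig. digits)
rows = [(v, round(v, 2), float("%.3g" % v)) for v in values]

for v, by_places, by_sigfigs in rows:
    # round() keeps a fixed number of decimal places whatever the size
    # of v; "%.3g" keeps 3 significant digits whatever the size of v.
    print(v, by_places, by_sigfigs)
```

With a fixed number of decimal places the smallest value collapses to 0.0, while the significant-digit version keeps the same relative accuracy for all three.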
nice(2.3/.1)==nice((23*.1)*10)
and also getting a subtle reminder that their result is only true at the default (12th digit) precision level.
/c