[Python-Dev] nice()
Raymond Hettinger
raymond.hettinger at verizon.net
Wed Feb 15 13:45:32 CET 2006
[Smith]
> The following discussion ends with things that python-dev might want to
> consider in terms of adding a function that allows something other than the
> default 12- and 17-digit precision representations of numbers that str() and
> repr() give. Such a function (like nice(), perhaps named trim()?) would
> provide a way to convert fp numbers that are being used in comparisons into a
> precision that reflects the user's preference.
-1 See posts by Greg, Terry, and myself which recommend against trim(), nice(),
or other variants. For the purpose of precision sensitive comparisons, these
constructs are unfit for their intended purpose -- they are error-prone and do
not belong in Python. They may have some legitimate uses, but those tend to be
dominated by the existing round() function.
If anything, some variant of is_close() could go in the math module. BUT,
the justification should not be for newbies to ignore issues with floating-point
equality comparisons. The justification would have to be that folks with some
numerical sophistication have a recurring need for the function (with
sophistication meaning that they know how to come up with relative and absolute
tolerances that make their application succeed over the full domain of possible
inputs).
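For the record, a minimal sketch of what such an is_close() might look like -- the name, signature, and defaults here are illustrative only, not an existing stdlib function (modern syntax):

```python
def is_close(x, y, rel_tol=1e-9, abs_tol=0.0):
    # True when x and y agree within a relative tolerance of the larger
    # magnitude, or within an absolute tolerance, whichever is looser.
    # The caller must choose tolerances suited to their domain.
    return abs(x - y) <= max(rel_tol * max(abs(x), abs(y)), abs_tol)
```

The point stands: the tolerances are the hard part, and they are the caller's job.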
Raymond
---- relevant posts from Greg and Terry ----
[Greg Ewing]
>> I don't think you're doing anyone any favours by trying to protect
>> them from having to know about these things, because they *need* to
>> know about them if they're not to write algorithms that seem to
>> work fine on tests but mysteriously start producing garbage when
>> run on real data,
[Terry Reedy]
> I agree. Here was my 'kick-in-the-butt' lesson (from 20+ years ago): the
> 'simplified for computation' formula for standard deviation, found in too
> many statistics books without a warning as to its danger, and specialized
> for three data points, is sqrt( ((a*a+b*b+c*c)-(a+b+c)**2/3.0) /2.0).
> After 1000s of ok calculations, the data were something like a,b,c =
> 10005,10006,10007. The correct answer is 1.0 but with numbers rounded to 7
> digits, the computed answer is sqrt(-.5) == CRASH. I was aware that
> subtraction lost precision but not how rounding could make a theoretically
> guaranteed non-negative difference negative.
>
> Of course, Python floats being C doubles makes such glitches much rarer.
> Not exposing C floats is a major newbie (and journeyman) protection
> feature.
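Terry's catastrophic cancellation can be reproduced in a few lines by crudely simulating 7-significant-digit storage on top of Python's doubles (modern syntax; round7() is an illustrative stand-in -- real single precision rounds in binary, not decimal):

```python
import math

def round7(x):
    # crude stand-in for 7-significant-digit storage (illustrative only)
    if x == 0:
        return 0.0
    return round(x, 7 - 1 - math.floor(math.log10(abs(x))))

a, b, c = 10005.0, 10006.0, 10007.0

# In full double precision the formula survives: the answer is exactly 1.0.
exact = math.sqrt(((a*a + b*b + c*c) - (a + b + c)**2 / 3.0) / 2.0)

# With intermediates rounded to ~7 digits, the theoretically non-negative
# difference goes negative, and sqrt() would blow up on it.
sum_sq = round7(round7(a*a) + round7(b*b) + round7(c*c))
corr = round7(round7((a + b + c)**2) / 3.0)
diff = (sum_sq - corr) / 2.0   # negative!
```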
I recommend rejecting trim(), nice(), areclose(), and all variants; Greg, Terry, and I have explained why above.
>
> OK, comments responding to Greg follow.
>
>
> | From: Greg Ewing greg.ewing at canterbury.ac.nz
> | Smith wrote:
> |
> || computing the bin boundaries for a histogram
> || where bins are a width of 0.1:
> ||
> || >>> for i in range(20):
> || ...     if (i*.1==i/10.)<>(nice(i*.1)==nice(i/10.)):
> || ...         print i, repr(i*.1), repr(i/10.), i*.1, i/10.
> |
> | I don't see how that has any relevance to the way bin boundaries
> | would be used in practice, which is to say something like
> |
> | i = int(value / 0.1)
> | bin[i] += 1 # modulo appropriate range checks
>
> This is just masking the issue by converting numbers to integers. The fact
> remains that two mathematically equal numbers can have two different internal
> representations with one being slightly larger than the exact integer value
> and one smaller:
>
>>>> a=(23*.1)*10;a
> 23.000000000000004
>>>> b=2.3/.1;b
> 22.999999999999996
>>>> int(a/.1),int(b/.1)
> (230, 229)
>
> Part of the answer in this context is to use round() rather than int so you
> are getting to the closest integer.
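The round-versus-int point can be demonstrated directly with the values above (modern print syntax):

```python
a = (23 * .1) * 10   # 23.000000000000004
b = 2.3 / .1         # 22.999999999999996

# int() truncates toward zero, so two representations of "23" straddling
# the exact value land in different bins:
print(int(a / .1), int(b / .1))                 # 230 229

# rounding to the nearest integer first puts them back together:
print(int(round(a / .1)), int(round(b / .1)))   # 230 230
```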
>
>
> || For, say, garden variety numbers that aren't full of garbage digits
> || resulting from fp computation, the boundaries computed as 0.1*i are
> || not going to agree with such simple numbers as 1.4 and 0.7.
> |
> | Because the arithmetic is binary rather than decimal. But even using
> | decimal, you get the same sort of problems using a bin width of
> | 1.0/3.0. The solution is to use an algorithm that isn't sensitive
> | to those problems, then it doesn't matter what base your arithmetic
> | is done in.
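Greg's point is easy to verify with the decimal module: a bin width of 1.0/3.0 is inexact in decimal arithmetic too (sketch in modern syntax):

```python
from decimal import Decimal, getcontext

getcontext().prec = 10        # work with 10 significant decimal digits

w = Decimal(1) / Decimal(3)   # 0.3333333333 -- already inexact
print(3 * w)                  # 0.9999999999, not 1
```

Changing the base only changes which fractions are exact; the algorithm has to tolerate inexactness either way.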
>
> Agreed.
>
> |
> || I understand that the above really is just a patch over the problem,
> || but I'm wondering if it moves the problem far enough away that most
> || users wouldn't have to worry about it.
> |
> | No, it doesn't. The problems are not conveniently grouped together
> | in some place you can get away from; they're scattered all over the
> | place where you can stumble upon one at any time.
> |
>
> Yes, even a simple computation of the wrong type can lead to unexpected
> results. I agree.
>
> || So perhaps this brings us back to the original comment that "fp
> || issues are a learning opportunity." They are. The question I have
> || is "how soon do they need to run into them?" Is decreasing the
> || likelihood that they will see the problem (but not eliminating it)
> || a good thing for the python community or not?
> |
> | I don't think you're doing anyone any favours by trying to protect
> | them from having to know about these things, because they *need* to
> | know about them if they're not to write algorithms that seem to
> | work fine on tests but mysteriously start producing garbage when
> | run on real data, possibly without it even being obvious that it is
> | garbage.
>
> Mostly I agree, but if you go to that extreme, why don't we just drop
> floating-point comparisons altogether and force the programmer to convert
> everything to integers, making their own bias evident (like converting to int
> rather than the nearest int)? Or drop the fp comparison operators and
> introduce fp comparison functions that require explicit tolerance terms,
> again making the assumptions transparent:
>
> def lt(x, y, rel_err=1e-5, abs_err=1e-8):
>     return not areclose(x, y, rel_err, abs_err) and int(x - y) <= 0
>
> print lt(a, b, 0, 1e-10)  # --> False (they are equal at that tolerance)
> print lt(a, b, 0, 1e-20)  # --> True (a is less than b at that tolerance)
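areclose() is never defined in this thread; the sketch below is self-contained under one plausible reading of it (the areclose() body is a guess at the intent, using a combined relative/absolute test, in modern syntax):

```python
def areclose(x, y, rel_err=1e-5, abs_err=1e-8):
    # hypothetical helper assumed by lt(): true when x and y agree within
    # either the relative or the absolute tolerance
    return abs(x - y) <= max(rel_err * max(abs(x), abs(y)), abs_err)

def lt(x, y, rel_err=1e-5, abs_err=1e-8):
    return not areclose(x, y, rel_err, abs_err) and int(x - y) <= 0

a = (23 * .1) * 10    # 23.000000000000004
b = 2.3 / .1          # 22.999999999999996

print(lt(a, b, 0, 1e-10))   # False: indistinguishable within 1e-10
print(lt(a, b, 0, 1e-20))   # True: distinguishable at 1e-20
```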
>
> The fact is, we make things easier and let the programmer shoot themselves in
> the foot if they want to by providing things like fp comparisons and even
> functions like sum that do dumb-sums (though Raymond Hettinger's Python Recipe
> at ASPN provides a smart-sum).
>
> I think the biggest argument for something like nice() is that it fills the
> void for a simple way to round numbers to a relative error rather than an
> absolute error. round() handles absolute error -- it rounds to a given
> number of decimal places. str() rounds to 12 significant digits and repr()
> to 17. There is nothing else except build-your-own solutions for rounding
> to an arbitrary number of significant figures. nice() would fill that niche
> and provide a 12-significant-digit default.
>
> I agree that making all float comparisons default to 12-digit precision would
> not be smart. That would be throwing away 5 digits that someone might really
> want. Providing a simple way to specify the desired significance is something
> that is needed, especially since fp issues are such a thorny issue. The user
> that explicitly uses nice(x)<nice(y) is being rewarded at the moment by
> getting a result that they expect, e.g.
>
> nice(2.3/.1)==nice((23*.1)*10)
>
> and also getting a subtle reminder that their result is only true at the
> default (12th digit) precision level.
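A sketch of what a significant-digit nice() could look like -- illustrative only, since no such builtin exists (modern syntax):

```python
from math import floor, log10

def nice(x, sig=12):
    # round x to `sig` significant digits -- a sketch of the proposed
    # nice(), not an actual Python function
    if x == 0:
        return 0.0
    return round(x, sig - 1 - floor(log10(abs(x))))

print(nice(2.3/.1) == nice((23*.1)*10))   # True at the 12-digit default
```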
>
> /c
> _______________________________________________
> Python-Dev mailing list
> Python-Dev at python.org
> http://mail.python.org/mailman/listinfo/python-dev