[Python-ideas] PEP 485: A Function for testing approximate equality

Steven D'Aprano steve at pearwood.info
Fri Feb 6 05:48:19 CET 2015


On Thu, Feb 05, 2015 at 05:12:32PM -0800, Chris Barker wrote:
> On Thu, Feb 5, 2015 at 4:44 PM, Steven D'Aprano <steve at pearwood.info> wrote:
> 
> > > 0.0 < rel_tol < 1.0)
> >
> > I can just about see the point in restricting rel_tol to the closed
> > interval 0...1, but not the open interval. Mathematically, setting the
> > tolerance to 0 should just degrade gracefully to exact equality,
> 
> sure -- no harm done there.
> 
> > and a tolerance of 1 is nothing special at all.
> 
> well, I ended up putting that in because it turns out that with the
> "weak" test, anything compares as "close" to zero:

Okay. Maybe that means that the "weak test" is not useful if one of the 
numbers is zero. Neither is a relative tolerance or an ULP calculation.
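
To make that concrete, here is a toy, max-based reading of the "weak"
test (my own throwaway definition, not the PEP's implementation); it
reproduces the behaviour Chris describes in the derivation quoted
below:

def weak_close(a, b, rel_tol):
    # "weak" test: difference relative to the larger magnitude
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

weak_close(1e-300, 0.0, rel_tol=0.5)  # False: nothing non-zero is close to zero
weak_close(1e-300, 0.0, rel_tol=1.0)  # True: once rel_tol >= 1, everything is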


> tol>=1.0
> a = anything
> b = 0.0
> min( abs(a), abs(b) ) = 0.0
> abs(a-b) = a

That is incorrect. abs(a-b) for b == 0 is abs(a), not a.

> tol * a >= a
> abs(a-b) <= tol * a

If and only if a >= 0. Makes perfect mathematical sense, even if it's 
not useful. That's an argument for doing what Bruce Dawson says, and 
comparing against the maximum of a and b, not the minimum.
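
To spell out the "if and only if" (throwaway arithmetic, not proposed
code): the quoted step tol*a >= a silently assumes a is non-negative.

a, b, tol = -5.0, 0.0, 1.0
abs(a - b) <= tol * a        # False: abs(a-b) is 5.0, but tol*a is -5.0
abs(a - b) <= tol * abs(a)   # True, once you compare against abs(a)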


> Granted, that's actually the best argument yet for using the strong test --
> which I am suggesting, though I haven't thought out what that will do in
> the case of large tolerances.

It should work exactly the same as for small tolerances, except larger 
*wink*


> > Values larger than 1 aren't often useful, but there really is no reason
> > to exclude tolerances larger than 1. "Give or take 300%" (ie.
> > rel_tol=3.0) is a pretty big tolerance, but it is well-defined: a
> > difference of 299% is "close enough", 301% is "too far".
> >
> 
> yes it is, but then the whole weak vs strong vs asymmetric test becomes
> important. 

Um, yes? I know Guido keeps saying that the difference is unimportant, 
but I think he is wrong: at the edges, the way you define "close to" 
determines whether a and b are considered close or not. If you care 
enough to specify a particular tolerance (say, 2.3e-4), as opposed to 
plucking a round number out of thin air, then you care about the edge 
cases. I'm not entirely sure what to do about it, but my sense is that 
we should do something.
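
Here is the sort of edge I mean, with throwaway definitions of the two
symmetric tests discussed in this thread (tolerance relative to the
larger or to the smaller value); the power-of-two numbers are chosen
only so the demonstration has no rounding noise:

def close_rel_to_larger(x, y, tol):
    return abs(x - y) <= tol * max(abs(x), abs(y))

def close_rel_to_smaller(x, y, tol):
    return abs(x - y) <= tol * min(abs(x), abs(y))

tol = 2.0**-12   # about 2.4e-4, exactly representable
a = 1.0
b = 1.0 - tol    # exact, so abs(a-b) == tol exactly
close_rel_to_larger(a, b, tol)   # True:  diff == tol * max(abs(a), abs(b))
close_rel_to_smaller(a, b, tol)  # False: diff >  tol * min(abs(a), abs(b))

The same pair of values is "close" under one test and not the other.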


> From my math the "delta" between the weak and strong tests goes
> with tolerance**2 * max(a,b).  So if the tolerance is >=1, then it makes a
> big difference which test you choose. In fact:
> 
> "Is a within 300% of b?" makes sense, but "are a and b within 300% of
> each other?" is poorly defined.

No more so than "a and b within 1% of each other". It's just a 
short-hand. What I mean by "of each other" is the method recommended 
by Bruce Dawson: use the larger of a and b, which is what Boost(?) and 
you are calling the "strong test".
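
Chris's tolerance**2 estimate above is easy to check numerically
(sketch arithmetic only, not proposed code): right at the boundary,
the two symmetric bounds differ by tol**2 * max(a, b).

tol = 0.5
a = 100.0
b = a * (1 - tol)            # 50.0: right on the larger-value boundary
tol * max(abs(a), abs(b))    # 50.0, the bound relative to the larger
tol * min(abs(a), abs(b))    # 25.0, the bound relative to the smaller
# the bounds differ by 25.0 == tol**2 * max(a, b)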


[...]
> > Negative error tolerances, on the other hand, do seem to be meaningless
> > and should be prevented.
> 
> 
> you could just take the abs(rel_tol), but really?  what's the point?

No no, I agree with you that negative values should be prohibited for 
tolerances (relative or absolute). Or complex ones, for that matter.
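
A guard along these lines would do it (hypothetical validation code,
mine, not the PEP's):

def check_tolerances(rel_tol, abs_tol):
    # reject negative tolerances; a complex value fails the comparison
    # with a TypeError, which is arguably fine too
    if rel_tol < 0 or abs_tol < 0:
        raise ValueError("tolerances must be non-negative")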


> > (E.g. "guess the number of grains of sand on this beach".) Any upper
> > limit you put in is completely arbitrary,
> 
> 
> somehow one doesn't feel arbitrary to me -- numbers aren't close if the
> difference between them is larger than the largest of the numbers -- not
> arbitrary, maybe unnecessary, but not arbitrary

Consider one way of detecting outliers in numeric data: any number more 
than X standard deviations from the mean in either direction may be an 
outlier.

py> import statistics
py> data = [1, 2, 100, 100, 100, 101, 102, 103, 104, 500, 100000]
py> m = statistics.mean(data)
py> tol = 3*statistics.stdev(data)
py> [x for x in data if abs(x-m) > tol]
[100000]
py> m, tol, tol/m
(9201.181818181818, 90344.55455462009, 9.818798969508077)

tol/m is, of course, the error tolerance relative to m, which for the 
sake of the argument we are treating as the "known value": anything 
that differs from the mean by more than 9.818... times the mean is 
probably an outlier.

Now, the above uses an absolute tolerance, but I should be able to get 
the same results from a relative tolerance of 9.818... depending on 
which is more convenient to work with at the time.
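
For instance, continuing the session above, the same filter expressed
with that relative tolerance (relative to m, the "known value") picks
out the same outlier:

py> rel_tol = tol/m
py> [x for x in data if abs(x-m) > rel_tol*abs(m)]
[100000]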



-- 
Steve

