[Python-ideas] PEP 485: A Function for testing approximate equality

Guido van Rossum guido at python.org
Fri Feb 6 06:00:24 CET 2015


Steven, I can't take it any longer. It is just absolutely ridiculous how
much discussion we've already seen about a function that's a single line of
code. I'll give you three choices. You can vote +1, 0 or -1. No more
discussion. If you still keep picking on details I'll just kill the PEP to
be done with the discussion.
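
[For context, the one-liner in question is roughly of this shape -- a sketch
only, with illustrative names and defaults; which variant to use (symmetric
vs. asymmetric, scale by the larger or the smaller magnitude) is exactly what
the thread below argues about:]

def isclose(a, b, rel_tol=1e-9, abs_tol=0.0):
    # difference scaled by the larger magnitude, with an absolute floor
    # for comparisons near zero
    return abs(a - b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)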

On Thu, Feb 5, 2015 at 8:48 PM, Steven D'Aprano <steve at pearwood.info> wrote:

> On Thu, Feb 05, 2015 at 05:12:32PM -0800, Chris Barker wrote:
> > On Thu, Feb 5, 2015 at 4:44 PM, Steven D'Aprano <steve at pearwood.info>
> > wrote:
> >
> > > > 0.0 < rel_tol < 1.0)
> > >
> > > I can just about see the point in restricting rel_tol to the closed
> > > interval 0...1, but not the open interval. Mathematically, setting the
> > > tolerance to 0 should just degrade gracefully to exact equality,
> >
> > sure -- no harm done there.
> >
> > > and a tolerance of 1 is nothing special at all.
> >
> > well, I ended up putting that in because it turns out that with the "weak"
> > test, anything compares as "close" to zero:
>
> Okay. Maybe that means that the "weak test" is not useful if one of the
> numbers is zero. Neither is a relative tolerance, or an ULP calculation.
>
>
> > tol>=1.0
> > a = anything
> > b = 0.0
> > min( abs(a), abs(b) ) = 0.0
> > abs(a-b) = a
>
> That is incorrect. abs(a-b) for b == 0 is abs(a), not a.
>
> > tol * a >= a
> > abs(a-b) <= tol * a
>
> If and only if a >= 0. Makes perfect mathematical sense, even if it's
> not useful. That's an argument for doing what Bruce Dawson says, and
> comparing against the maximum of a and b, not the minimum.
>
>
> > Granted, that's actually the best argument yet for using the strong test --
> > which I am suggesting, though I haven't thought out what that will do in
> > the case of large tolerances.
>
> It should work exactly the same as for small tolerances, except larger
> *wink*
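
[To make the zero / large-tolerance case concrete, here is a sketch of the two
symmetric one-liners being compared, with made-up names; only the version that
scales by the larger magnitude lets everything count as "close" to zero once
rel_tol >= 1:]

def close_to_larger(a, b, rel_tol):
    # threshold grows with the bigger of the two magnitudes
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))

def close_to_smaller(a, b, rel_tol):
    # threshold collapses to 0 as soon as either value is 0
    return abs(a - b) <= rel_tol * min(abs(a), abs(b))

close_to_larger(1e300, 0.0, rel_tol=1.0)    # True:  1e300 <= 1.0 * 1e300
close_to_smaller(1e300, 0.0, rel_tol=1.0)   # False: 1e300 <= 0.0 fails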
>
>
> > > Values larger than 1 aren't often useful, but there really is no reason
> > > to exclude tolerances larger than 1. "Give or take 300%" (ie.
> > > rel_tol=3.0) is a pretty big tolerance, but it is well-defined: a
> > > difference of 299% is "close enough", 301% is "too far".
> > >
> >
> > yes it is, but then the whole weak vs strong vs asymmetric test becomes
> > important.
>
> Um, yes? I know Guido keeps saying that the difference is unimportant,
> but I think he is wrong: at the edges, the way you determine "close to"
> makes a difference to whether a and b are considered close or not. If you
> care enough to specify a specific tolerance (say, 2.3e-4), as opposed to
> plucking a round number out of thin air, then you care about the edge
> cases. I'm not entirely sure what to do about it, but my sense is that
> we should do something.
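
[A concrete pair for the "edge" point above, with contrived values chosen to
land in the narrow window -- roughly rel_tol**2 * max(a, b) wide, as noted
below -- where the two symmetric forms disagree:]

py> a, b, rel_tol = 1.0, 1.00023002, 2.3e-4
py> abs(a - b) <= rel_tol * min(abs(a), abs(b))   # scale by the smaller
False
py> abs(a - b) <= rel_tol * max(abs(a), abs(b))   # scale by the larger
True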
>
>
> > From my math the "delta" between the weak and strong tests goes
> > with tolerance**2 * max(a,b).  So if the tolerance is >=1, then it makes a
> > big difference which test you choose. In fact:
> >
> > "Is a within 300% of b" makes sense, but "are a and b within 300% of
> > each other" is poorly defined.
>
> No more so than "a and b within 1% of each other". It's just a
> short-hand. What I mean by "of each other" is the method recommended by
> Bruce Dawson, use the larger of a and b, what Boost(?) and you are
> calling the "strong test".
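
[A quick numeric sketch of why the choice matters at rel_tol = 3.0; values are
made up. The asymmetric "is a within 300% of b" test flips right at a 300%
difference, while the symmetric test scaled by the larger value accepts the
same pair:]

py> rel_tol, b = 3.0, 100.0
py> abs(399.0 - b) <= rel_tol * abs(b)                   # 299 <= 300
True
py> abs(401.0 - b) <= rel_tol * abs(b)                   # 301 <= 300
False
py> abs(401.0 - b) <= rel_tol * max(abs(401.0), abs(b))  # 301 <= 1203
True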
>
>
> [...]
> > > Negative error tolerances, on the other hand, do seem to be meaningless
> > > and should be prevented.
> >
> >
> > you could just take the abs(rel_tol), but really?  what's the point?
>
> No no, I agree with you that negative values should be prohibited for
> tolerances (relative or absolute). Or complex ones for that matter.
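
[If negative (or complex) tolerances are to be rejected, a minimal validation
sketch -- illustrative only -- would be:]

if rel_tol < 0.0 or abs_tol < 0.0:
    raise ValueError("tolerances must be non-negative")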
>
>
> > > (E.g. "guess the number of grains of sand on this beach".) Any upper
> > > limit you put in is completely arbitrary,
> >
> >
> > somehow one doesn't feel arbitrary to me -- numbers aren't close if the
> > difference between them is larger than the largest of the numbers -- not
> > arbitrary, maybe unnecessary, but not arbitrary
>
> Consider one way of detecting outliers in numeric data: any number more
> than X standard deviations from the mean in either direction may be an
> outlier.
>
> py> import statistics
> py> data = [1, 2, 100, 100, 100, 101, 102, 103, 104, 500, 100000]
> py> m = statistics.mean(data)
> py> tol = 3*statistics.stdev(data)
> py> [x for x in data if abs(x-m) > tol]
> [100000]
> py> m, tol, tol/m
> (9201.181818181818, 90344.55455462009, 9.818798969508077)
>
> tol/m is, of course, the error tolerance relative to m, which for the
> sake of the argument we are treating as the "known value": anything that
> differs from the mean by more than 9.818... times the mean is probably
> an outlier.
>
> Now, the above uses an absolute tolerance, but I should be able to get
> the same results from a relative tolerance of 9.818... depending on
> which is more convenient to work with at the time.
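
[Continuing the session above: because m is positive, dividing the absolute
cutoff by m and multiplying back gives the same selection, i.e. the same test
expressed as a relative tolerance against m:]

py> rel_tol = tol / m
py> [x for x in data if abs(x - m) > rel_tol * abs(m)]
[100000]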
>
>
>
> --
> Steve



-- 
--Guido van Rossum (python.org/~guido)

