Sorry,
This slipped off list -- bringing it back.
On Mon, Jan 26, 2015 at 12:40 PM, Paul Moore p.f.moore@gmail.com wrote:
Any of the approaches on the table will do something reasonable in this case:
In [4]: is_close_to.is_close_to(sum([0.1]*10), 1)
testing: 0.9999999999999999 1
Out[4]: True
Yes, but that's not my point. I was responding to Steven's comment that having 2 different types of tolerance isn't "arcana", by pointing out that I find even stuff as simple as multiplication vs cumulative addition confusing. And I should note that I was (many years ago!) a maths graduate and did some numerical maths courses, so this stuff isn't completely unknown to me.
Right, it can be arcane -- which is why I want this function, and why we want it to do something "sane" most of the time, by default.
Note that the 1e-8 default I chose (which I am not committed to) is not ENTIRELY arbitrary -- it's about half the digits carried by a python float (double) -- essentially saying the values are close to about half of the precision available. And we are constrained here: the options are between 0.1 (which would be crazy, if you ask me!) and 1e-14 -- any larger and it would be meaningless, and any smaller and it would surpass the precision of a python float. Picking a default near the middle of that range seems quite sane to me.
Sorry, that means nothing to me. Head exploding time again :-)
Darn -- I'll try again -- with a relative tolerance, two values are only going to be close if their exponents are within one of each other. So what you are setting is how many digits of the mantissa you care about. A tolerance of 0.1 would be about one digit, and a tolerance of 1e-15 would be 15 digits. Python floats carry about 15 digits -- so the relative tolerance has to be between 1e-1 and 1e-15 -- nothing else is useful or makes sense. So I put it in the middle: 1e-8
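To make the "digits of the mantissa" idea concrete, here is a minimal sketch. The name is_close and its signature are only placeholders for whatever we end up with, and it assumes the "relative to the expected value" definition:

```python
def is_close(actual, expected, rel_tol=1e-8):
    """Hypothetical sketch: relative test, scaled by the expected value."""
    return abs(actual - expected) <= rel_tol * abs(expected)


expected = 1.2345678901234567
actual = expected + 1e-9  # agrees with expected to roughly 9 significant digits

# rel_tol = 10**-digits asks for agreement to about `digits` significant digits
for digits in (1, 8, 12, 15):
    print(digits, is_close(actual, expected, rel_tol=10.0 ** -digits))

# The same answer comes back whatever the magnitude of the values is:
print(is_close(actual * 1e100, expected * 1e100))
print(is_close(actual * 1e-100, expected * 1e-100))
```

With these values the test passes when asking for 1 or 8 digits and fails when asking for 12 or 15, and rescaling both numbers does not change the result.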
This is quite different than setting a value for an absolute tolerance -- saying something is close to another number if the difference is less than 1e-8 would be wildly inappropriate when the smallest numbers a float can hold are on the order of 1e-300!
On the other hand, I find this completely obvious. (Well, mostly - don't the gaps between the representable floats increase as the magnitude gets bigger, so an absolute tolerance of 1e-8 might be entirely reasonable when the numbers are sufficiently high?)
sure it would -- that's the point -- what makes sense as an absolute tolerance depends entirely on the magnitude of the numbers -- since we don't know the magnitude of the numbers someone may use, we can't set a reasonable default.
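A small sketch of why no single absolute default works. The helper abs_close is hypothetical, just a plain absolute-difference test with a fixed 1e-8:

```python
def abs_close(a, b, abs_tol=1e-8):
    """Hypothetical sketch: plain absolute-difference test."""
    return abs(a - b) <= abs_tol


# Near 1.0, 1e-8 means "agree to about 8 decimal places" -- plausible.
print(abs_close(1.0, 1.0 + 5e-9))           # True

# For tiny values everything is "close", even numbers a factor of 2 apart.
print(abs_close(1e-300, 2e-300))            # True

# For huge values, a difference far below anything you would care about
# in relative terms (about 1 part in 1e15) is still rejected.
print(abs_close(1e20, 1e20 * (1 + 1e-15)))  # False
```

The same 1e-8 is far too loose at 1e-300 and far too strict at 1e20, which is the argument for leaving the absolute tolerance to the caller.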
arcana, maybe, but it's not a floating point issue -- X% of zero is zero absolutely precisely.
But the "arcana" I was talking about is that a relative error of X% could be X% of the value under test, of the expected value, of their average, or something else.
Ahh! -- which is exactly the point I think some of us are making -- defining X% error relative to the "expected" value is the simplest and most straightforward to explain. That's the primary reason I prefer it.
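For illustration, here are the candidate definitions side by side. The function names are made up for this sketch, and the numbers are chosen so that the definitions disagree:

```python
def diff(actual, expected):
    return abs(actual - expected)


# Hypothetical names; each scales the tolerance by a different reference value.
definitions = {
    "relative to expected": lambda a, e, tol: diff(a, e) <= tol * abs(e),
    "relative to actual": lambda a, e, tol: diff(a, e) <= tol * abs(a),
    "relative to mean": lambda a, e, tol: diff(a, e) <= tol * (abs(a) + abs(e)) / 2,
    "relative to larger": lambda a, e, tol: diff(a, e) <= tol * max(abs(a), abs(e)),
}

actual, expected, tol = 90.5, 100.0, 0.1  # "within 10%"

results = {name: test(actual, expected, tol) for name, test in definitions.items()}
for name, ok in results.items():
    print(f"{name:22s} {ok}")
```

Here only the "relative to actual" variant says no, because 10% of 90.5 is smaller than the difference of 9.5. With a tight tolerance like 1e-8 the variants almost never disagree, which is why the choice mostly comes down to which one is easiest to explain.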
And only one of those values is zero, so
whether X% is a useful value is entirely dependent on the definition.
not sure what you meant here, but actually relative error goes to heck if either value is zero, and with any of the definitions we are working with. So X% is useful for any value except if one of the values is zero.
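A quick sketch of the zero problem, using a hypothetical rel_close that scales by the larger of the two magnitudes (the other definitions behave the same way here for any rel_tol below 1):

```python
def rel_close(a, b, rel_tol=1e-8):
    """Hypothetical sketch: symmetric relative test, scaled by the larger value."""
    return abs(a - b) <= rel_tol * max(abs(a), abs(b))


# No nonzero value is relatively close to zero, however tiny it is.
for tiny in (1e-3, 1e-30, 1e-300):
    print(tiny, rel_close(tiny, 0.0))

# Only zero itself passes.
print(rel_close(0.0, 0.0))
```

Every nonzero value fails against 0.0 and only 0.0 itself passes, so a relative test alone can never be used to check "is this result approximately zero".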
And how relative errors are defined is floating point arcana (I can picture the text book page now, and it wasn't simple...)
semantics here -- defining a relative error can be done with pure real numbers -- computing it can get complex with floating point.
But back to a point made earlier -- the idea here is to provide something
better than naive use of
x == y
I still wonder whether "naive use of equality" is much of a target, though. There are only two use cases that have been mentioned so far. Testing is not about equality, because we're replacing assertAlmostEqual. And when someone is doing an iterative algorithm, they are looking for convergence, i.e. within a given range of the answer. So neither use case would be using an equality test anyway.
well, the secondary target is a better (or more flexible) assertAlmostEqual. The existing one is not suitable for arbitrarily large or small numbers, and particularly not numbers with a range of magnitudes -- a relative difference test is much needed.
I'm not sure I follow your point, but I will say that if Nathaniel has
seen a lot of use cases for assertAlmostEqual that can't be easily handled with the new function, then something is badly wrong.
Well, I'm not suggesting that we replace assertAlmostEqual -- but rather augment it. In fact, assertAlmostEqual is actually an absolute tolerance test (expressed in terms of decimal places). That is the right thing, and the only right thing, to use when you want to compare to zero.
What I'm proposing is a relative tolerance test, which is not the right thing to use for comparing to zero, but is the right thing to use when comparing numbers of varying magnitude.
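Here is a rough side-by-side of the two kinds of test. unittest's assertAlmostEqual is the real thing (default places=7); rel_close is a hypothetical stand-in for the proposed function, assuming the "relative to expected" definition:

```python
import unittest

_tc = unittest.TestCase()


def almost_equal(a, b):
    """True if unittest's assertAlmostEqual (default places=7) accepts a, b."""
    try:
        _tc.assertAlmostEqual(a, b)
        return True
    except AssertionError:
        return False


def rel_close(actual, expected, rel_tol=1e-8):
    """Hypothetical sketch of the proposed relative test."""
    return abs(actual - expected) <= rel_tol * abs(expected)


cases = [
    (1e-10, 2e-10),       # small values a factor of 2 apart
    (1e10 + 0.01, 1e10),  # large values differing by 1 part in 1e12
    (1e-9, 0.0),          # comparison against zero
]
for actual, expected in cases:
    print(actual, expected, almost_equal(actual, expected), rel_close(actual, expected))
```

The two tests disagree on all three cases: assertAlmostEqual accepts the first and third and rejects the second, while the relative test does the opposite. That is why I see them as complementary rather than one replacing the other.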
There aren't enough good use cases that we can reasonably decide to reject any of them as out of scope,
I've lost track of what we might be rejecting.
The whole symmetric -- non symmetric argument really is bike shedding -- in the context of "better than ==" or "different but as good as assertAlmostEqual" -- any of them are just fine.
so really all we are left with is defaults -- also bike-shedding, except for the default for the zero test, and there are really two options there:
use 0.0 for abs_tolerance, and have it fail for any test against zero unless the user specifies something appropriate for their use case.
or
use SOME default, either for abs_tolerance or zero_tolerance, and make an assumption about the order of magnitude of the likely results, so that it will "just work" for tests against zero. Maybe something small relative to one (like 1e-8) would be OK, but that concerns me -- I think you'd get false positives for small numbers, which is worse than false negatives for all comparisons to zero.
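The trade-off between the two options can be sketched like this. The function name, parameter names and the way the two tolerances are combined are all assumptions for illustration, not a settled design:

```python
def is_close(actual, expected, rel_tol=1e-8, abs_tol=0.0):
    """Hypothetical sketch: pass if either the relative or the absolute test passes."""
    diff = abs(actual - expected)
    return diff <= rel_tol * abs(expected) or diff <= abs_tol


# Option 1: abs_tol defaults to 0.0 -- comparisons against zero fail
# until the user says what "small" means for their problem.
print(is_close(1e-12, 0.0))                     # False
print(is_close(1e-12, 0.0, abs_tol=1e-9))       # True

# Option 2: a nonzero default such as 1e-8 -- zero tests "just work",
# but small values that differ by a factor of 3 are also reported close.
print(is_close(1e-12, 0.0, abs_tol=1e-8))       # True
print(is_close(1e-10, 3e-10, abs_tol=1e-8))     # True (a false positive)
print(is_close(1e-10, 3e-10))                   # False with abs_tol=0.0
```

With option 1 the failure against zero is at least loud and easy to fix, whereas the false positive in option 2 is silent.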
1e-8 -- but you already know that ;-) -- anything between 1e-8 and 1e-12
would be fine with me.
TBH all I care about in this context is that there must be 2 values x and y for which is_close(x,y) == True and x != y.
everything on the table will do that.
I'm tempted to
strengthen that to "for all y there must be at least 1 x such that..." but it's possible that's too strong and it can't be achieved.
I think we could say that for all y except 0.0 -- and even zero if an abs_tolerance greater than zero is set.
Basically, the default behaviour needs to encompass what I believe is most people's intuition - that "close" is a proper superset of "equal".
A good reason not to have all defaults be zero -- I don't think we need a function that doesn't work at all with default values.
-Chris
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov