
On Mon, Jan 26, 2015 at 07:43:20AM +0000, Paul Moore wrote:
> On 26 January 2015 at 06:39, Steven D'Aprano <steve@pearwood.info> wrote:
> > I suggest that in the interest of not flooding everyone's inboxes, we take that off-list until we have either a consensus or at least agreement that we cannot reach consensus.
> Does it need to go off-list?
And now you know why there are hundreds of messages in this thread ;-) No, it doesn't need to go off-list, but I'm suffering badly from email fatigue, not just because of this thread, though it is one of the major causes, and I'm sure I'm not the only one.
> I'm still unclear about the arguments over asymmetric vs symmetric (I suspect, as you alluded to earlier, that they reflect a more fundamental problem, which is that there are 2 different types of use case with different constraints) so I'd like to at least be aware of the content of any discussion...
Symmetry and asymmetry of "close to" is a side-effect of the way you calculate the fuzzy comparison. In real life, "close to" is always symmetric because distance is the same whether you measure from A to B or from B to A. The distance between two numbers is their difference, which is another way of saying the error between them:

    delta = abs(x - y)

(delta being the traditional name for this quantity in mathematics), and obviously delta doesn't depend on the order of x and y.

But if we express that difference as a fraction of some base value, i.e. as a relative error, the result depends on which base value you choose:

    delta/x != delta/y

so suddenly we introduce an asymmetry which doesn't reflect any physical difference. The error between x and y is the same whichever way you measure, but that error might be 10% of x and 12.5% of y (say).

What *fundamentally* matters is the actual error, delta. But to decide whether any specific value for delta is too much or not, you need to pick a maximum acceptable delta, and that depends on context: a maximum acceptable delta of 0.0001 is probably too big if your x and y are around a billionth, and way too small if they are around a billion. Hence we often prefer to work with relative tolerances ("give or take 1%"), since they automatically scale with the size of x and y, but that introduces an asymmetry.

Asymmetry is bad, because it is rather surprising and counter-intuitive that "x is close to y" but "y is not close to x". It may also be bad in a practical sense, because people will forget which order they need to give x and y and will give them in the wrong order. I started off with an approx_equal function in test_statistics that was asymmetric, and I could never remember which way the arguments went.

(We can mitigate the practical failure with explicit argument names, "actual" and "expected", instead of generic ones. But who wants to be using keyword arguments for this all the time?)

Example: suppose the user supplies a relative tolerance of 0.01 ("plus or minus one percent"), with x=100.0 and y=99.0. Then delta = 1.0. Is that close? If we use x as the base, 1 <= 0.01*100 returns True, but if we use y as the base, 1 <= 0.01*99 returns False.

Instead, Bruce Dawson recommends using the larger of x and y:

https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-number...

Quote:

    To compare f1 and f2 calculate diff = fabs(f1-f2). If diff is
    smaller than n% of max(abs(f1),abs(f2)) then f1 and f2 can be
    considered equal.

This is especially appropriate when you just want to know whether x and y differ from each other by an acceptably small amount, without specifying which is the "true" value and which the "true value plus or minus some error".

Other alternatives would be to take the smaller, or the average, of x and y. Time permitting, over the next day or so I'll draw up some diagrams to show how each of these tactics changes what counts as close or not close.

--
Steve
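P.S. To make the worked example above concrete in code, here is a quick sketch of the two tests (the function names and signatures are purely illustrative, not a proposed API):

    def is_close_asymmetric(actual, expected, rel_tol):
        # Asymmetric: the error is measured relative to `expected` only,
        # so swapping the arguments can change the answer.
        return abs(actual - expected) <= rel_tol * abs(expected)

    def is_close_symmetric(x, y, rel_tol):
        # Symmetric, following Bruce Dawson's suggestion: the error is
        # measured relative to the larger of the two values.
        return abs(x - y) <= rel_tol * max(abs(x), abs(y))

    # With x=100.0, y=99.0 and a 1% relative tolerance:
    is_close_asymmetric(99.0, 100.0, 0.01)  # True:  1.0 <= 0.01*100
    is_close_asymmetric(100.0, 99.0, 0.01)  # False: 1.0 >  0.01*99
    is_close_symmetric(100.0, 99.0, 0.01)   # True, in either order
    is_close_symmetric(99.0, 100.0, 0.01)   # True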