PEP 485: A Function for testing approximate equality

Hi folks,

After much discussion on this list, I have written up a PEP, and it is ready for review (see below). It is also here: https://www.python.org/dev/peps/pep-0485/ That version is not quite up to date just yet, so please refer to the one enclosed in this email for now.

I am managing both the PEP and a sample implementation and tests on GitHub here: https://github.com/PythonCHB/close_pep Please go there if you want to try it out, add some tests, etc. Pull requests are welcome for code, tests, or PEP editing.

A quick summary of the decisions I made, and what I think are the open discussion points:

The focus is on relative tolerance, but with an optional absolute tolerance, primarily to be used near zero, though it also allows the function to be used as a plain absolute-difference check.

It is using an asymmetric test -- that is, the tolerance is computed relative to one of the arguments. It is perhaps surprising and confusing that you may get a different result if you reverse the arguments, but in this discussion it became clear that there were some use cases where it was helpful to know exactly what the tolerance is computed relative to, and that in most use cases it just doesn't matter. I hope this is adequately explained in the PEP. We could add a flag to select a symmetric test (I'd go with what Boost calls the "strong" test), but I'd rather not -- it just confuses things, and I expect users will tend to use the defaults anyway.

It is designed to work mostly with floats, but also supports int, Decimal, Fraction, and complex. I'm not really thrilled with that, though; it turns out to be not quite as easy to duck-type as I had hoped. To really do it right, there would have to be more switching on type in the code, which I think is ugly to write -- contributions and opinions welcome on this.

I used 1e-8 as a default relative tolerance -- somewhat arbitrarily, because that's about half of the decimal digits in a Python float -- suggestions welcome.

Other than that, of course, we can bike-shed the names of the function and the parameters. ;-)

Fire away!

-Chris


PEP: 485
Title: A Function for testing approximate equality
Version: $Revision$
Last-Modified: $Date$
Author: Christopher Barker <Chris.Barker@noaa.gov>
Status: Draft
Type: Standards Track
Content-Type: text/x-rst
Created: 20-Jan-2015
Python-Version: 3.5
Post-History:


Abstract
========

This PEP proposes the addition of a function to the standard library that determines whether one value is approximately equal or "close" to another value.


Rationale
=========

Floating point values have limited precision, which makes them unable to exactly represent some values and lets error accumulate with repeated computation. As a result, it is common advice to only use an equality comparison in very specific situations. Often an inequality comparison fits the bill, but there are times (often in testing) where the programmer wants to determine whether a computed value is "close" to an expected value, without requiring them to be exactly equal. This is common enough, particularly in testing, and not always obvious how to do, so it would be a useful addition to the standard library.


Existing Implementations
------------------------

The standard library includes the ``unittest.TestCase.assertAlmostEqual`` method, but it:

* Is buried in the unittest.TestCase class

* Is an assertion, so you can't use it as a general test (easily)

* Uses a number of decimal digits or an absolute delta, which are particular use cases that don't provide a general relative error.
The numpy package has the ``allclose()`` and ``isclose()`` functions. The statistics package tests include an implementation, used for its unit tests. One can also find discussion and sample implementations on Stack Overflow and other help sites.

These existing implementations indicate that this is a common need, and not trivial to write oneself, making it a candidate for the standard library.


Proposed Implementation
=======================

NOTE: this PEP is the result of an extended discussion on the python-ideas list [1]_.

The new function will have the following signature::

  is_close_to(actual, expected, tol=1e-8, abs_tol=0.0)

``actual``: is the value that has been computed, measured, etc.

``expected``: is the "known" value.

``tol``: is the relative tolerance -- it is the amount of error allowed, relative to the magnitude of the expected value.

``abs_tol``: is a minimum absolute tolerance level -- useful for comparisons near zero.

Modulo error checking, etc., the function will return the result of::

  abs(expected - actual) <= max(tol * expected, abs_tol)


Handling of non-finite numbers
------------------------------

The IEEE 754 special values of NaN, inf, and -inf will be handled according to IEEE rules. Specifically, NaN is not considered close to any other value, including NaN. inf and -inf are only considered close to themselves.


Non-float types
---------------

The primary use case is expected to be floating point numbers. However, users may want to compare other numeric types similarly. In theory, it should work for any type that supports ``abs()``, comparisons, and subtraction. The code will be written and tested to accommodate these types:

* ``Decimal``: for Decimal, the tolerance must be set to a Decimal type.

* ``int``

* ``Fraction``

* ``complex``: for complex, ``abs(z)`` will be used for scaling and comparison.


Behavior near zero
------------------

Relative comparison is problematic if either value is zero. In this case, the difference is relative to zero, and thus will always be smaller than the prescribed tolerance. To handle this case, an optional parameter, ``abs_tol`` (default 0.0), can be used to set a minimum tolerance to be used for very small values. That is, the values will be considered close if::

  abs(a - b) <= abs(tol * expected) or abs(a - b) <= abs_tol

If the user sets the ``tol`` parameter to 0.0, then only the absolute tolerance will affect the result, so this function provides an absolute tolerance check as well.

A sample implementation is available (as of Jan 22, 2015) on GitHub:

https://github.com/PythonCHB/close_pep/blob/master/is_close_to.py


Relative Difference
===================

There are essentially two ways to think about how close two numbers are to each other: absolute difference, simply ``abs(a-b)``, and relative difference, ``abs(a-b)/scale_factor`` [2]_. The absolute difference is trivial enough that this proposal focuses on the relative difference.

Usually, the scale factor is some function of the values under consideration, for instance:

1) The absolute value of one of the input values

2) The maximum absolute value of the two

3) The minimum absolute value of the two

4) The arithmetic mean of the two


Symmetry
--------

A relative comparison can be either symmetric or non-symmetric. For a symmetric algorithm:

``is_close_to(a, b)`` is always equal to ``is_close_to(b, a)``

This is an appealing consistency -- it mirrors the symmetry of equality, and is less likely to confuse people.
However, often the question at hand is: "Is this computed or measured value within some tolerance of a known value?" In this case, the user wants the relative tolerance to be specifically scaled against the known value. It is also easier for the user to reason about. This proposal uses this asymmetric test to allow this specific definition of relative tolerance.

Example: For the question: "Is the value of a within x% of b?", using b to scale the percent error clearly defines the result.

However, as this approach is not symmetric, a may be within 10% of b, but b is not within x% of a. Consider the case::

  a = 9.0
  b = 10.0

The difference between a and b is 1.0. 10% of a is 0.9, so b is not within 10% of a. But 10% of b is 1.0, so a is within 10% of b.

Casual users might reasonably expect that if a is close to b, then b would also be close to a. However, in the common cases, the tolerance is quite small and often poorly defined, i.e. 1e-8, defined to only one significant figure, so the result will be very similar regardless of the order of the values. And if the user does care about the precise result, s/he can take care to always pass in the two parameters in sorted order.

This proposed implementation uses asymmetric criteria with the scaling value clearly identified.


Expected Uses
=============

The primary expected use case is various forms of testing -- "are the results computed near what I expect as a result?" This sort of test may or may not be part of a formal unit testing suite. The function might also be used to determine whether a measured value is within some tolerance of an expected value.


Inappropriate uses
------------------

One use case for floating point comparison is testing the accuracy of a numerical algorithm. However, in this case, the numerical analyst ideally would be doing careful error propagation analysis, and should understand exactly what to test for. It is also likely that ULP (Unit in the Last Place) comparison may be called for. While this function may prove useful in such situations, it is not intended to be used in that way.


Other Approaches
================

``unittest.TestCase.assertAlmostEqual``
---------------------------------------

( https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmo... )

Tests that values are approximately (or not approximately) equal by computing the difference, rounding to the given number of decimal places (default 7), and comparing to zero.

This method was not selected for this proposal, as the use of decimal digits is a specific test that is not generally useful or flexible.


numpy ``isclose()``
-------------------

http://docs.scipy.org/doc/numpy-dev/reference/generated/numpy.isclose.html

The numpy package provides the vectorized functions ``isclose()`` and ``allclose()``, for similar use cases as this proposal:

``isclose(a, b, rtol=1e-05, atol=1e-08, equal_nan=False)``

    Returns a boolean array where two arrays are element-wise equal within a tolerance. The tolerance values are positive, typically very small numbers. The relative difference (rtol * abs(b)) and the absolute difference atol are added together to compare against the absolute difference between a and b.

In this approach, the absolute and relative tolerance are added together, rather than the ``or`` method used in this proposal. This is computationally simpler, and if the relative tolerance is larger than the absolute tolerance, then the addition will have no effect.
But if the absolute and relative tolerances are of similar magnitude, then the allowed difference will be about twice as large as expected. Also, if the values passed in are small compared to the absolute tolerance, then the relative tolerance will be completely swamped, perhaps unexpectedly. This is why, in this proposal, the absolute tolerance defaults to zero -- the user will be required to choose a value appropriate for the values at hand.


Boost floating-point comparison
-------------------------------

The Boost project ( [3]_ ) provides a floating point comparison function. It is a symmetric approach, with both "weak" (larger of the two relative errors) and "strong" (smaller of the two relative errors) options. It was decided that a method that clearly defined which value was used to scale the relative error would be more appropriate for the standard library.


References
==========

.. [1] Python-ideas list discussion thread
       (https://mail.python.org/pipermail/python-ideas/2015-January/030947.html)

.. [2] Wikipedia page on relative difference
       (http://en.wikipedia.org/wiki/Relative_change_and_difference)

.. [3] Boost project floating-point comparison algorithms
       ( http://www.boost.org/doc/libs/1_35_0/libs/test/doc/components/test_tools/flo... )


Copyright
=========

This document has been placed in the public domain.

-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
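(For readers who want to experiment, here is a bare-bones sketch of the comparison the PEP describes -- not the sample implementation linked above, and with the error checking and NaN/inf handling omitted:)

  def is_close_to(actual, expected, tol=1e-8, abs_tol=0.0):
      # Sketch only: applies the relative tolerance scaled by "expected",
      # falling back to abs_tol, per the formulas in the PEP text.
      return abs(expected - actual) <= max(abs(tol * expected), abs_tol)

  print(is_close_to(1.0 + 1e-10, 1.0))           # True: within 1e-8 relative
  print(is_close_to(1.001, 1.0))                 # False
  print(is_close_to(1e-12, 0.0))                 # False: nothing is relatively close to zero
  print(is_close_to(1e-12, 0.0, abs_tol=1e-9))   # True: abs_tol covers the zero case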

On 01/22/2015 04:40 PM, Chris Barker wrote:
After much discussion on this list, I have written up a PEP, and it is ready for review (see below)
Thanks! Very nice.
It is using an asymmetric test
Good - Ron convinced me that was the better way
However, as this approach is not symmetric, a may be within 10% of b, but b is not within x% of a. Consider the case::
Instead of x%, how about 10% ? ;) -- ~Ethan~

Overall I like it, but I'm not sure the help on the tol parameter is clear enough for people who don't already know what they want -- in other words, the very people this function should be helping.

In my experience, novices understand relative tolerance immediately if you put it in terms of "within X% of expected", but don't always understand it if you put it in terms of "within X * expected" or, worse, "relative to the magnitude of the expected value". Just using % in there somewhere makes people get the concept. Unfortunately, since the API doesn't actually use a percentage -- and shouldn't -- I'm not sure how to get this across in a one-liner in the help. You can always add something like "(e.g., a relative tolerance of .005 means that the actual value must be within 0.5% of the expected value)", but that's way too verbose.

(Also, I should note that the people I've explained this to have mostly been people with a US 1960-1990-style basic math education; I can't be sure that people who learned in another country, or in the post-post-new-math era in the US, etc. will respond the same way, although I do have a bit of anecdotal evidence from helping a few people on forums like StackOverflow that seems to imply they do.)

Sent from a random iPhone

On Jan 22, 2015, at 16:40, Chris Barker <chris.barker@noaa.gov> wrote:
is the relative tolerance -- it is the amount of error allowed, relative to the magnitude of the expected value.

Andrew, I totally agree that it's not going to be that clear to folks -- but I'm as stumped as you as to how to make it clear without getting really wordy. Also, I think the percent error use case is infrequent, more likely would be that a relative tolerance of 1e-8 means that the numbers are the same to within about 8 significant decimal figures. After all, not many people think in terms of 0.0000001% Suggestions gladly accepted! -Chris On Thu, Jan 22, 2015 at 7:30 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

I'd use an example with round numbers. "For example, to set a tolerance of 5%, pass tol=0.05. The default tolerance is 1e-8." On Thursday, January 22, 2015, Chris Barker <chris.barker@noaa.gov> wrote:
-- --Guido van Rossum (on iPad)

On Jan 22, 2015, at 21:54, Guido van Rossum <guido@python.org> wrote:
I'd use an example with round numbers. "For example, to set a tolerance of 5%, pass tol=0.05. The default tolerance is 1e-8."
Hard to beat that for simplicity. +1 on this wording or something similar instead of the current abstract version.

On Thu, Jan 22, 2015 at 04:40:14PM -0800, Chris Barker wrote:
I do not agree that it is ready for review. I think you have rushed to decide that this needs a PEP, rushed the preparation of the PEP, and now you have rushed the request for review. What's the hurry? As it stands with the decisions you have made, I cannot support this PEP even though I support the basic idea. -- Steve

On 01/23/2015 12:06 AM, Steven D'Aprano wrote:
On Thu, Jan 22, 2015 at 04:40:14PM -0800, Chris Barker wrote:
Why? If it has problems, how will he find out about them unless people read it and offer critiques? Or do you not refer to that process as reviewing?
I think you have rushed to decide that this needs a PEP,
He asked if a PEP was needed, and one is. Worst-case scenario we have something to point the next floating-point closeness requester to.
rushed the preparation of the PEP,
With over 100 messages to pull from, how was the preparation rushed? He should have taken a month to write it?
and now you have rushed the request for review.
Um, what? He should have just sat on it for a couple weeks before asking people to look it over? Asking for a review is not the same as asking for a pronouncement; it's not even on python-dev yet.
What's the hurry?
For one, Python 3.5 alpha one is just around the corner, and while there's still time after that the more eyeballs the better; for another, why wait? He has the information he needed, he collected it, made some decisions, and brought it back to the community. Ten days from the first floating point closeness message (14 if you count the float range class thread). A PEP also helps focus the conversation.
As it stands with the decisions you have made, I cannot support this PEP even though I support the basic idea.
Perhaps you feel rushed because you don't like it? -- ~Ethan~

On Fri, Jan 23, 2015 at 12:59:21AM -0800, Ethan Furman wrote:
Ethan, there are factors that you are unaware of because they took place off-list. Since they are private, I will say no more about them except to say that Chris has proceeded as if there is consensus when there actually is not. -- Steven

On Fri, Jan 23, 2015 at 1:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Steven, this appeal to things unmentionable is not an acceptable way to oppose a PEP. In the text you quoted I didn't see Chris claim consensus -- just that he has written up his version. It's ready for review because he wants feedback -- "ready for review" is *not* code for "this is the final word from the community, now the BDFL must speak." Your posts make me worried that we have turned into a political body rather than a group of technical enthusiasts trying to improve the language they all love. I don't think you can reasonably disagree that a PEP is needed -- not with so much discussion and apparently still no agreement. If you oppose the specific proposal, say what you think is wrong with it. If you think it needs more input from other experts, name those experts. If you think it needs more input from a community, name that community. I haven't actually read the PEP, so I don't have an opinion about it (my post last night was just an attempt to reword something quoted in the email thread). I just saw Antoine's response, and at least he talks about the proposal, not the politics around it. But he's awfully vague. We need a concrete counterproposal. Possibly a competing PEP. Anything but references to things that happened off-stage. If you have a personal beef with Chris, this is not the place. -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 23, 2015 at 1:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:
No need for mystery here -- I asked off-list for feedback from Steven and a couple others, then posted the PEP without having given them much time to respond. However, I posted the PEP because I wanted review, and we had had enough circular conversations that I thought it was time for a concrete proposal to bash on. I by no means intended to convey the impression that there was consensus reached among anyone in particular. The goal of posting the PEP was to determine if that was so, and if not, to change it to a point where that could happen. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Jan 23, 2015 at 3:36 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Mmmm... That seemed kind of dogmatic... This thread has been going on for a long time. I prefer the PEP because it is a concrete proposal. Even if it is rejected, the reasons for the rejection will be documented, so people can be referred to the document instead of spinning this wheel again. Cheers, -- Juancarlo *Añez*

On 23 January 2015 at 00:40, Chris Barker <chris.barker@noaa.gov> wrote:
I'm not sure I follow the specifics but this is saying that everything will be close to zero. Isn't that the wrong way round? I thought the comments in the discussion on the list were saying that the problem with relative tolerance is that *nothing* is close to zero? Paul

On Thu, 22 Jan 2015 16:40:14 -0800 Chris Barker <chris.barker@noaa.gov> wrote:
I don't think the proposal fits the bill. For testing you want a function that is both 1) quite rigorous (i.e. checks equality within a defined number of ulps) 2) handles all special cases in a useful way (i.e. zeros, including distinguishing between positive and negative zeros, infinities, NaNs etc.). As someone who wrote such a function for Numba, what you're proposing would not be a suitable replacement. Regards Antoine.

On Fri, Jan 23, 2015 at 7:36 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It depends on what you are testing -- I tried to be explicit that this was not intended for testing the accuracy of numerical algorithms, for instance. Rather, its best use case is testing to see whether you have introduced a big ol' bug that completely changed your result -- are you in the ballpark? Something similar is in Boost, in numpy, and in any number of other places. It is clearly useful. That doesn't mean it has to go in the stdlib, but it is useful in many cases. As for the ulps test -- can you suggest a way to do that, while also providing a simple definition of tolerance that casual users can understand and use (and a reasonable default)? I know I can't. Note that some of the feedback on the PEP as-is is that it's too hard to understand already! (without better docs, anyway)
zero, inf, -inf, NaN are all handled, I think correctly. And if -0.0 is not close to 0.0, I don't know what is ;-) (there is a test to make sure that's true, actually) If you want to make the distinction between -0.0 and 0.0, then you don't want a "close" or "approximate" test.
As someone who wrote such a function for Numba, what you're proposing would not be a suitable replacement.
I never expected it would be a replacement for what is needed for a project like numba. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Jan 23, 2015 at 8:51 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Maybe the confusion here is around the use of "test". To some, that means "unit test" or some other way of testing software. But I hope that's not the main use case. Let's look at Newton's algorithm for computing a square root. It's something like

  def sqrt(x):
      new_guess = 1
      repeat:
          guess = new_guess
          new_guess = avg(guess, x/guess)  # Not sure if I've got this right
      until guess is close enough to new_guess
      return guess

This seems a place where a decent "is close enough" definition would help. (Even though this particular algorithm usually converges so rapidly that you can get a result that's correct to within an ulp or so -- other approximations might not.)
Isn't an ulp just a base-2 way of specifying precision scaled so that 1 ulp is the low bit of the mantissa in IEEE fp?
-- --Guido van Rossum (python.org/~guido)

On Fri, Jan 23, 2015 at 5:41 PM, Guido van Rossum <guido@python.org> wrote:
Isn't an ulp just a base-2 way of specifying precision scaled so that 1 ulp is the low bit of the mantissa in IEEE fp?
Basically yes, but there are weird subtleties. E.g. 1 ulp remains the same absolute size between 1.0 and 2.0, so the same ulp threshold can vary by a factor of two in relative precision terms. And where you hit the boundary between exponents funny things happen: 2.0 +/- 1 ulp is [2.0 - 2.2e-16, 2.0 + 4.4e-16]. This can matter if you're looking for high precision -- if the value is supposed to be almost 2.0, then you don't want to get penalized for failing to get 2.0 + 2.2e-16, b/c there is no such number, but it might also be unacceptable to get 2 - 4.4e-16, which would be two values off. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
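(To make those boundary numbers concrete: math.ulp() and math.nextafter() did not exist at the time of this thread -- they arrived in Python 3.9 -- but they show exactly the effect Nathaniel describes:)

  import math  # requires Python 3.9+ for math.ulp() and math.nextafter()

  print(math.ulp(1.0))                          # 2.220446049250313e-16
  print(math.ulp(2.0))                          # 4.440892098500626e-16 -- twice as big

  # At the exponent boundary the spacing around 2.0 is asymmetric:
  print(2.0 - math.nextafter(2.0, 0.0))         # ~2.2e-16: one step down
  print(math.nextafter(2.0, math.inf) - 2.0)    # ~4.4e-16: one step up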

On Fri, 23 Jan 2015 08:51:00 -0800 Chris Barker <chris.barker@noaa.gov> wrote:
My approach was roughly:

  delta = 2 ** (ulps - 53 - 1) * (abs(first) + abs(second))
  assertAlmostEqual(first, second, delta=delta)

I don't know if it's right in the case of denormals etc. (there's also special code surrounding that to care for zeros, infinities, and NaNs)

Regards

Antoine.

On 01/23/2015 07:36 AM, Antoine Pitrou wrote:
I disagree -- this function is not meant for mathematicians, but for the non-maths person who needs something that works. Will there be situations where it doesn't work? Certainly. Will they be surprising? Possibly. On the other hand, I was very surprised the first time a bytes object gave me an integer and not a byte.
As someone who wrote such a function for Numba, what you're proposing would not be a suitable replacement.
This isn't for Numba, SciPy, or NumPy. It's to help those who don't use/need those products, but still have some light floating point work to do. -- ~Ethan~

On Fri, 23 Jan 2015 09:12:26 -0800 Ethan Furman <ethan@stoneleaf.us> wrote:
In which use case would a "non-maths person" (what exactly does that mean?) need "something that works"? I haven't seen any serious analysis of use cases. Guido talks about the Newton algorithm but I can't understand why a "non-maths person" would want to write one implementation of that - apart from recreation or educational purposes, that is. Regards Antoine.

On Fri, Jan 23, 2015 at 2:42 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'll give you a real life example. I never thought of myself as a "maths person," so I guess that makes me a "non-maths person." I am a software engineer. I leave the math to people with PhDs in mathematics, statistics, and engineering. In my day job at a trading firm, I work on automated trading systems. Most models do all their internal calculations using floating point math. At some point though, the desired order prices calculated by the model's indicators need to be converted to actual prices acceptable to the exchange. Floating point numbers being what they are, a computed value will almost never correspond to a valid order price. If your computed price is very close to, but not exactly on, a tick boundary and you're not careful, you might erroneously price your order too aggressively or too passively. In these situations you need to recognize when the floating point value you have is within some small tolerance of a price on an exact tick boundary. Furthermore, these comparisons need to take into account the different tick sizes of different contracts. The CME's Yen/USD futures contract (6Y) has a tick size (minimum change between two valid prices) of $.000001 while their Euro/USD futures contract (6E) has a tick size of $.0001. In my world, this is done in Python, though the problem arises independent of the language used. It also has nothing to do with the relative sophistication of the math used internal to the model. It is more-or-less just a case of format conversion on output. Skip
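(A rough sketch of the kind of check Skip describes -- the snap_to_tick helper and the accumulated-error example are made up for illustration; the comparison is the same asymmetric relative test the PEP proposes, scaled by the known-good tick price:)

  TICK = 0.0001   # e.g. the 6E tick size mentioned above

  def snap_to_tick(price, tick=TICK, rel_tol=1e-9):
      """Return the nearest valid tick price if the computed price is
      'close enough' to a tick boundary, otherwise None."""
      nearest = round(price / tick) * tick
      # asymmetric relative check, scaled by the known value (the tick price)
      if abs(price - nearest) <= rel_tol * abs(nearest):
          return nearest
      return None

  # A price that "should" be exactly 1.0834 but carries accumulated fp error:
  price = 0.0
  for _ in range(10834):
      price += 0.0001

  print(price == 1.0834)        # almost certainly False
  print(snap_to_tick(price))    # snaps onto the tick grid (~1.0834)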

On Fri, 23 Jan 2015 15:15:39 -0600 Skip Montanaro <skip.montanaro@gmail.com> wrote:
If you have such a precise requirement (the given tick size), you have to roll your own function, there's no point in a stdlib function, right? Regards Antoine.

On Fri, Jan 23, 2015 at 3:23 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
If you have such a precise requirement (the given tick size), you have to roll your own function, there's no point in a stdlib function, right?
No, I think Chris's is_close_to would probably do the trick, as the relative tolerance would be some fractional multiple of the tick size. In any case, whether or not I would choose to use this function is beside the point. (It's actually a real, though solved problem in my environment, so modifying code to use it wouldn't be worth the effort or potential sources of bugs at this point.) I was only pointing out that there are valid reasons where such a function might be useful to "non-math people," outside the realm of software testing. Knowing when you need something like this is often only discovered after mistakes are made though. Is a numerical analysis course still commonly taught in Computer Science departments? Skip

On 24 January 2015 at 03:12, Ethan Furman <ethan@stoneleaf.us> wrote:
Note that the key requirement here should be "provide a binary float comparison function that is significantly less wrong than the current 'a == b'". "a == b" is the competition here, not the more correct versions available in other libraries.

As far as semantics go, I would expect the new function to be a near drop-in replacement for https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmo... in a testing context.

The reason I view the proposal in the PEP as problematic is because it is approaching the problem *like a scientist*, rather than as someone who last studied math in high school. The unittest module definition relies on a very simple set of assumptions:

1. The user understands how arithmetic subtraction works
2. The user understands how decimal rounding works
3. The user understands how absolute deltas work

This is a "good enough" answer that handles a wide variety of real world use cases, and is very easy to understand. Most importantly, it provides a hint that when working with floating point numbers, "==" is likely to cause you grief.

This simple definition *isn't* really good enough for statistical or scientific use cases, but in those cases you should be using a statistical or scientific computation library with a more sophisticated definition of near equality.

Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 23 January 2015 at 00:40, Chris Barker <chris.barker@noaa.gov> wrote:
This section is very weak. As someone who doesn't do numerically intensive computing, I would start with the assumption that people who do would have the appropriate tools in packages like numpy, and they would have the knowledge and understanding to use them properly. So my expectation is that this function is intended specifically for non-specialists like me.

Based on that, I can't imagine when I'd use this function. You mention testing, but unittest has a function to do this already. Sure, it's tied tightly to unittest, so it's not useful for something like py.test, but that's because unittest is the stdlib testing framework. If you wanted to make that check more widely available, why not simply make it into a full-fledged function rather than an assertion? And if it's not suitable for that purpose, why does this PEP not propose updating the unittest assertion to use the new function? It can't be right to have 2 *different* "nearly equal" functions in the stdlib.

Outside of testing, there seems to be no obvious use for the new function. You mention measured values, but what does that mean? "Measure the length of the line and type in the result, and I'll confirm if it matches the value calculated"? That seems a bit silly. I'd like to see a couple of substantial, properly explained examples that aren't testing and aren't specialist.

My worry is that what this function will *actually* be used for is to allow naive users to gloss over their lack of understanding of floating point:

  n = 0.0
  while not is_close_to(n, 1.0):  # Because I don't understand floating point
      do_something_with(n)
      n += 0.1

BTW, when writing that I had to keep scrolling up to see which order actual and expected went in. I'd imagine plenty of naive users will assume "it's symmetrical so it shouldn't matter" and get the order wrong.

In summary - it looks too much like an attractive nuisance to me, and I don't see enough value in it to counteract that.

Paul

On Fri, Jan 23, 2015 at 8:05 AM, Paul Moore <p.f.moore@gmail.com> wrote:
I'll see what I can do to strengthen it.
Indeed that is the idea (though there are plenty of specialists using numpy as well ;-) ) Based on that, I can't imagine when I'd use this function. You mention
That would be an option, but I don't think the one in unittest is the right test anyway -- its focus on the number of decimal digits after the decimal place is not generally useful. (that would make some sense for the Decimal type...) And if
it's not suitable for that purpose, why does this PEP not propose updating the unittest assertion to use the new function?
well, for backward compatibility reasons, I had just assumed it was off the table -- or a long, painful road anyway. And unittest is very vested in its OO structure -- would we want to add free-form functions to it?
It can't be right to have 2 *different* "nearly equal" functions in the stdlib.
Well, they do have a different functionality -- maybe some people really do want the decimal digits thing. I'm not sure we'd want one function with a whole bunch of different ways to call it -- maybe we would, but having different functions seems fine to me.
This came up in examples in the discussion thread -- I don't think I would use it that way myself, so I'm going to leave it to others to suggest better examples or wording. Otherwise, I'll probably take it out. I'd like to see a couple of substantial, properly explained examples
that aren't testing and aren't specialist.
In practice, I think testing is the biggest use case, but not necessarily formal unit testing. That's certainly how I would use it (and the use case that prompted me to start this whole thread to begin with...). I'll look in my code to see if I use it in other ways, and I'm open to any other examples anyone might have. But maybe it should be with testing code in that case -- though I don't see any free-form testing utility functions in there now. Maybe it should go in unittest.util? I'd rather not, but it's just a different import line.
Is that necessarily worse? It would at least terminate ;-) Floating point is a bit of an attractive nuisance anyway.
Well, I think the biggest real issue about this (other than should it be in the stdlib at all) is the question of a symmetric vs. asymmetric test. I decided to go (for this draft, anyway) with the asymmetric test, as it is better defined and easier to reason about, and more appropriate for some cases. And the biggest argument for a symmetric test is that it is what people would expect. So I tried to choose parameter names that would make it clear (rather than a,b or x,y) -- I think I failed on that, however -- anyone have a better suggestion for names? It turns out "actual" is far too similar in meaning to "expected". In summary - it looks too much like an attractive nuisance to me, If it's not there, then folks will cobble something up themselves (and I'm sure do, all the time). If they know what they are doing, and take care, then great, but if not then they may get something with worse behavior than this. Maybe they will at least understand it better, but I suspect the pitfalls will all still be there in a typical case. And in any case, they have to take the time to write it. That's my logic anyway. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Jan 23, 2015 at 9:21 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Indeed that is the idea (though there are plenty of specialists using numpy as well ;-) )
uhm, non-specialists, that is. In fact, the one in numpy is more susceptible to misuse. On the other hand, it's there, and it's useful, and works most of the time. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Sat, Jan 24, 2015 at 4:21 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Updating the assertion to use the new function would be a matter of tweaking the implementation of unittest's assertAlmostEqual() to now call this function and assert that it returns True. The OO structure of unittest wouldn't be affected; just the exact definition of one particular assertion. I'd say that's a point worth mentioning in the PEP. Conceptually, this is going to do the same thing; yes, it's a change of definition, but obviously this won't be done in a point release anyway. It would make reasonable sense to sync them up. Alternatively, if you choose not to have that as part of the proposal, it would be worth adding a word or two of docs to unittest stating that assertAlmostEqual is not the same as is_close_to (and/or add "assertCloseTo" which would use it), as the existing implementation is all about absolute difference. ChrisA

On Fri, Jan 23, 2015 at 5:45 PM, Chris Angelico <rosuav@gmail.com> wrote:
Yeah, having just taken a quick look at the source, I'd go so far as to say assertAlmostEqual is almost totally broken. I had to read the docs three times to work out that while it sorta sounds like it provides relative tolerances, it actually doesn't at all -- places=3 means something like abs_tol=10**-3. Not really appropriate for numerical work. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
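(A minimal illustration of both failure modes: running this, test_large fails even though the relative error is only 1e-11, while test_small passes even though the values differ by a factor of 100:)

  import unittest

  class AlmostEqualDemo(unittest.TestCase):
      def test_large(self):
          # places=7 means round(a - b, 7) == 0, i.e. roughly an absolute
          # tolerance of 5e-8. The difference here is 0.01, so this fails.
          self.assertAlmostEqual(1e9, 1e9 + 0.01)

      def test_small(self):
          # These differ by a factor of 100, but the absolute difference
          # is far below 1e-7, so this passes.
          self.assertAlmostEqual(1e-10, 1e-12)

  if __name__ == "__main__":
      unittest.main()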

On Fri, Jan 23, 2015 at 9:45 AM, Chris Angelico <rosuav@gmail.com> wrote:
sure -- that's not quite what I meant. I was really addressing the "where would this sit" question. unittest does not currently have any stand-alone utility functions for testing in it. If we put this there, would anyone think to look for it there?
I'd say that's a point worth mentioning in the PEP.
well, whether to change a TestCase assertion or add a new one is a brand new question -- we could add that to this PEP if people think that's a good idea. For my part, I find unittest painful, and use py.test (and sometimes nose) anyway....
probably a good idea, yes. I really don't think we want to change assertAlmostEqual -- certainly not anytime soon. It seems like gratuitous backward incompatibility. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 23 January 2015 at 17:21, Chris Barker <chris.barker@noaa.gov> wrote:
Than understanding what you're doing? Yes. But it's sort of my point that fp is prone to people misunderstanding, and it's a shame to give people more opportunities.
Your parameter names and documentation are fine - it's very obvious how to use the function when you look. It's just that you *need* to look because an asymmetric check isn't immediately intuitive. I say "immediately" because when I think about it yes, the question "is a close enough to b?" is actually asymmetric.
Yeah, you have a point. And TBH, I can ignore this function just as easily as I currently ignore cmath.sin, so it's no big deal. Guido's example of Newton iteration is a good use case (although most of the time I'd expect to use a prebuilt function from a module, rather than build it myself with Newton iteration, but maybe that just reflects the fact that I don't do numerical programming). Paul

On Fri, Jan 23, 2015 at 9:59 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Well duh. Any algorithm that isn't already in the math module would require way too much code. The point of the example is that most people have probably seen that algorithm before, and it's only one simple step, really, so they won't be distracted by trying to understand the algorithm when the point of the example is to show how you would use is_close_to(). (And it's one of the simplest algorithms that gives an *approximation*, not an exact answer, at least not in the mathematical sense, which is also important in this case -- if the algorithm was exact there would be no need to use is_close_to().) -- --Guido van Rossum (python.org/~guido)

On 23 January 2015 at 18:10, Guido van Rossum <guido@python.org> wrote:
Sorry. What I was trying to say is that if I had a need for say a Bessel function, or numerical integration, or a zero of a function, I'd go hunting for a package that implemented it (something like mpmath, maybe) rather than rolling my own numerical algorithm using is_close_to(). But I do agree, that implementing numerical algorithms is a good use of is_close_to. And your example was fine, it'd make a good addition to use cases in the PEP. (But I wonder - wouldn't it work better with a "symmetrical" close-to function? That's probably a question for Chris.) Paul

Well, you usually use Newton's algorithm to find the zero of a function, so in that case, you'd want an absolute comparison. But it's pretty common to do a simple iterative solution where you check convergence by seeing if the new solution is close to the previous solution, in which case a symmetric test would probably be better, but the asymmetric one would be fine -- you'd be asking the question: is the new solution close to the previous one? -Chris
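(A runnable version of that kind of convergence loop, using a stand-in for the proposed function -- the real implementation lives in the sample code on GitHub; this sketch assumes x > 0 and skips NaN/inf handling:)

  def is_close_to(actual, expected, tol=1e-8, abs_tol=0.0):
      # stand-in for the proposed function
      return abs(expected - actual) <= max(abs(tol * expected), abs_tol)

  def my_sqrt(x):
      # simple Newton / fixed-point iteration; assumes x > 0
      guess, new_guess = 0.0, 1.0
      while not is_close_to(new_guess, guess, tol=1e-12):
          guess = new_guess
          new_guess = (guess + x / guess) / 2   # average of guess and x/guess
      return new_guess

  print(my_sqrt(2.0))   # ~1.4142135623730951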

Guido van Rossum writes:
the point of the [Newton's method] example is to show how you would use is_close_to().
Except that this clearly is a Cauchy test; the algorithm doesn't know the limit. In principle, the appropriate computation would be symmetric. I don't think this is a problem in practice[1], but Skip's "straddling the tick" example is much stronger for an asymmetric comparison function. On the other hand, Skip's case requires an absolute comparison, not a relative one.

The whole discussion has been really fast and loose about use cases. People with strong preferences can't seem to wrap their heads around others' use cases, examples poorly matched to the proposals are common, and the expertise of the numerical experts seems irrelevant because we *don't* want accuracy even in corner cases, we just want to make it easier for naive users to avoid writing "x == y". ISTM that this PEP can be reduced to:

  We need a floating comparison function that's good enough for government work, to help naive users avoid writing "x == y" for floating point comparisons. There are use cases where one of the values is a known accurate value, so the comparison function is asymmetric. This generally won't get things "too wrong" for symmetric comparisons, except where a relative comparison involves true values near zero. Unfortunately, not much can be done in that case because it requires enough domain knowledge to realize that true values near zero occur and that this is a problem, so use of this function is covered by "consenting adults".[2]

And oh yeah, IANAEINA.[3] But for this PEP, I don't need to be. <wink/>

Footnotes:

[1] I've reconsidered. A priori, I still like symmetric errors better in general, but the target audience for this function isn't going to be reasoning about equivalence classes of IEEE 754 floats.

[2] As is all use of floating point.

[3] I am not an expert in numerical analysis. Which IIUC applies to the PEP author as well as to this poster.

On Fri, Jan 23, 2015 at 12:40 AM, Chris Barker <chris.barker@noaa.gov> wrote:
I might phrase this a bit more strongly -- assertAlmostEqual is confusing and broken-by-default for common cases like comparing two small values, or comparing two large values.
So for reference, it looks like the differences from numpy are:

1) kwarg names: "tol" and "abs_tol" versus "atol", "rtol". Numpy's names seem fine to me, but if you want the longer ones then probably "rel_tol", "abs_tol" would be better?

2) use of max() instead of + to combine the relative and absolute tolerance. I understand that you find the + conceptually offensive, but I'm not really sure why -- max() is maybe a bit better, but it seems like much of a muchness to me in practice. (Sure, like you say further down, the total error using + might end up being higher by a factor of two or so -- but either people are specifying the tolerances they want, in which case they can say what they mean either way, or else they're just accepting the defaults, in which case they don't care.) It might be worth switching to + just for compatibility.

3) The default tolerances. Numpy is inconsistent with itself on this point though (allclose vs. assert_allclose), so I wouldn't worry about it too much :-). However, a lot of the benefit of numpy.allclose is that it will do something mostly-reasonable out-of-the-box even if the users haven't thought things through at all. 99% of the benefit of having something like this available is that it makes it easy to write tests, and 99% of the benefit of a test is that it exists and makes sure that your values are not wildly incorrect. So that's nice. BUT if you want that kind of out-of-the-box utility then you need to have some kind of sensible default for comparisons to zero. (I just did a quick look at python code uses of assertAlmostEqual on github, and in my unscientific survey of reading the first page of results, 30.4% of the calls were comparisons against zero. IMO asking all these people to specify tolerances by hand on every call is not very nice.)

One option would be to add a zero_tol argument, which is an absolute tolerance that is only applied if expected == 0. [And a nice possible side-effect of this is that numpy could conceivably then add such an argument as well "for compatibility with the stdlib", and possibly use this as a lever to fix its weird allclose/assert_allclose discrepancy. The main blocker to making them consistent is that there is lots of code in the wild that assumes allclose handles comparisons-to-zero right, and also lots of code that assumes that assert_allclose is strict with very-small non-zero numbers, and with only rtol and atol you can't get both of these behaviours simultaneously.]
I'd strongly consider expanding the scope of this PEP a bit so that it's proposing both a relative/absolute-error-based function *and* a ULP-difference function. There was a plausible-looking one using struct posted in the other thread, it would cover a wider variety of cases, and having both functions next to each other in the docs would provide a good opportunity to explain why the differences and which might be preferred in which situation. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On Sat, Jan 24, 2015 at 8:30 AM, Nathaniel Smith <njs@pobox.com> wrote:
Longer names preferable. It was quite a long way into the original thread before I understood what "atol" meant - my brain kept wanting it to be related to the atoi family of functions from C (ASCII to Integer (atoi), ASCII to Long (atol), etc, converting strings to integers). ChrisA

Longer names preferable.
I had a suggestion on github for the same thing -- how about: rel_tolerance and abs_tolerance ?
Not all of us are as contaminated by C ;-) in fact, when I see the C functions I first think of tolerances... Long clear names are good. -Chris

On Fri, Jan 23, 2015 at 4:30 PM, Nathaniel Smith <njs@pobox.com> wrote:
Many style guides recommend against using _ to separate abbreviated words in variable names, so either relative_/absolute_tolerance or reltol/abstol. OTOH, I don't see any problem with numpy's atol/rtol.

On 01/23/2015 01:30 PM, Nathaniel Smith wrote:
On Fri, Jan 23, 2015 at 12:40 AM, Chris Barker wrote:
Longer names are good for us non-maths folks. ;) rel_tol and abs_tol look good to me.
That makes no sense to me. I'm not sure taking the max does either, though, as phrases like "you can be off by 5% or 30 units, whichever is [smaller | greater]" come to mind.
One option would be to add a zero_tol argument, which is an absolute tolerance that is only applied if expected == 0.
Seems reasonable.
Also seems reasonable. So, in the interest of keeping things insane ;) how about this signature?

  def close_to(noi, target, min_tol, max_tol, rel_tol, zero_tol):
      """
      returns True if noi is within tolerance of target

      noi:      Number Of Interest - result of calculations
      target:   the number we are trying to get
      min_tol:  used with rel_tol to determine actual tolerance
      max_tol:  used with rel_tol to determine actual tolerance
      zero_tol: an absolute tolerance if target == 0
                (otherwise rel_tol is used as zero_tol)
      """

-- ~Ethan~

On Friday, January 23, 2015 1:31 PM, Nathaniel Smith <njs@pobox.com> wrote:
If you're thinking about the post I think you are (mine), I wouldn't suggest using that.

The main issue is that was a bits-difference function, not an ulps-difference function -- in other words, bits_difference(x, y) is the number of times you have to do y = nexttoward(y, x) to get y == x. In the case where the difference is <= 2, or where x and y are finite numbers with the same sign and exponent, they happen to be the same, but otherwise they don't. For example, consider x as the 5th largest number with one exponent, and y as the 5th smallest number with the next. They're 10 bits away, but 12.5 ulp(x) away and 7.5 ulp(y) away. Most algorithms that you want to test for ulp difference are specified to be within 0.5, 1, or 2 ulp, and for C lib functions it's always 0.5 or 1 (except pow in certain cases), or to no more than double the ulp difference -- but definitely not _all_, so it would be misleading to offer a bits-difference function as an ulps-difference function.

Secondarily, even as a bit-difference function, what I posted isn't complete (but I think the version in https://github.com/abarnert/floatextras is), and makes various decisions and assumptions that aren't necessarily the only option. Also, there's nothing else in the stdlib that directly accesses the bits of a float in Python, which seems a little weird.

Finally, neither Python nor the C89 standard that CPython implies requires that float actually be an IEEE 754-1985 double (much less an IEEE 754-2008 binary64, the later standard I actually have a copy of...). In particular, sys.float_info doesn't assume it.

I think if we wanted this, we'd want to implement nexttoward in C (by calling the C99/POSIX2001 function if present, and maybe our own bit-twiddling-IEEE-doubles-in-C implementation for Windows, but it's not there otherwise), then define ulp (in C or Python) in terms of nexttoward, then define ulp_difference(x, y) (ditto) in terms of ulp(y). This does require a bit of care to make sure that, e.g., ulp_difference(float_info.max, inf) comes out as 1 or as an error, whichever one you want, and so on. (That means it also requires deciding what to do for each edge case, since they're not standardized by IEEE 754-1985, IEEE 754-2008, C99, or POSIX2001.) This would work correctly and consistently on almost every *nix platform (even some that don't use IEEE double) and on Windows, and wouldn't exist on platforms where it won't work correctly. Of course other implementations would have to come up with some other compatible implementation, but at least Java has an ulp function, and if .NET doesn't, it can probably make assumptions about the underlying platform.

If we also want a bits_difference function in the stdlib (and I'm not sure we do), I'd suggest also writing that in C, by pointer-casting from double to int64_t and using the information in C99 math.h/limits.h (and again maybe special-casing Windows), rather than twiddling IEEE bits in Python.
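(For reference, a rough Python sketch of the bits-difference idea under discussion -- it assumes the platform float is an IEEE 754 binary64, ignores NaNs, and is emphatically not the ulps-difference function Andrew describes:)

  import struct

  def float_to_ordinal(x):
      """Map a double to an integer so that consecutive representable
      floats map to consecutive integers (IEEE 754 binary64 assumed)."""
      n = struct.unpack("<Q", struct.pack("<d", x))[0]
      # Negative floats have the sign bit set and order "backwards";
      # remap them so that -0.0 and 0.0 both land on 0.
      return n if n < 0x8000000000000000 else 0x8000000000000000 - n

  def bits_difference(x, y):
      return abs(float_to_ordinal(x) - float_to_ordinal(y))

  print(bits_difference(1.0, 1.0 + 2**-52))   # 1 -- adjacent doubles
  print(bits_difference(0.0, -0.0))           # 0
  print(bits_difference(2.0 - 2**-52, 2.0))   # 1 -- across the exponent boundary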

The next response makes it clear why I think that's out of scope for this proposal -- it is considerably harder for casual users to wrap their brains around, so I think if such a thing exists, it should probably be a different function. Not that two functions can't be in the same PEP. But in any case, I'm not qualified to write it (certainly not the code, but not really the PEP either). If someone else wants to champion that part, I'm happy to work together however makes sense. -Chris

2) use of max() instead of + to combine the relative and absolute tolerance.
In fact, the code uses "or", but it amounts to the same thing -- if the difference is within either the relative or absolute tolerance, it's "close".
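(A tiny worked example of the difference, with made-up numbers where the two tolerances are of similar size -- the "factor of two" effect discussed here:)

  expected = 1.0
  actual   = 1.0 + 1.5e-8      # off by ~1.5e-8
  rel_tol = abs_tol = 1e-8     # relative and absolute tolerances of similar size

  diff = abs(actual - expected)
  print(diff <= abs_tol + rel_tol * abs(expected))       # True:  numpy-style budget ~2e-8
  print(diff <= max(rel_tol * abs(expected), abs_tol))   # False: max/"or" budget is 1e-8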
Actually I agree with you here -- I think I've said elsewhere that I expect in practice people will set their tolerance to an order of magnitude, so even a factor of two doesn't much matter. But I see no advantage to doing it that way (except perhaps as a vectorized computation, which this is not)
It might be worth switching to + just for compatibility.
Well, the other difference is that numpy's version sets a default non-zero absolute tolerance. I think this is fatally wrong. Way too easy to get something really wrong for small values. Once we've done something incompatible, why not make it cleaner? And I see little reason for compatibility for its own sake.
I spent some time thinking about this, and my first version did have a default abs_tol to cover the near-zero case. But it would be absolutely the wrong thing for comparing small values. If you can think of defaults and an algorithm that would work well for large and small values and also comparison to zero, I'd be all for it.
Hmm -- my thinking is that at least those tests would immediately not work, but agreed, nicer for defaults to work for common cases.
One option would be to add a zero_tol argument, which is an absolute tolerance that is only applied if expected == 0.
Here is where I'm not sure: is there only an issue with comparing to exactly zero? Or can very small numbers underflow and cause the same problem?
I'm not sure it's much of an incentive for the stdlib, but sure, that would be nice.
I responded to this elsewhere. Thanks for your input. -Chris

On Jan 23, 2015, at 17:41, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
Of course they can underflow. But I don't think that's a practical problem except in very rare cases. It means you're explicitly asking for better than +/- 2**min_exp, so it shouldn't be surprising that nothing but an exact match qualifies. Take a concrete example: with a tol of 1e-5, it's only going to underflow if expected is around 1e-320 or below. But since the next smaller and larger numbers (9.95e-321 and 1.0005e-320) aren't within 1e-5, the test gives the right answer despite underflowing. I'd have to think about it a bit to make sure there's no pathological case that doesn't work out that way--but really, if you're checking subnormal numbers for closeness with a general-purpose function, or checking for relative closeness pushing the bounds of 1 ulp without thinking about what that means, I suspect you're already doing something wrong at a much higher level. So, just special-casing 0 should be sufficient. Maybe the answer there is to have an is_close_to_0 function, instead of a parameter that's only useful if expected is 0? But then you might have, say, a comprehension where some of the expected values are 0 and some aren't, so maybe not...

On Sat, Jan 24, 2015 at 1:35 PM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
Maybe the answer there is to have an is_close_to_0 function, instead of a parameter that's only useful if expected is 0? But then you might have, say, a comprehension where some of the expected values are 0 and some aren't, so maybe not...
That's a more philosophical question about API design. It's probably worth mentioning the two options in the PEP - separate function for "close to zero" with these args, or put it all into the one function with those args. ChrisA

On Jan 23, 2015, at 18:41, Chris Angelico <rosuav@gmail.com> wrote:
Sure. And obviously the PEP has to pick one of the options and make the case for it. I just wanted to make it clear that (I'm pretty sure) it really is only 0 that's special here, not "any subnormal" or "small numbers in some vague sense" or anything like that. The problem is that nothing is within even a ridiculously huge relative tolerance of 0 except 0; plenty of things (or at least as many values as you have reason to expect) are within a reasonable tolerance of a nonzero subnormal. (Also, this is just about zero_tol, not about the wider abs-and-rel issue, which I have nothing to add to.)

On Sat, Jan 24, 2015 at 2:32 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
Right. Pick one, and have a section of "Alternate proposals" or "Rejected sub-proposals" or something, in which the other is explained. If nothing else, they make entertaining reading :) ChrisA

On Fri, Jan 23, 2015 at 6:41 PM, Chris Angelico <rosuav@gmail.com> wrote:
I could mention it -- though I started all this thinking that we should keep relative and absolute tolerance separate, then realized that relative was going to be useless for zero, so added the absolute tolerance to cover that (which I originally named zero_tol, but realized that it really was absolute everywhere...). Now that we're thinking that we can have a switch for the exactly zero case, there may be no need to have an absolute tolerance parameter, but only relative and zero. Then you'd have a separate function (if you wanted) for absolute tolerance, so it could have a default -- that may be better than requiring the user to set the parameter to get an absolute tolerance test at all. I guess the key question is if someone would want both an relative tolerance and an absolute tolerance, aside from the zero issue. -Chris
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Friday, January 23, 2015 8:22 PM, Chris Barker <chris.barker@noaa.gov> wrote:
I guess the key question is whether someone would want both a relative tolerance and an absolute tolerance, aside from the zero issue.
Which already raises whether they'd want to min, max, average, or sum the two. And frankly I have no idea. That's exactly the question I didn't want to even try to answer, because smarter people than me have already given arguments about this and I don't have anything extra to contribute. But I guess you do have to answer it, since you're writing the PEP. :)

On 01/23/2015 11:02 PM, Andrew Barnert wrote:
I guess the key question is whether someone would want both a relative tolerance and an absolute tolerance, aside from the zero issue.
Which already raises whether they'd want to min, max, average, or sum the two. And frankly I have no idea.
Today I experimented with implementing is_close by using a parabola equation: y = a(x-h)**2 + k. Note: the "close" area is outside the curve of the parabola. The distance between the points u and v corresponds to the y value, and the x value corresponds to the relative distance from the vertex.

    def is_parabola_close(u, v, rtol, atol=0):
        if u == v:
            return True
        if u * v < 0:
            return False
        x = (u + v) * .5
        y = (1.0/x*rtol) * x**2 + atol
        return abs(u - v) <= y

This line:

    y = (1.0/x*rtol) * x**2 + atol

reduces to:

    y = rtol * x + atol

Which looks familiar. LOL  It turns out the relative distance from the vertex means the x distance corresponds to the focus, and the y distance matches the width, for all values of x and y. I thought this was interesting even though it didn't give the result I visualised. I'm going to add a "size" keyword to the function to make the vertex of the parabola independent from the distance of the two points. ;-) I'm not sure it helps the PEP much though. Cheers, Ron

On Fri, Jan 23, 2015 at 6:35 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
cool, this may be fine then. And makes a lot of sense. Maybe the answer there is to have an is_close_to_0 function, instead of a parameter that's only useful if expected is 0?
exactly -- maybe it's because I'm so used to numpy, but I expect that folks would want to call this in a comprehension or something where you've got a wide range of numbers, but want to use the same function and parameters, and don't want it to blow up at zero. I'll try adding this tomorrow, and see how it works out, with some tests, etc. -Chris

One option would be to add a zero_tol argument, which is an absolute
tolerance that is only applied if expected == 0.
OK -- now I know what the problem is here -- I thought I'd explored it already. If you have a tolerance that you use only when expected is zero (or when either is...) then you get the odd result that a small number will be "close" to zero, but NOT close to a smaller number. I implemented this on a branch in github: https://github.com/PythonCHB/close_pep/tree/zero_tol And you get the odd result:

    In [9]: is_close_to(1e-9, 0.0)
    Out[9]: True

fine -- the default zero_tol is 1e-8

    In [10]: is_close_to(1e-9, 1e-12)
    Out[10]: False

but huh??? 1e-9 is close to zero, but not close to 1e-12???? This is why I dropped the idea before. I'm back to the key point -- relative error compared to zero is not defined -- you need to set an absolute tolerance if you want to test against zero -- and there is no default that makes sense for all (or even most) cases. I'd much rather require people to have to think about what makes sense for their use case than get trapped by a default that's totally inappropriate. -Chris
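For reference, a minimal sketch of the zero_tol variant being discussed (the names and defaults here are assumed for illustration, not taken from the actual branch):

    def is_close_to(actual, expected, rel_tolerance=1e-8, zero_tol=1e-8):
        if expected == 0.0:
            # special-case an expected value of exactly zero
            return abs(actual) <= zero_tol
        return abs(actual - expected) <= rel_tolerance * abs(expected)

    print(is_close_to(1e-9, 0.0))     # True  -- within the default zero_tol
    print(is_close_to(1e-9, 1e-12))   # False -- relatively very far from 1e-12

which reproduces the surprise above: the same value is "close" to zero but not to a number even closer to zero.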

On Sat, Jan 24, 2015 at 11:59 AM, Chris Barker <chris.barker@noaa.gov> wrote:
I'd much rather require people to have to think about what makes sense for their use case than get trapped by a default that's totally inappropriate.
And by the time you have thought through your use case you're probably better off just writing abs(x-y) <= eps for some eps that you decide from your use case. The number of messages written to debate this one simple formula makes me think it's not so simple after all. So perhaps this is, once again, something that's better off as a recipe? Alternatively, maybe a useful approach to resolving this debate could be to look for places in actual code where people have solved this for their own use case (as in, written by someone else to solve a real problem, not made up as an example). Then if you see a particular pattern occur repeatedly, you might be able to use that evidence to suggest the right helper function that does the whole thing in one call, and by looking at variations you might get a good insight into the needed parameters and defaults. -- --Guido van Rossum (python.org/~guido)

On Sat, Jan 24, 2015 at 12:21 PM, Guido van Rossum <guido@python.org> wrote:
Exactly why I never thought a function that simply does an absolute closeness test was worth it. This is for the case where you want a relative test, but also want something sane for a comparison to zero. If there aren't any zeros in your data, you don't need to think about it. If there are, it will fail, and then you will have to think about it. The number of messages written to debate this one simple formula makes me
think it's not so simple after all.
well, we shouldn't underestimate the capacity for bike-shedding. Next I plan to go through the full set of messages since I posted the PEP, and tease out the real disagreements/decision points -- I guess I'll see how that pans out.
So perhaps this is, once again, something that's better off as a recipe?
I hope not, but if consensus can't be reached, perhaps so.
I've got a pile of uses of numpy's allclose (mostly tests) and it would be easy to search for those in other places. Not sure how to find other uses in other people's code though. And, of course, Steven put something similar in the statistics tests. So maybe the idea that this is primarily a new testing utility is the way to go.
I have no idea how to find those, but if anyone has suggestions for how to look, or examples in their own code, that would be great. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Sat, Jan 24, 2015 at 7:59 PM, Chris Barker <chris.barker@noaa.gov> wrote:
Yes that's.... the idea? :-) If someone says that their expected value is exactly zero, then using relative tolerance just makes no sense at all. If they wanted an exact test they'd have written ==. And this is reasonable, because even if you know that the exact answer is zero, then you can't expect to get that with floating point -- +/-1e-16 or so is often the best you can hope for. But if someone says their expected value is 1e-12, then... well, it's possible that they'd be happy to instead get 0. But likely not. 0 is extremely far from 1e-12 in relative terms, and can easily cause qualitatively different behaviour downstream (e.g. log10(1e-12) == -12, log10(0) == error). The example that came up in the numpy discussion of these defaults is that statsmodels has lots of tests to make sure that their computations of tail value probabilities are correct. These are often tiny (e.g., P(being 6 standard deviations off from the mean of a normal) = 9.9e-10), but emphatically different from zero. So it's definitely safer all around to stick to relative error by default for non-zero expected values. Admittedly I am leaning pretty heavily on the "testing" use case here, but that's because AFAICT that's the overwhelming use case for this kind of functionality. Guido's optimization example is fine, but using a function like this isn't really the most obvious way to do optimization termination (as evidenced by the fact that AFAICT none of scipy's optimizers actually use a rel_tol+abs_tol comparison on two values -- maybe they should?). And I don't understand Skip's example at all. (I get that you need to quantize the prices and you want to minimize error in doing this, but I don't understand why it matters whether you're within 1% of a breakpoint versus 40% of a breakpoint -- either way you're going to have to round.)
I'd much rather require people to have to think about what makes sense for their use case than get trapped by a default that's totally inappropriate.
But this seems a strange reason to advocate for a default that's totally inappropriate. is_close_to(x, 0.0) simply doesn't mean anything sensible in the current PEP -- even giving an error would be better. -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
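To make the statsmodels point above concrete, here is a small pure-Python illustration of how a non-zero absolute tolerance in the defaults (numpy.allclose uses atol=1e-8) can hide a real bug when the expected value is tiny but genuinely non-zero (the values and the combination rule are assumed for illustration):

    expected = 9.9e-10   # e.g. a 6-sigma tail probability: tiny, but not zero
    computed = 0.0       # a plausible buggy result
    rel_tol, abs_tol = 1e-5, 1e-8

    # one common way of combining the two tolerances
    close = abs(computed - expected) <= max(rel_tol * abs(expected), abs_tol)
    print(close)         # True -- the absolute term swallows the error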

On Sun, Jan 25, 2015 at 7:32 AM, Nathaniel Smith <njs@pobox.com> wrote:
sure -- that's why I (and numpy and Steven's statistics test function) put in an absolute tolerance as well. If you know you're testing near zero, then you set an abs_tolerance that defines what "near zero" or "small" means in this case. But if someone says their expected value is 1e-12, then... well, it's
possible that they'd be happy to instead get 0. But likely not. 0 is extremely far from 1e-12 in relative terms,
And 1e-12 from zero also, of course. Which is the trick here. Even with an asymmetric test, 0.0 is not relatively close to anything, and nothing is relatively close to zero (as long as the relative tolerance is less than 1 -- which it really should be). So I think we would have to use the zero_tolerance option if either input is zero, but then we get these discontinuities. So it seems, if a user wants to use the same parameters to test a bunch of numbers, and some of them may be zero, that they should define what "small" is to them by setting an abs_tolerance. Though I guess I'd rather have a zero_tol that defaulted to non-zero than an abs_tol that did. So we might be able to satisfy your observation that a lot of use cases call for testing against zero.
But would you even need to test for zero then in that case? And if so, wouldn't setting abs_tol to what you wanted for "very small" be the right thing to do? I note that Steven's testing code for the stdlib statistics library used a rel_tolerance and abs_tolerance approach as well. I haven't seen any example of special-casing zero anywhere.
I agree that it is as well -- sure, you could use it for a simple recursive solution to an implicit equation, but how many people whip those up, compared to either testing code or writing a custom comparison designed specifically for the case at hand.
Sure it does -- it means nothing is relatively close to zero -- haven't we all agreed that that's the mathematically correct result? And if you write a test against zero it will reliably fail the first time if you haven't set an abs_tolerance. So you will then be forced to decide what "near zero" means to you, and set an appropriate abs_tolerance. I think this points to having a separate function for absolute tolerance compared to zero -- but that's just abs(val) <= zero_tolerance, so why bother? Or do you think there are common use cases where you would want purely relative tolerance, down to very close to zero, but want a larger tolerance for zero itself, all in the same comprehension? -Chris

Hi All, I've gone through all the messages in this thread since I posted the draft PEP. I have updated the code and PEP (on gitHub) with changes that were no-brainers or seemed to have clear consensus. The PEP also needs some better motivational text -- I'll try to get to that soon. So I think we're left with only a few key questions:

1) Do we put anything in the stdlib at all? (the big one), which is closely tied to:

2) Do we explicitly call this a testing utility, which would get reflected in the PEP, and mean that we'd probably want to add a unittest.TestCase assert that uses it.

3) Do we use an asymmetric or symmetric test? Most people seemed to be fine with the asymmetric test, but Steven just proposed the symmetric again. I'll comment on that later.

4) What do we do about tolerance relative to zero? Do we define a specific zero_tolerance parameter? Or expect people to set abs_tolerance when they need to test against zero? And what should the default be?

Here is my take on those issues:

1) Yes, we put something in. It's quite clear that there is no one solution that works best for every case (without a lot of parameters to set, anyway), but I'm quite sure that what I've proposed, modified with any solutions to the issues above, would be useful in the majority of cases. Let's keep in mind Nick's comment: "Note that the key requirement here should be "provide a binary float comparison function that is significantly less wrong than the current 'a == b'""

2) I do agree that the most frequent use case would be for testing, but that doesn't always mean formal unit testing (it could be a quick check on the command line, or an iPython notebook, or....) and it certainly doesn't mean the unittest package. So while I think it's a fine idea to add an assertion to TestCase that uses this, I'd much rather see it as a stand-alone function (maybe in the math module). A static method of TestCase would be a compromise -- it's just some extra typing on an import line, but I think it would get lost to folks not using unittest. I note that Guido wrote: "To some, that means "unit test" or some other way of testing software. But I hope that's not the main use case." While Nick wrote: "I would personally find the PEP more persuasive if it was framed in terms of providing an improved definition of assertAlmostEqual that better handles the limitations of binary floating point dynamic ranges" So I'm not sure what to make of that.

3) I prefer the asymmetric test -- I've already given my reasons. But I'm pretty convinced that, particularly for use in testing, it really doesn't matter:
- relative tolerances tend to be small -- on the order of 1e-8 or so. The 10% example I used in the PEP was to keep the math easy -- but it's not a common use case (for tests, anyway)
- folks tend to specify relative tolerance to an order of magnitude: i.e. 1e-8, not 1.323463e-8 -- if the tolerance is only specified to that sort of precision, then any of the definitions under consideration are effectively the same.
So any of these are worth putting in the stdlib.

4) In my thinking and research, I decided that the (essentially optional) abs_tolerance parameter is the way to handle zero. But if Nathaniel or anyone else has use-cases in mind where that wouldn't work, we could add the zero_tol parameter to handle it instead. I'm not sure what the default should be -- if we think there is something special enough about order of magnitude 1, then something like 1e-12 would be good, but I'm not so sure.
But it would be better to set such a default for zero_tolerance than for abs_tolerance. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
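For concreteness, a condensed sketch of the comparison being proposed -- an asymmetric relative test with an optional absolute floor. The parameter names follow the discussion, and max() is only one possible way of combining the two tolerances (that choice is still open), so treat this as illustrative only:

    def is_close_to(actual, expected, rel_tolerance=1e-8, abs_tolerance=0.0):
        # tolerance is computed relative to the "expected" argument
        return abs(actual - expected) <= max(rel_tolerance * abs(expected),
                                             abs_tolerance)

    is_close_to(1.000000001, 1.0)                  # True with the default rel_tolerance
    is_close_to(1e-12, 0.0)                        # False until abs_tolerance is set
    is_close_to(1e-12, 0.0, abs_tolerance=1e-9)    # True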

On Mon, Jan 26, 2015 at 11:52 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Worth posting the link again I think: draft PEP is here: https://github.com/PythonCHB/close_pep/blob/master/pep-0485.txt Though I'm not seeing many changes, so maybe I'm looking at the wrong version. ChrisA

On Mon, Jan 26, 2015 at 12:02 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Yes, that's the idea -- defaulting rel_tol and zero_tol to non-zero values, and abs_tol to zero, gives you a set of defaults that will just work for people who want to write useful tests without having to constantly be distracted by floating point arcana. This does require that zero_tol is only applied for expected == 0.0, *not* for actual == 0.0, though. If you expected 1e-10 and got 0.0 then this *might* be okay in your particular situation but it really requires the user to think things through; a generic tool should definitely flag this by default.
Right, this example came up when it was discovered that np.allclose() has a non-zero abs_tol by default, and that np.testing.assert_allclose() has a zero abs_tol by default. It's a terrible and accidental API design, but it turns out that people really are intentionally using one or the other depending on whether they expect to be dealing with exact zeros or to be dealing with small-but-non-zero values. The whole motivation for zero_tol is to allow a single set of defaults that satisfies both groups.
Tests against zero won't necessarily fail -- sometimes rounding errors do cancel out, and you do get 0.0 instead of 1e-16 or whatever. At least for some sets of inputs, or until the code gets perturbed/refactored slightly, etc. That's why I said it might actually be better to unconditionally fail when expected==0.0 rather than knowingly perform a misleading operation. My claim wasn't that is_close_to(x, 0.0) provides a mathematically ill-defined result. I agree that that's a reasonable definition of "relatively close to" (though one could make an argument that zero is not relatively close to itself -- after all, abs(actual - expected)/expected is ill-defined). Instead, my point was that if the user is asking "is this close to 0?" instead of "is this exactly equal to zero?" then they probably are expecting that there exist some inputs for which those two questions give different answers. Saying "well TECHNICALLY this is a valid definition of 'close to'" is certainly true but somewhat unkind.
Or it could just be the same function :-). Who wants to keep track of two functions that conceptually do the same thing?
inf = float("inf") for (x, expected) in [ (inf, inf), (100, 1e100), (1, 10), (0, 1), (-1, 0.1), (-100, 1e-100), (-inf, 0), ]: assert is_close_to(10 ** x, expected) Though really what I'm arguing is that all in the same userbase people want relative tolerance down close to zero but a larger tolerance for zero itself. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On Mon, Jan 26, 2015 at 01:17:02AM +0000, Nathaniel Smith wrote:
I really don't think that setting one or two error tolerances is "floating point arcana". I don't think that having to explicitly decide on what counts as "close" (as either an absolute difference or a relative difference) is especially onerous: surely anyone writing code will be able to cope with one or two decisions: - close enough means they differ by no more than X - close enough means they differ by no more than X%, expressed as a fraction This isn't ULPs :-) I'm almost inclined to not set any defaults, except perhaps zero for both (in which case "close to" cleanly degrades down to "exactly equal" except slower) and force the user to explicitly choose a value. Arguments in favour of setting some defaults: - People who want a "zero-thought" solution will get one, even if it does the wrong thing for their specific application, but at least they didn't have to think about it. - The defaults might occasionally be appropriate. Arguments against: - There is no default error tolerance we can pick, whether relative or absolute, which will suit everyone all the time. Unless the defaults are appropriate (say) 50% of the time or more, they will just be an attractive nuisance (see zero-thought above). In the statistics tests, I had the opportunity to set my own global defaults, but I don't think I ever actually used them. Maybe I could have picked better defaults? I don't know. I did use defaults per test suite, so that's an argument in favour of having is_close (approx_equal) not use defaults, but assertIsClose (assertApproxEqual) use per-instance defaults. [Context: in the tests, I had an assertApproxEqual method that relied on approx_equal function. The function had defaults, but I never used them. The method defaulted to reading defaults from self.rel and self.tol, and I did use them.]
I really think that having three tolerances, one of which is nearly always ignored, is poor API design. The user usually knows when they are comparing against an expected value of zero and can set an absolute error tolerance. How about this?
- Absolute tolerance defaults to zero (which is equivalent to exact equality).
- Relative tolerance defaults to something (possibly zero) to be determined after sufficient bike-shedding.
- An argument for setting both values to zero by default is that it will make it easy to choose one of "absolute or relative". You just supply a value for the one that you want, and let the other take the default of zero.
- At the moment, I'm punting on the behaviour when both abs and rel tolerances are provided. That can be bike-shedded later.
Setting both defaults to zero means that the zero-thought version:

    if is_close(x, y): ...

will silently degrade to x == y, which is no worse than what people do now (except slower). We can raise a warning in that case. The only tricky situation might be if you *may* be comparing against zero, but don't know so in advance. There are some solutions to that:
- The suggested "zero_tol" parameter, which I dislike. I think it is an ugly and confusing API.
- Some versions of is_close may not require any special treatment for zero, depending on how it treats the situation where both abs and rel tolerances are given. Further discussion needed.
- Worst case, people write this:

    if (expected == 0 and is_close(actual, expected, tol=1e-8)
            or is_close(actual, expected, rel=1e-5)):

but I don't think that will come up in practice very often. In the test_statistics module, I had tests that looked like this:

    for x in [bunch of non-zero values]:
        y = do_some_calculation(x)
        self.assertApproxEqual(x, y, rel=0.01)
    y = do_some_calculation(0)
    self.assertApproxEqual(0, y, tol=0.000001)

which isn't hard to do, so I don't think this is a real issue in practice. I think the case of "my expected value might be zero, but I'm not sure in advance" is rare and unusual enough that we don't need to worry about it. [...]
Don't write it that way. Write it this way:

    abs(actual - expected) <= relative_tolerance * abs(expected)

Now if expected is zero, the condition is true if and only if actual == expected. It would be bizarre for is_close(a, a) to return False (or worse, raise an exception!) for any finite number. NaNs, of course, are allowed to be bizarre. Zero is not :-)
I agree, but I think this is a symptom of essential complexity in the problem domain. Ultimately, "is close" is ill-defined, and *somebody* has to make the decision what that will be, and that decision won't satisfy everyone always. We can reduce the complexity in one place:

* provide sensible default values that work for expected != 0

but only by increasing the complexity elsewhere:

* when expected == 0 the intuition that is_close is different from exact equality fails

We can get rid of that complexity, but only by adding it back somewhere else:

* is_close(x, y, zero_tol=0.1) and is_close(x, y, zero_tol=0.00001) give the same result for all the x, y I tested! That is, zero_tol is nearly always ignored.

Since people will often need to think about what they want "is close" to mean no matter what we do, I would prefer not to add the complexity of a third tolerance value. If that means that "zero thought" users end up inadvertently testing for exact equality without realising it, I think that's a price worth paying for a clean API. (As I said earlier, we can raise a warning in that case.)
Agreed. -- Steven
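As a tiny sketch of the "default both tolerances to zero" idea above (the short names tol and rel follow Steven's usage; the rule for combining them is deliberately left open in his message, so max() is just a placeholder, and measuring the relative error against the second argument is for brevity only -- his actual preference is a symmetric test):

    def is_close(x, y, tol=0.0, rel=0.0):
        # with both defaults left at zero this is just x == y, only slower
        return abs(x - y) <= max(tol, rel * abs(y))

    is_close(1.0, 1.0)                 # True: degenerates to exact equality
    is_close(100.0, 99.5, rel=0.01)    # relative-only check
    is_close(1e-9, 0.0, tol=1e-8)      # absolute-only check, for an expected value of zero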

On 26 January 2015 at 05:54, Steven D'Aprano <steve@pearwood.info> wrote:
I really don't think that setting one or two error tolerances is "floating point arcana".
The hundreds of messages on this topic would tend to imply otherwise :-( And to many users (including me, apparently - I expected the first one to give False), the following is "floating point arcana":
This does seem relatively straightforward, though. Although in the second case you glossed over the question of X% of *what* which is the root of the "comparison to zero" question, and is precisely where the discussion explodes into complexity that I can't follow, so maybe that's precisely the bit of "floating point arcana" that the naive user doesn't catch on to. I'm not saying that you are being naive, rather that readers of the docs (and hence users of the function) will be, and will find it confusing for precisely this reason.
But the function (in the default case) then won't mean "close to" (at least not in any sense that people will expect). Maybe making it mandatory to specify one or the other parameter, and making them keyword-only parameters, would be sufficiently explicit. (But see below.)
I'm not sure what you're saying here - by "not setting defaults" do you mean making it mandatory for the user to supply a tolerance, as I suggested above?
Agreed.
Starting the bike-shedding now, -1 on zero. Having is_close default to something that most users won't think of as behaving like their naive expectation of "is close" (as opposed to "equals") would be confusing.
Just make it illegal to set both. What happens when you have both set is another one of the things that triggers discussions that make my head explode. Setting just one implies the other is zero, setting neither implies whatever default is agreed on.
- At the moment, I'm punting on the behaviour when both abs and rel tolerances are provided. That can be bike-shedded later.
Don't allow it, it's too confusing for the target audience.
It is worse, because it no longer says what it means.
The only tricky situation might be if you *may* be comparing against zero, but don't know so in advance.
This can probably be handled by sufficiently good documentation. Once it was clear to me that this was an asymmetric operation, and that you were comparing whether X is close to a known value Y, I stopped finding it odd that you need to know what Y is in order to make sense of the function. Having said that, I don't think the name "is_close" makes the asymmetry clear enough. Maybe "is_close_to" would work better (there's still room for bikeshedding over which of the 2 arguments is implied as the "expected" value in that case)?
I would call out this edge case explicitly in the documentation. It's a straightforward consequence of the definition, but it *will* be surprising for many users. Personally, I'd not thought of the implication till it was pointed out here.
Definitely agreed.
Agreed. The essential complexity here may not seem that complex to specialists, but I can assure you that it is for at least this user :-) Overall, I think it would be better to simplify the proposed function in order to have it better suit the expectations of its intended audience, rather than trying to dump too much functionality in it on the grounds of making it "general". If there's one clear lesson from this thread, it's that floating point closeness can mean a lot of things to people - and overloading one function with all of those meanings doesn't seem like a good way of having a clean design. Paul

On Sun, Jan 25, 2015 at 5:17 PM, Nathaniel Smith <njs@pobox.com> wrote:
OK -- I get it now -- this is really about getting a default for a zero tolerance test that does not mess up the relative test -- that may be a way to go.
got it -- if they want that, they can set the abs_tolerance to what they need.
Exactly why I don't think abs_tolerance should be anything other than 0.0
why didn't they just override the defaults? but whatever. The whole motivation for zero_tol is to
allow a single set of defaults that satisfies both groups.
OK -- I'm buying it. However, what is a sensible default for zero_tolerance? I agree it's less critical than for abs_tolerance, but what should it be? Can we safely figure that order of magnitude one is most common, and something in the 1e-8 to 1e-14 range makes sense? I suppose that wouldn't be surprising to most folks. Tests against zero won't necessarily fail -- sometimes rounding errors
I get it -- it seems rare, certainly more rare than the other case, where is_close_to passes for small numbers when it really shouldn't. And sure, you could get a pass the first time around, because, indeed, you DID get exactly zero -- that should pass. But when you do refactor and introduce a slightly different answer, you'll get a failure and can figure it out then. Are you actually proposing that the function should raise an Exception if expected == 0.0 and abs_tolerance is also 0.0? (and I guess zero_tolerance if there is one)
I meant a case that wasn't contrived ;-)
Absolutely -- and adding a zero_tolerance may be a way to get everyone useful defaults. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 24 January 2015 at 07:30, Nathaniel Smith <njs@pobox.com> wrote:
I would personally find the PEP more persuasive if it was framed in terms of providing an improved definition of assertAlmostEqual that better handles the limitations of binary floating point dynamic ranges. The fact that unittest.assertAlmostEqual is in the standard library implies that any improvement to it must also be in the standard library, and moving the definition of near equality that unittest uses out to the math module so it is reusable in other contexts makes sense to me, especially if it means being able to share the definition between unittest and the statistics module. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On Sat, Jan 24, 2015 at 05:27:57PM +1000, Nick Coghlan wrote:
Unfortunately, I don't think we can change assertAlmostEqual. If we change the implementation, tests which were passing may fail, and tests which were failing may pass. The approach I took for test_statistics was to subclass TestCase and add my own approx-equal assertion method: https://hg.python.org/cpython/file/fcab9c106f2f/Lib/test/test_statistics.py#... In my tests, I call assertApproxEqual(first, second), with optional values tol (absolute error tolerance) and rel (relative error). If tol and/or rel are not used, the assertion defaults to self.tol or self.rel, which if not set default to 0, which is equivalent to testing exact equality. The actual fuzzy comparison itself is handled by a function approx_equal(x, y, tol, rel). I propose:

- somebody other than me should review NumericTestCase.assertApproxEqual and check that it does nothing unreasonable;
- the assertApproxEqual method be moved into TestCase, making it available to any user of unittest;
- the fuzzy comparison approx_equal can be turned into a static(?) method of TestCase (see below for the advantage of a method rather than a function);
- since there are considerable disagreements about the right way to handle a fuzzy comparison when *both* an absolute and relative error are given, people who disagree with the default definition can simply subclass TestCase and redefine the approx_equal method. (Which is much simpler than having to write the whole assertApproxEqual method from scratch.)

This gives a workable (although not obvious) solution to Guido's example use-case, which might become:

    from unittest import TestCase
    are_close = TestCase.approx_equal    # the proposed static method

    def sqrt(x):
        new_guess = 1.0
        while True:
            guess = new_guess
            new_guess = (guess + x / guess) / 2   # avg(guess, x/guess)
            if are_close(guess, new_guess):
                return new_guess

Note that in this case, at least, we do want a symmetric version of "is_close", since neither guess nor new_guess is "correct", they are both approximations. -- Steven

I think I addressed most of these issues in the summary note I just posted, but a few specific comments: Unfortunately, I don't think we can change assertAlmostEqual. If we
change the implementation, tests which were passing may fail, and tests which were failing may pass.
Agreed -- I would have thought that was off the table. And it's really a different test than the proposal -- it is an absolute_tolerance test, but where the tolerance is specified in a number of decimal digits after the decimal point (or an optional specific delta) -- not really that useful, but I guess if you have assertTrue(), then why not? But adding a relative tolerance to unittest makes a lot of sense -- would "assertCloseTo" sound entirely too much like assertAlmostEqual? I think it may be OK if the docs for each pointed to the other.
The actual fuzzy comparison itself is handled by a function approx_equal(x, y, tol, rel).
NOTE: the difference between this and my current PEP version is that the absolute tolerance defaults to something other than zero (though it looks like it does default to zero for the assert method), and it is a symmetric test (what Boost calls the "strong" test) - somebody other than me should review NumericTestCase.assertApproxEqual
and check that it does nothing unreasonable;
Well it requires the tolerance values to be set on the instance, and they default to zero. So if we were to add this to unittest.TestCase, would you make those instance attributes of TestCase? If so, we want different defaults -- so people could call the assertion with defaults and get something useful. I suggest the one I put in my proposal, naturally ;-) I looked at the underlying function pretty closely. I don't see anything wrong with it. I did a few things differently:
- no need to check for NaN explicitly, the comparisons take care of that anyway.
- inspired by the Boost approach, I used "and" and "or", rather than calling max() -- same result, slightly better performance.
But in the end -- essentially the same as the PEP code, except where it's intended to be different.
what assertApproxEqual does is add the ability to test a whole sequence of values -- much like numpy's allclose. Do any of the other TestCase assertions provide that? But you could also add an optional parameter to pass in an alternate comparison function, rather than have it be a method of TestCase. As I said, I think it's better to have it available, and discoverable, for use outside of unittest. Note that in this case, at least, we do want a symmetric version of
"is_close", since neither guess nor new_guess is "correct", they are both approximations.
true, but you are also asking the question -- is the new_guess much different than guess. Which points to an asymmetric test -- but either would work. -Chris

On Jan 25, 2015, at 17:21, Chris Barker <chris.barker@noaa.gov> wrote:
If assertAlmostEqual is primarily intended for float comparisons, and it really is misleadingly bad for that use, I'd think the right answer is to provide a new, better method under a different name, then deprecate the old one. That means old tests continue to work, but new tests stop being misled into testing the wrong thing, and if you're lucky a few people will even think to redo old tests (and, presumably, discover long-standing bugs--otherwise either their test cases are no good in the first place, or it's really not misleadingly bad, right?), which seems to be the best outcome you can reasonably hope for. That would imply that assertCloseTo should include the same sequence behavior as assertAlmostEqual, not just assertTrue(close_to), so it really is a drop-in replacement for most users. I suppose making assertAlmostEqual take an optional almost-equality function that defaults to the existing behavior, then deprecating calling it with the default, would have the same effect, but it seems clumsier in the long run.

On Mon, Jan 26, 2015 at 1:39 AM, Andrew Barnert <abarnert@yahoo.com.dmarc.invalid> wrote:
If assertAlmostEqual is primarily intended for float comparisons, and it really is misleadingly bad for that use,
Since I was complaining about assertAlmostEqual, I should probably say explicitly what I don't like: It has two arguments for specifying precision: "places", which is measured in decimal digits, and "delta", which is an absolute tolerance like the ones we've discussed here. The fact that this pair of options exists, and the fact that relative precision is very very frequently talked about in terms of "significant digits", strongly suggests to me that these are the two settings for relative and absolute tolerances, and setting places=7 means that the numbers should match up to 7 significant digits. However, this is not what places=7 means. In fact, if 'places' is specified, then it means we set the absolute tolerance to 5 * 10**(-places - 1). So the problems are: - This is confusing/misleading/surprising. - There is no support for relative tolerances, which are almost always what you want -- using the same tolerances for comparing two numbers in the millions and for comparing two numbers in the millionths is going to lead to ugly results. - Because there is no support for relative tolerances, the defaults are wildly inappropriate unless the numbers being compared happen to be of normal size, like say between 10**-2 and 10**2. The docs don't warn about this. And in fact if you want to get a useful result when comparing two large numbers, you actually have to assert that they are the same to some negative number of decimal places. Technically this can be worked out from the docstring (it references round(), and round() is documented to accept a negative ndigits argument), but it's very surprising and obscure. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
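A couple of concrete cases showing the behaviour Nathaniel describes (standard unittest; nothing assumed beyond the documented round()-based semantics of assertAlmostEqual):

    import unittest

    class AlmostEqualQuirks(unittest.TestCase):
        def test_large_numbers_fail(self):
            # relative error here is 1e-9, but round(1.0, 7) != 0
            with self.assertRaises(AssertionError):
                self.assertAlmostEqual(1e9, 1e9 + 1)

        def test_negative_places_workaround(self):
            # round(1.0, -1) == 0.0, so this passes -- surprising but documented
            self.assertAlmostEqual(1e9, 1e9 + 1, places=-1)

        def test_tiny_numbers_pass_vacuously(self):
            # these differ by a factor of two, yet round(1e-10, 7) == 0.0
            self.assertAlmostEqual(1e-10, 2e-10)

    if __name__ == "__main__":
        unittest.main()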

On Sun, Jan 25, 2015 at 5:39 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
See Nathaniel's note -- but I think the key thing is that assertAlmostEqual is NOT really intended for float comparisons -- rather, it's intended for decimal numbers that happen to be stored in floats (does it pre-date the Decimal object?) In practical use, most of us are used to thinking in terms of decimal numbers, it's what we all learn in grade school. And in practice, Python, and every other language I've seen, takes decimal literals for floats, and displays floats as decimals. Also in practical use, using floats for decimal numbers works just fine, particularly if you round on output. So that's the use-case for assertAlmostEqual -- it provides an easy way to check if numbers are almost equal to N decimal places. I guess that's a pretty common use case. So maybe folks would want to keep it around. (by the way -- does it work for Decimal objects -- it really should, it's more appropriate for them anyway!) On the other hand, those of us trained in science and engineering are used to working with scientific or engineering notation -- i.e. x.xxxEn (some number times ten raised to a power) -- a mantissa and an exponent. While we still use decimal, it's a better match for floating point numbers. We are also trained to consider "significant figures" -- the number of digits of accuracy of a value, measured or computed. 123,000,000 has the same number of significant figures as 0.000123, but they are not of the same value at all. Anyway, a relative comparison provides something akin to a comparison to a certain number of significant figures -- really common and useful, and familiar to at least those trained in most disciplines of science or engineering. Key here is that I'm not suggesting a better or more correct assertAlmostEqual. I'm suggesting something different (arguably more useful). I'm honestly not sure how useful assertAlmostEqual is -- but apparently it was useful enough to put in in the first place, and to get used since then.
even if it isn't a replacement -- that sequence behavior is nice -- and Steven has already written it as well. -Chris

On Sun, Jan 25, 2015 at 05:21:53PM -0800, Chris Barker wrote: [...]
CloseTo assumes an asymmetric test, which isn't a given :-) I prefer ApproxEqual, although given that it is confusingly similar to AlmostEqual, IsClose would be my second preference.
No, I would modify it to do something like this:

    if tol is None:
        tol = getattr(self, "tol", 0.0)   # or some other default

and similar for rel. I recommend using short names for the two error tolerances, tol and rel, because if people are going to be writing a lot of tests, having to write:

    self.assertIsClose(x, y, absolute_tolerance=0.001)

will get tiresome.
I was motivated by assertEqual and the various sequence/list methods. I wanted to compare two lists element-wise using an approximate comparison, e.g.:

    # func1 and func2 are alternate implementations of the same thing
    a = [func1(x) for x in values]
    b = [func2(x) for x in values]
    self.assertApproxEqual(a, b)

It wouldn't be meaningful to compare the two lists for approximate equality as lists, but it does make sense to do an element-by-element comparison. In the event of failure, you want an error message that is more specific than just the two lists being unequal.
That's an alternative too. I guess it boils down to whether you prefer inheritance or the strategy design pattern :-) I do think there are two distinct use-cases that should be included in the PEP: (1) Unit testing, and a better alternative to assertAlmostEqual. (2) Approximate equality comparisons, as per Guido's example. Note that those two are slightly different: in the unit testing case, you usually have a known expected value (not necessarily mathematically exact, but at least known) while in Guido's example neither value is necessarily better than the other, you just want to stop when they are close enough. Like Nick, I think the first is the more important one. In the second case, anyone writing a numeric algorithm is probably copying an algorithm which already incorporates a fuzzy comparison, or they know enough to write their own. The benefits of a standard solution are convenience and correctness. Assuming unittest provides a well-tested is_close/approx_equal function, why not use it?
I can see we're going to have to argue about the "Close To" versus "Close" distinction :-) I suggest that in the interest of not flooding everyone's inboxes, we take that off-list until we have either a consensus or at least agreement that we cannot reach consensus. -- Steve

On 26 January 2015 at 06:39, Steven D'Aprano <steve@pearwood.info> wrote:
Does it need to go off-list? I'm still unclear about the arguments over asymmetric vs symmetric (I suspect, as you alluded to earlier, that they reflect a more fundamental problem, which is that there are 2 different types of use case with different constraints) so I'd like to at least be aware of the content of any discussion... Paul

On Mon, Jan 26, 2015 at 07:43:20AM +0000, Paul Moore wrote:
On 26 January 2015 at 06:39, Steven D'Aprano <steve@pearwood.info> wrote:
And now you know why there are hundreds of messages in this thread ;-) No, it doesn't need to go off-list, but I'm suffering badly from email fatigue, not just because of this thread but it is one of the major causes, and I'm sure I'm not the only one.
Symmetry and asymmetry of "close to" is a side-effect of the way you calculate the fuzzy comparison. In real life, "close to" is always symmetric because distance is the same whether you measure from A to B or from B to A. The distance between two numbers is their difference, which is another way of saying the error between them: delta = abs(x - y) (delta being the traditional name for this quantity in mathematics), and obviously delta doesn't depend on the order of x and y. But if we express that difference as a fraction of some base value, i.e. as a relative error, the result depends on which base value you choose: delta/x != delta/y, so suddenly we introduce an asymmetry which doesn't reflect any physical difference. The error between x and y is the same whichever way you measure, but that error might be 10% of x and 12.5% of y (say).

What *fundamentally* matters is the actual error, delta. But to decide whether any specific value for delta is too much or not, you need to pick a maximum acceptable delta, and that depends on context: a maximum acceptable delta of 0.0001 is probably too big if your x and y are around a billionth, and way too small if they are around a billion. Hence we often prefer to work with relative tolerances ("give or take 1%") since that automatically scales with the size of x and y, but that introduces an asymmetry.

Asymmetry is bad, because it is rather surprising and counter-intuitive that "x is close to y", but "y is not close to x". It may also be bad in a practical sense, because people will forget which order they need to give x and y and will give them in the wrong order. I started off with an approx_equal function in test_statistics that was asymmetric, and I could never remember which way the arguments went. (We can mitigate against the practical failure with explicit argument names "actual" and "expected" instead of generic ones. But who wants to be using keyword arguments for this all the time?)

Example: suppose the user supplies a relative tolerance of 0.01 ("plus or minus one percent"), with x=100.0 and y=99.0. Then delta = 1.0. Is that close? If we use x as the base: 1 <= 0.01*100 returns True, but if we use y as the base: 1 <= 0.01*99 returns False. Instead, Bruce Dawson recommends using the larger of x and y: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-number... Quote: To compare f1 and f2 calculate diff = fabs(f1-f2). If diff is smaller than n% of max(abs(f1),abs(f2)) then f1 and f2 can be considered equal. This is especially appropriate when you just want to know whether x and y differ from each other by an acceptably small amount, without specifying which is the "true" value and which the "true value plus or minus some error". Other alternatives would be to take the smaller, or the average, of x and y. Time permitting, over the next day or so I'll draw up some diagrams to show how each of these tactics changes what counts as close or not close. -- Steve
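Working the 100.0 / 99.0 example above through in code (1% tolerance as in the text; purely illustrative):

    x, y, rel = 100.0, 99.0, 0.01
    delta = abs(x - y)                          # 1.0, whichever way round

    print(delta <= rel * x)                     # True:  1.0 <= 1.00 (base = x)
    print(delta <= rel * y)                     # False: 1.0 <= 0.99 (base = y)
    print(delta <= rel * max(abs(x), abs(y)))   # True:  Dawson's "use the larger" rule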

On Mon, Jan 26, 2015 at 7:08 PM, Steven D'Aprano <steve@pearwood.info> wrote:
But one reason to take it off-list is that these very long, if not endless, circular conversations give the impression that it really matters which specific choices are made, and that we will never come to a consensus about it, so this should not go in the stdlib. However, I'm pretty sure that we're down to details that, while interesting, don't really matter -- whether we use an asymmetric or symmetric test, weak or strong version, we'll get something that will work better than == or assertAlmostEqual, and it will do the right thing in the vast majority of cases. I could live with, and indeed be happy with, any of the solutions on the table. My take from this thread is that most people converged on the asymmetric option as the better choice, but Steven feels strongly that the symmetric option is the way to go. I don't know if this is a stopper for anyone, though. Is there anyone that could only live with one of the options? (by live with, I mean think that we'd be better off with nothing in the standard lib than one of these options) Please speak. The other issue is whether to have a default that will return True for at least common uses of comparison to zero.
- I think it's better to be safe than sorry, and not let folks accidentally think they have a value close to zero that isn't really.
- Nathaniel thinks that it's better to provide a default that will give an answer for "is close to zero" that will at least work for common cases.
I could live with what Nathaniel proposes, and I believe he said he could live with what I propose -- so this is not a stopper. However, someone's going to need to come up with what that default value should be -- part of why I think it should be zero is that I have no idea if a small-compared-to-one default is reasonable. I think that's it. Perhaps folks could focus now on issues that they think are show stoppers. Or bike-shed the parameter names and stuff, if you really want to paint a bike shed ;-) -Chris

On Tue, Jan 27, 2015 at 3:25 PM, Chris Barker <chris.barker@noaa.gov> wrote:
I would suggest that, as PEP author, you guide the conversation a bit by asking specific questions that you want to answer in the PEP. Steven's not the only one who's been seeing an awful lot of emails lately; I've just been skimming them, myself. But you could reboot the conversation by starting up some very specific discussion on points that will actually affect the PEP, and then things will hopefully be fresh and interesting again :) ChrisA

On Mon, Jan 26, 2015 at 9:30 PM, Chris Angelico <rosuav@gmail.com> wrote:
I would suggest that, as PEP author, you guide the conversation a bit by asking specific questions that you want to answer in the PEP.
I thought I did that, actually, in that email -- I guess I wasn't very clear. Here they are -- and please express not just your preference, but a clear statement about what would be acceptable or not acceptable.

A) Which test do we use:
   1) The asymmetric test
   2) The "strong" test (minimum relative tolerance)
   3) The "weak" test (maximum relative tolerance)

B) Do we provide a non-zero default for the absolute tolerance? If so, what should the value be? Remember that this serves primarily to provide a check against zero.

I think that's it for the technical decisions. I would also appreciate suggestions for parameter names -- at least if we go with the asymmetric test: "actual" and "expected" is a bit confusing -- I like "expected", but we need something better than "actual" for the don't-know-it's-right one. I also would really appreciate someone working out the details and contributing text for the inclusion of this in unittest.TestCase -- unittest is really not my thing. I've pushed some changes to gitHub (sorry, forgot to push yesterday), and once the dust settles I'll incorporate as many of the suggestions in the PEP text as I can. -Chris
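For concreteness, one reading of the three options in A) -- relative tolerance only, ignoring the absolute-tolerance question, using the Boost-style "strong"/"weak" terminology from this thread (treat this as a sketch, not the PEP's wording):

    def close_asymmetric(actual, expected, rel_tol):
        # A1: tolerance measured relative to the "expected" argument only
        return abs(actual - expected) <= rel_tol * abs(expected)

    def close_strong(a, b, rel_tol):
        # A2: must be within tolerance of *both* values (min of the two scales)
        return abs(a - b) <= rel_tol * min(abs(a), abs(b))

    def close_weak(a, b, rel_tol):
        # A3: within tolerance of *either* value (max of the two scales)
        return abs(a - b) <= rel_tol * max(abs(a), abs(b))

    # They only disagree when the tolerance is large relative to the values:
    print(close_asymmetric(95.0, 100.0, 0.05))   # True
    print(close_strong(95.0, 100.0, 0.05))       # False
    print(close_weak(95.0, 100.0, 0.05))         # True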

On Mon, Jan 26, 2015 at 10:35 PM, Chris Barker <chris.barker@noaa.gov> wrote:
The problem with this question is that, while it's easy to come up with examples where it may matter (e.g. 95 is within 5% of 100, but 100 is not within 5% of 95), in practice the tolerance is more likely to be 1e-8, in which case it doesn't matter.
It feels like absolute tolerance is a completely different test. And it is a much simpler test for which we don't need a helper function -- it's just abs(x) < tolerance. When does a program need *both* absolute and relative tolerance in a single test?
I still think this is better off as a recipe than as a helper function. -- --Guido van Rossum (python.org/~guido)

On Tue, Jan 27, 2015 at 8:20 AM, Guido van Rossum <guido@python.org> wrote:
Exactly why I'm happy with any of them. I'm trying to suss out whether anyone else has a reason to reject one or the others. If no one does, then we can just pick one.
Because we want it to be able to do something sane when comparing to zero -- the abs_tolerance allows you to set a minimum tolerance that will do something reasonable near zero (we could use a zero_tolerance, as Nathaniel has suggested, instead, but that creates the discontinuity that, for example, 1e-12 is close to zero, but it is not close to 1e-100 -- I think that's a bad idea for a call with the same arguments). I spent a good while thinking about this and playing with it, and it became clear to me that this is the best way to go for a not-too-surprising result. And it's consistent with what numpy and Steven's statistics test code does. Still TBD is what the default should be, though.
I still think this is better off as a recipe than as a helper function.
Are you prepared to reject the PEP? I'd prefer to give it this one last shot at determining if there really is no way to get consensus on a good-enough solution. I suspect there is a lot of bike shedding here -- people have ideas about what is best, and want to understand and talk about it, but that doesn't mean that they wouldn't rather see something rather than nothing -- that's certainly the case for me (both the bike shedding and the desire to see something ;-) ) Evidence: The numpy version has its faults -- but it's widely used. assertAlmostEqual has even more faults (limitations, anyway) but it's also widely used. Boost has something in it, even though it's a one-liner. Clearly this is useful functionality to have available. ChrisA is right -- I have not done a good job at steering the group toward a consensus. -Chris

On Tue, Jan 27, 2015 at 9:07 AM, Chris Barker <chris.barker@noaa.gov> wrote:
By now can't you summarize the reasons that others have brought up?
I don't think you can have this always be sane. For someone who for whatever reason is manipulating quantities that are in the range of 1e-100, 1e-12 is about as large as infinity. I think my reasoning comes down to the same rule I often use to decide whether we need one function or two -- if in every use case you always know whether you need version A or version B, then it's better to have two functions rather than a single one with a flag to request A or B. And isn't it the case that whenever you are comparing to zero, you *know* that you are comparing to zero, and you *must* specify an absolute tolerance (otherwise it's not a use case at all)? IIUC, numpy doesn't design APIs this way. They like to have swiss army knives that can do lots of different things, with lots of flags to request variant behavior. (I learned this from the discussion about linspace().) But IMO that goes back to an earlier, murkier tradition -- I recall using Fortran plot functions in the '80s that had 17 parameters. The reason for that style was that there was no concept of modules or packages, and hence there was only one namespace, shared between all possible libraries. So a new library would claim only a single name in the namespace and hook all sorts of functionality onto that single name. We don't have that problem in Python and hence I prefer clarity in functionality -- different functions for different behaviors, basically. (Maybe this should become the 20th line of the Zen of Python. :-)
Yes, I am prepared to reject the PEP. In fact any PEP is rejected by default if no consensus is obtained.
IIUC Boost's approach is better than numpy. It has separate functions for is_close (relative) and is_small (absolute), because they are separate use cases. Its is_close uses the symmetric version (though there is a flag to choose between weak and strong, which sounds like overkill).
ChrisA is right -- I have not done a good job at steering the group toward a consensus.
It's not too late! -- --Guido van Rossum (python.org/~guido)

On Tue, Jan 27, 2015 at 10:37 AM, Guido van Rossum <guido@python.org> wrote:
By now can't you summarize the reasons that others have brought up?
I can try ;-) --probably not until this evening though.
Because we want it to be able to do something sane when comparing to zero --
<snip> I don't think you can have this always be sane. For someone who for
whatever reason is manipulating quantities that are in the range of 1e-100, 1e-12 is about as large as infinity.
Exactly why I favor having the abs_tolerance default to zero.
I really appreciate this API design approach, and in this case I started out with that idea. But I think this is likely to be used where you need to test a bunch of values with a single function/set of parameters -- in TestCase.assertIsCloseTo, as well as home-grown loops and comprehensions. IIUC, numpy doesn't design APIs this way.
I'm not sure numpy's API is exactly designed at all ;-) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, Jan 27, 2015 at 1:18 PM, Chris Barker <chris.barker@noaa.gov> wrote:
On Tue, Jan 27, 2015 at 10:37 AM, Guido van Rossum <guido@python.org> wrote:
I think my reasoning comes down to the same rule I often use to decide
I assume you mean assertNotAlmostEqual <https://hg.python.org/cpython/file/94d8524086bd/Lib/unittest/case.py#l525>. This is actually the same misguided two-for-one design. You *must* specify exactly one of delta or places, and the code takes a different path. Also, it looks like both are actually absolute tolerance. So what argument are you making here? -- --Guido van Rossum (python.org/~guido)

On Tue, Jan 27, 2015 at 2:00 PM, Guido van Rossum <guido@python.org> wrote:
Sorry -- I meant a hypothetical assertIsCloseTo -- which would wrap the proposed is_close_to(), but apply it to an entire sequence, like assertNotAlmostEqual <https://hg.python.org/cpython/file/94d8524086bd/Lib/unittest/case.py#l525> does.
Yes, both are an absolute tolerance -- which, I think, makes it a slightly less misguided design -- you aren't selecting two different functionalities, just two ways to spell the tolerance. So what argument are you making here? The point was that if the user applies one set of parameters to a sequence of values that may have some zeros in it, we need it in one function. Nathaniel has found that to be a fairly common use-case for numpy's allclose(). -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
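To make the "one set of parameters over a whole sequence" point concrete, here is a minimal sketch (the function body, argument names and numbers are illustrative only, not the PEP's reference implementation) of checking a sequence of computed values, some of whose expected values are exactly zero, with a single set of tolerances:

    def is_close_to(actual, expected, rel_tolerance=1e-8, abs_tolerance=0.0):
        # asymmetric test: relative to the expected value, with an
        # absolute floor so that comparisons against zero can still pass
        return abs(actual - expected) <= max(rel_tolerance * abs(expected), abs_tolerance)

    expected = [0.0, 1.0, 250.0]               # note the exact zero
    actual = [1e-13, 1.0 + 1e-9, 250.00000001]

    assert all(is_close_to(a, e, abs_tolerance=1e-12) for a, e in zip(actual, expected))

Without the absolute tolerance living in the same function, the zero entries would have to be special-cased by the caller.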

I'm still confused. We're talking about a function that compares two simple values, right? Where does the sequence of values come from? Numpy users do everything by the array, but they already have an isclose(). Or perhaps you're talking about assertApproxEqual in test_statistics.py? That has so many ways to specify the relative and absolute tolerance that I give up understanding it. The docstring doesn't give any clarity -- it simply describes the behaviors, it doesn't say when you should be using both. "[...] a naive implementation of relative error testing can run into trouble around zero" (followed by a single example) doesn't really help me. On Tue, Jan 27, 2015 at 2:14 PM, Chris Barker <chris.barker@noaa.gov> wrote:
-- --Guido van Rossum (python.org/~guido)

On Tue, Jan 27, 2015 at 2:35 PM, Guido van Rossum <guido@python.org> wrote:
I'm still confused. We're talking about a function that compares two simple values, right? Where does the sequence of values come from?
it comes from wherever, but the idea is that the function itself is called in a loop or comprehension. We could require something like:

    [ (is_close_to(i) if abs(i) > abs_tol else abs(i) <= abs_tol) for i in seq ]

but I prefer:

    [ is_close_to(i, abs_tolerance=abs_tol) for i in seq ]

And it gets worse for the TestCase.assert** methods -- those are designed to act on a sequence, so you would need to specify both your relative tolerance and absolute tolerance anyway, if your sequence might have zeros in it.
Something like that, yes, but with a cleaner API and better docs ;-) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, Jan 27, 2015 at 02:35:50PM -0800, Guido van Rossum wrote:
Those ways evolved from my actual use of the function. What I found in practice is that within each TestCase, most of the tests used the same values for error tolerances. At the very least, I was aiming for all the tests to use the same error tolerance, but in practice I never quite achieved that. From time to time I would have a particularly difficult calculation that just wouldn't meet the desired tolerance, and I had to accept a lower tolerance. Most individual test methods within a single TestCase used the same error tolerances, which I set in the setUp method as self.rel for relative error (or self.tol for absolute), and wrote: self.assertApproxEqual(x, y, rel=self.rel) over and over again, occasionally overriding that value: self.assertApproxEqual(x, y, rel=1e-6) Since most of the time the method was taking the tolerances from self, I reasoned that I should just make the default "take the tolerances from self" and be done with it. So that's how the method evolved. Most of those tests were for functions that didn't end up in the std lib, so I'm not surprised that this was not so clear. If I remember correctly, at the time I was also reading a lot about OOP design principles (as well as stats text books) and I think the use of instance attributes for the default error tolerances was probably influenced by that. -- Steve

On 01/27/2015 02:35 PM, Guido van Rossum wrote:
I'm still confused. We're talking about a function that compares two simple values, right? Where does the sequence of values come from?
I think from uses involving list comps and such. But I could be wrong. -- ~Ethan~

On Thu, Jan 29, 2015 at 10:45 AM, Ethan Furman <ethan@stoneleaf.us> wrote:
Yes, that's it -- as well as the sequence-compatible asserts in unittest. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, Jan 27, 2015 at 11:20 AM, Guido van Rossum <guido@python.org> wrote:
+1 Just as a point of reference, APL and its derivatives use tolerant comparison in the default equal (=) operator. The definition that they use for finite x and y is simply x = y <=> abs(x-y) <= tol * max(abs(x), abs(y)) The tolerance can be adjusted globally or in some languages (such as J [1]) in the expression using additional syntax. In J, the default tolerance is 2**-44, which is about 5.7e-14. APL restricts the range of tolerance values to 0 through 2**-32. I would be +0 on adding something like def tolerant_equal(x, y, tol=2**-44): return abs(x-y) <= tol * max(abs(x), abs(y)) (name subject to bikeshedding) to math, but -1 on anything more complicated. I would rather see if tolerant_equal(x, y) or abs(x-y) <= 1e-10: .. than if tolerant_equal(x, y, atol=1e-10): .. [1] http://www.jsoftware.com/help/dictionary/d000.htm [2] http://www.dyalog.com/uploads/documents/Papers/tolerant_comparison/tolerant_...
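For illustration, here is how that APL/J-style definition behaves, including the comparing-to-zero case that keeps coming up in this thread (restating the one-liner above so the snippet runs on its own; the example values are mine):

    def tolerant_equal(x, y, tol=2**-44):
        return abs(x - y) <= tol * max(abs(x), abs(y))

    print(tolerant_equal(sum([0.1] * 10), 1.0))  # True: accumulated rounding error is tiny
    print(tolerant_equal(1e-300, 0.0))           # False: relative to zero, nothing non-zero is "close"
    print(tolerant_equal(0.0, 0.0))              # True: only an exact zero matches zero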

On Tue, Jan 27, 2015 at 12:57:03PM -0500, Alexander Belopolsky wrote:
That's essentially the same as Bruce Dawson recommends: https://randomascii.wordpress.com/2012/02/25/comparing-floating-point-number... and also the approx_equal function in test_statistics, except that also uses an absolute error if given. Guido has asked what the use case is for giving both relative and absolute error. I'll quote Dawson from the link above: [quote] The most generic answer to this quandary is to use a mixture of absolute and relative epsilons. If the two numbers being compared are extremely close – whatever that means – then treat them as equal, regardless of their relative values. This technique is necessary any time you are expecting an answer of zero due to subtraction. The value of the absolute epsilon should be based on the magnitude of the numbers being subtracted – it should be something like maxInput * FLT_EPSILON. Unfortunately this means that it is dependent on the algorithm and the inputs. Charming. The ULPs based technique also breaks down near zero for the technical reasons discussed just below the definition of AlmostEqualUlps. Doing a floating-point absolute epsilon check first, and then treating all other different-signed numbers as being non-equal is the simpler and safer thing to do. Here is some possible code for doing this, both for relative epsilon and for ULPs based comparison, with an absolute epsilon ‘safety net’ to handle the near-zero case: [...] [end quote] -- Steven
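A rough sketch of the mixed absolute/relative approach Dawson describes -- an absolute check first as the near-zero "safety net", then a relative check scaled by the larger magnitude (the parameter names and default values here are illustrative, not taken from his article):

    def approx_equal(a, b, rel_eps=1e-9, abs_eps=1e-12):
        diff = abs(a - b)
        if diff <= abs_eps:  # handles results that should be zero, e.g. after subtraction
            return True
        return diff <= rel_eps * max(abs(a), abs(b))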

On Jan 29, 2015, at 6:52 AM, Steven D'Aprano wrote: That's essentially the same as Bruce Dawson recommends:
If anyone's interested in playing with this, I updated my sample implementation with a flag that lets you select which method it will use. Four! options. Note that I'm not suggesting such a flag be there in a std lib function, but it makes it easy to experiment. https://github.com/PythonCHB/close_pep/blob/master/is_close_to.py Note that I found that every single one of my test cases passed with all four methods, except one designed specifically to demonstrate the asymmetry of the asymmetric test. Which was my point about it not mattering much in practice. I still need to add a bunch more to the PEP by way of explanation, then I hope to be able to gauge where people are at. -Chris

On Tue, Jan 27, 2015 at 4:20 PM, Guido van Rossum <guido@python.org> wrote:
When does a program need *both* absolute and relative tolerance in a single test?
It's reasonably common in numpy, where the equivalent function is vectorized to do multiple checks at once, e.g. here are some tests that mix zeros and non-zeros in a single call: https://github.com/mnick/scikit-tensor/blob/b067ed02ef4b1e3f1cb24004b2f20af5... https://github.com/ricklupton/whales/blob/6bc722c1e2dbadb494fa3d4e83e1cb6415... The equivalent would be writing assert all([math.is_close_to(a, e) for (a, e) in zip(actual, expected)]) where 'expected' has a mix of zeros and non-zeros.
I don't find it super compelling either way -- at the end of the day, if 'math' doesn't provide this then many people will use libraries that will or else write their own. I guess I don't have a good sense of what the audience for 'math' is these days -- I'm sure it has one, but aside from tiny one-off usages I'm not sure what it is. None of the production numerical code I see even bothers importing it. It's possible that it largely serves a kind of pedagogical role? high schoolers learning to program etc.? in which case there might be some more benefit in having a well-documented discussion of the issues here in the stdlib docs. The most compelling argument I see is that if we care about unittest, then it would be good to have a better alternative to assertAlmostEqual. (...I don't know anyone who uses the unittest API directly these days, though!) -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
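For reference, the vectorized numpy call being described looks like this (rtol and atol are numpy.allclose's actual parameter names; the data values are illustrative):

    import numpy as np

    expected = np.array([0.0, 1.0, 3.14159, 0.0])
    actual = np.array([1e-13, 1.0 + 1e-9, 3.14159, -1e-13])

    # one call, one pair of tolerances; the zeros are handled by atol
    print(np.allclose(actual, expected, rtol=1e-8, atol=1e-12))  # True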

On Tue, Jan 27, 2015 at 10:05 AM, Nathaniel Smith <njs@pobox.com> wrote:
exactly -- as do a bunch of the unittest assertXXXX methods or if folks write their own equivalent in a comprehension.
if 'math' doesn't provide this then many people will use libraries that will or else write their own.
sure -- that's what's been done for ages... you could say that about anything new being proposed for the stdlib.
Well, I'm a heavy numpy user as well, but I still use the math module when I have something simple to calculate (actually, not so much simple as small -- if I'm not working with a lot of numbers) or if I don't want the numpy dependency. A lot of people do at least some math with python -- I have no idea how many. And the statistics package was recently added -- I would have thought that would be next to useless without numpy, but shows you what I know.
anyone learning python may need a bit of math -- there's some pretty basic stuff in there. The most compelling argument I see is that if we care about unittest,
I agree -- adding it to unitest would be a very good idea. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 27 January 2015 at 04:25, Chris Barker <chris.barker@noaa.gov> wrote:
Hmm. That's a reason for writing a PEP, and getting feedback. If people feel strongly enough about the colour of the bikeshed to vote against the PEP, then it matters, otherwise not. If you believe it's only details by now and people can live with most of the options, maybe it's time to declare the PEP as complete, and post to python-dev for final comments before asking for a pronouncement?
Personally, I'd agree with this.
Nope, I could manage with any of the options. (Disclaimer: I'm not sure I've ever needed a function like this in practice, so my opinion is of limited relevance. My concerns over details are more about whether I'd find the function intuitive enough that I'd think of using it instead of a quick abs(a-b)<1e-8 in toy examples).
As Nathaniel has come up with a lot of real-world cases where comparing to zero is the issue, I think the new function needs to handle that case properly. Deliberately choosing defaults that don't suit a significant use case seems like a mistake. And saying that users need to make an explicit choice each time seems to me to be counter to the whole idea of a function which is designed to be used by people who *don't* have the experience to make such choices... Paul

TL;DR -- I can live with, and would indeed be happy with, either a symmetric or asymmetric test -- because for most of the use-cases it just doesn't matter. But we have to pick one, so if you're interested -- read on --- On Mon, Jan 26, 2015 at 7:08 PM, Steven D'Aprano <steve@pearwood.info> wrote:
Actually, I think this is exactly true: If you ask the question: "are these two values close to each other?" then the symmetric test makes the most sense -- obviously if a is close to b then b is close to a. Whereas: If you ask the question: Is this value within a defined relative difference of an expected value? (i.e. 10% of b) then you want an asymmetric test -- you clearly are defining "relative" to the known value. However, and I've tried to say this multiple times -- and no one disagreed (or agreed) yet: I don't care much which we use, because IT DOES NOT MATTER in most cases which you choose. Why is that? The most common use cases for this kind of thing are a general check of "is my computed number in the right ballpark?" And I've never seen anyone define "ballpark" to a high degree of precision -- usually you are choosing between say 1e-8 and 1e-9 (more or less 8 or 9 significant figures). This is why the 10% example we keep throwing around is a bit deceiving -- it makes the asymmetry seem far more important than it is. Remember that we are talking about: abs(a-b) <= tol*abs(b) vs abs(a-b) <= tol*abs(a) tol*abs(something) defines the absolute difference that can be tolerated. The difference between the two methods is tol*abs(a-b). In the "is 9 within 10% of 10" example, that's the difference between "tolerating" .9 or 1 as a difference -- seems pretty significant. But if you have a more realistic tolerance, like 1e-8, then you are talking about a difference in absolute tolerance of around 1e-8 -- tiny. So you'll still get: 9.9999999 is close to 10, but 10 is not close to 9.9999999, but if you tack on even an extra 1e-8 on there, you get it close both ways:

    In [45]: is_close_to(10, 9.99999991)
    Out[45]: True

    In [46]: is_close_to(9.99999991, 10)
    Out[46]: True

Same if you go down a bit:

    In [47]: is_close_to(9.9999998, 10)
    testing: 9.9999998 10
    Out[47]: False

    In [48]: is_close_to(10, 9.9999998)
    testing: 10 9.9999998
    Out[48]: False

So there is this tiny range of values for which it is asymmetric. Yes, it is still asymmetric, but remember that the usual use case is someone choosing between a rel_tolerance of 1e-8 or 1e-9, not 1e-8 or 1.00000001e-8, so within the precision of the specified tolerance -- they are the same. OK -- but we need to choose one (or set a flag for selecting one -- but the point of this is to have something people can just use). So -- there are some use cases where people may want to be testing against a specific value -- is the measured resistance within 1% of the nominal value? (is anyone ever going to write resistor testing code in Python???). In this case, they really want the asymmetric test, and there is no way to simulate it with a symmetric test. Granted, I think the use-case is rare that it would matter, but what is very common is testing a computed value against an expected value -- so the asymmetric case makes more sense there, too, or is at least easier to explain. So what I haven't seen yet is an example use case where you really need the symmetric case -- i.e. it matters that is_close(a,b) is guaranteed to be the same as is_close(b,a). Does anyone have a use-case??
Note: I took a look at the tests for the Statistics module -- as far as I could tell, all but one were comparing a computed value to an expected one -- in fact, I even see: self.assertApproxEqual(actual, expected) -- happens to use the same names I used for the parameters ;-) The one exception is testing against math.fsum, where it's testing if two different implementations get (almost) the same result -- that arguably wants a symmetric test, though I can't imagine it would make a real difference: rel_tol set to 1e-16 (by the way, a _very_ small tolerance for a python float! -- this may be where an ULPS check would make sense). And I don't see a tolerance ever specified with more than one significant figure. And I see a lot of 1e-8 (though not all, by any means), so maybe that's a good default. Asymmetry is bad, because it is rather surprising and counter-intuitive
My point is that it very rarely matters which order you give them in anyway. So I agree that asymmetry is esthetically "bad", but I'm still looking for a practical example where it matters -- again, for the fairly casual user. Instead, Bruce Dawson recommends using the larger of x and y:
Sure -- but he then jumps right to the whole ULPS thing -- having not explained why that particular definition is best -- in fact, I'd probably go with: n% of min(abs(f1),abs(f2)) -- it's a bit more rigorous -- this is the Boost "strong" test. But again, these are really subtle differences in results, and if you know your allowed error that accurately, you probably should be doing the ULPS thing anyway. In fact, the only use cases I can imagine, or anyone has brought up, for using a tolerance as high as 1% or 10% is for the case when you are testing against a known value, and the asymmetric case makes more sense.
Time permitting, over the next day or so I'll draw up some diagrams to show how each of these tactics change what counts as close or not close.
I'm not sure we need a whole lot more explanation (maybe some folks do). But I think we do need one of either:

- Use cases for when it's important to have a symmetric test, and/or
- Pronouncements (from anyone) that s/he "can't live with" one or the other

Again -- "can't live with" means you think it's better to have nothing in the std lib. I took the time to write the PEP, and I'd like to see this through -- but we need to pick something -- and any of the three options on the table are fine with me. Three options:

- The asymmetric test in the PEP
- The Boost "strong" test (max rel error)
- The Boost "weak" test (min rel error)

-Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
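For comparison, the three candidates written out as code (a sketch only; the function names and the shared default are illustrative):

    def is_close_asymmetric(actual, expected, rel_tol=1e-8):
        # PEP draft: tolerance is relative to the expected value
        return abs(actual - expected) <= rel_tol * abs(expected)

    def is_close_strong(a, b, rel_tol=1e-8):
        # Boost "strong" test: both relative errors must be within rel_tol,
        # i.e. the tolerance scales with the smaller magnitude
        return abs(a - b) <= rel_tol * min(abs(a), abs(b))

    def is_close_weak(a, b, rel_tol=1e-8):
        # Boost "weak" test: either relative error within rel_tol,
        # i.e. the tolerance scales with the larger magnitude
        return abs(a - b) <= rel_tol * max(abs(a), abs(b))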

On 27 January 2015 at 06:24, Chris Barker <chris.barker@noaa.gov> wrote:
All of that makes sense to me (I'm leaving out the detail of "it does not matter" as I'm sort of trusting you guys with the expertise to tell me that ;-)) But what *does* matter to me is usability and how the behaviour matches people's intuition. Not because the mathematical results will differ, but because it makes it easy for people to *think* about what they are doing, and whether it's OK. I would say that the correct approach is to make the default case as easy to use as possible. For that, a symmetrical are_close(a,b) is a no-brainer IMO. (Of course it has to work when either of a and b is zero). It works either way - if one value is a known "target", or if both values are approximations (e.g. when looking at convergence). Once we have that as a basis, look at how people might want to tweak it:

    are_close(a, b, within_abs=1e-8)  # Within a specific distance of each other (absolute tolerance)
    are_close(a, b, within_rel=0.1)   # Within 10% of each other

In the relative case, I'd like "the experts" to decide for me what precisely "within 10% of each other" means (document the details, obviously, but don't bother me with them unless I go looking for them). In either case, I'd be happy to assume that if you change the defaults, you understand the implications (they can be explained in the documentation) such as relative tolerances being unstable near zero. I don't think it's a problem that the default behaviour can't be expressed in terms of explicit settings for the tolerance arguments (it's a wart, and could be annoying, but it's not a showstopper for me - allow setting both explicitly to None to mean "default" if it matters that much). That's it. Anyone wanting to specify both parameters together, or wanting the defaults to still apply "as well as" an explicitly specified tolerance, is deemed an "expert" and should be looking for a more specialised function (or writing their own). Paul

On 27 January 2015 at 19:49, Paul Moore <p.f.moore@gmail.com> wrote:
Translate that into explicit English and I'm not sure a symmetric definition reads more clearly: "a and b are close to each other" "a is close to b" "b is close to a" Given that the "is close to" formulation also simplifies the calculation of a relative tolerance (it's always relative to the right hand operand), it has quite a bit to recommend it.
With an asymmetric comparison, another alternative would be to have an explicit threshold value for the reference where it switched from relative to absolute tolerance checking. That is:

    def is_close_to(value, reference, *, error_ratio=1e-8,
                    near_zero_threshold=1e-6, near_zero_tolerance=1e-14):
        """Check if the given value is close to a reference value

        In most cases, the two values are close if
        'abs(value-reference) < reference*error_ratio'

        If abs(reference) < near_zero_threshold, or near_zero_threshold is
        None, the values are close if
        'abs(value-reference) < near_zero_tolerance'
        """

Setting near_zero_threshold to 0 would force a relative comparison (even near zero), while setting it to None would force an absolute one (even far away from zero). If you look at the default values, this is actually a very similar definition to the one Chris has in PEP 485, as the default near zero tolerance is the default error ratio multiplied by the default near zero threshold, although I'm not sure as to the suitability of those numbers. The difference is that this takes the cutoff point between using a relative error definition (to handle the dynamic range issues of a floating point representation) and an absolute error definition (to handle the instability of relative difference near zero) and *gives it a name*, rather than deriving it from a confusing combination of the reference value, the error ratio and the near zero tolerance.
I believe breaking out the cutoff point as a separately named parameter makes the algorithm easy enough to explain that restricting it isn't necessary. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
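A sketch of the behaviour that docstring describes (one reading of it, not a tested reference implementation; the abs() around the reference is an addition to cover negative reference values):

    def is_close_to(value, reference, *, error_ratio=1e-8,
                    near_zero_threshold=1e-6, near_zero_tolerance=1e-14):
        if near_zero_threshold is None or abs(reference) < near_zero_threshold:
            # near zero, or absolute checking forced: use the absolute tolerance
            return abs(value - reference) < near_zero_tolerance
        # otherwise the tolerance is relative to the reference value
        return abs(value - reference) < abs(reference) * error_ratio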

On 27 January 2015 at 14:28, Nick Coghlan <ncoghlan@gmail.com> wrote:
However, in programming terms, are_close(a, b) is_close_to(a, b) is_close_to(b, a) the latter two have the "which is the target" issue. And yes, real code will have more obvious argument names. It's not a huge deal, I agree. I'm just saying that the first form takes less mental effort to parse while reading through a block of code. Enough said. It's not a big deal, someone (not me) ultimately needs to make the decision. I've explained my view so I'll stop.
Agreed. It's a trade-off, and my expectation is that most code will simply use the defaults, so making that read better is a good choice. If you believe that most people will explicitly set a tolerance of some form, the asymmetric choice may well be better.
Eep. All I can say is that I never expect to write code where I'd even consider changing the parameters as documented there. I don't think I could understand the implications well enough to trust my judgement. Remember, my intuition hits its limit at "within 1 millionth of a percent of each other" (that's the 1e-8) or "numbers under 1e-6 have to differ by no more than 1e-14" (the other two). And I'd punt on what might happen if both conditions apply.
The latter choice would make the name "near_zero_tolerance" a pretty odd thing to see...
I'd love to see the proposed documentation, as I think it would probably read as "complicated stuff, leave well alone" to most people. But I *am* assuming a target audience that currently uses "abs(x-y)<1e-8" [1], and unittest's assertAlmostEqual, and doesn't think they need anything different. The rules are different if the target audience is assumed to know more than that. Paul [1] Someone, it may have been Chris or it may have been someone else, used that snippet, and I've seen 1e-8 turn up elsewhere in books on numerical algorithms. I'm not sure why people choose 1e-8 (precision of a C float?), and how it relates to the 1e-6, 1e-8 and 1e-14 you chose for your definition. It feels like the new function may be a lot stricter than the code people naively (or otherwise) write today. Is that fair, or am I reading too much into some arbitrary numbers? (Note - I won't understand the explanation, I'm happy with just "that's a good point" or "no, the numbers ultimately chosen will be fine" :-))

: Disclaimer: I haven't read all of this thread, and what I have read I've sometimes skimmed (and apparently some of the discusion was offlist anyway). On Mon, Jan 26, 2015 at 10:24:07PM -0800, Chris Barker wrote:
Once this exists, at some point somebody's going to write:

    def near_miss(data, tol):
        pairs = itertools.combinations(data.values(), 2)
        return any(is_close(a, b, tol) for a, b in pairs)

and then (if it's asymmetric) be very surprised when this:

    example = {'A': 0.01, 'B': 2.34, 'C': 5.67, 'D': 5.68, 'E': 9.99}
    near_miss(example, 1/568)

returns True half the time and False the other half. That's going to be a really nasty heisenbug when it crops up in real code, especially since it's only going to change for each invocation of the interpreter. I don't really buy the idea that it'll almost always be used with tolerances of 1e-8 etc. If it goes in the stdlib, it'll be used in ways no-one here anticipates (and which are not, intuitively, "wrong"). -[]z. -- Zero Piraeus: absit invidia http://etiol.net/pubkey.asc
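The asymmetry being pointed at, spelled out for the 5.67/5.68 pair (a sketch using the asymmetric definition in which the tolerance is relative to the second argument):

    def is_close(a, b, tol):
        return abs(a - b) <= tol * abs(b)

    print(is_close(5.67, 5.68, 1/568))  # True:  0.01 <= 5.68/568
    print(is_close(5.68, 5.67, 1/568))  # False: 0.01 >  5.67/568

Which of the two orderings itertools.combinations() produces depends on the dict's iteration order, hence (with the hash-randomized dict ordering of the day) the differing results from run to run.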

On Tue, Jan 27, 2015 at 9:42 AM, Zero Piraeus <schesis@gmail.com> wrote:
Nothing substantial was off-list -- thank god, there's been plenty on list!
huh? itertools combinations returns a deterministic result, yes? But that is indeed a case where the question really is "are these two values close to each other", with no implied order. Is that a likely real-world use-case? I have no idea.
I still don't follow you here -- the asymmetric test is order dependent, it's not random.
Indeed -- absolutely the case -- but all we can do is document the behavior. On the face of it, a symmetric approach seems less surprising, but really the surprise is simply different: In the asymmetric case, the results may depend on the order of the arguments. In the symmetric case, the results may depend on whether the actual value is less than or greater than the expected value. I hark back to the point Steven made: what really matters is the difference between the values. If someone is asking: Is this value within 10% of the value 10 -- they are expecting that the difference used will be 10% of ten, or 1.0 -- so any value between 9 and 11 is "close". But with a symmetric test -- the actual difference accepted will be a function of whether the tested value is less than or greater than the expected value -- that could be equally or more surprising than the asymmetry. I'm still looking for a case where a user would likely pass the same values into the function in a different order -- wouldn't s/he pick an order (maybe arbitrarily) and use that? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Tue, Jan 27, 2015 at 1:52 PM, Paul Moore <p.f.moore@gmail.com> wrote:
The order of dictionary iteration is arbitrary. Thanks to hash randomization, the order will differ each time the program is run.
duh, of course! But the question remains -- legitimate use case, or pathological example? This kind of points to an is_close_to() and an are_close() function (better than a flag, yes?) but that really does seem like overkill. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

: On Tue, Jan 27, 2015 at 01:31:25PM -0800, Chris Barker wrote:
Sometimes you *can't* pick an order (as in my dict example), and sometimes your data picks its own order (for example, using all_close() or similar to check the consistency of experimental results). On Tue, Jan 27, 2015 at 02:00:05PM -0800, Chris Barker wrote:
Note that my example was an any_close() function, not an all_close() one. I doubt anyone's seriously going to suggest adding *three* new functions, so anyone who wants something like that will by necessity end up rolling their own. I admit I threw in the unpredictable iteration order of dictionaries to make the result as surprising as I could. There was a point to that ... Some behaviour in Python is surprising, often (as with dict iteration) necessarily so. However, the more surprising behaviour there is in the language, the more likely it is that two instances of that behaviour will interact in ways that are *especially* confusing and hard to debug. However, if you're not convinced by my dictionary shenanigans, here's something more straightforward:

    def any_close(iterable, tol):
        pairs = itertools.combinations(iterable, 2)
        return any(is_close_to(a, b, tol) for a, b in pairs)

Notice that

    results = [2.34, 5.68, 9.99, 5.67, 0.01]
    ac1 = any_close(results, 1/568)
    results.sort()
    ac2 = any_close(results, 1/568)

will result in ac1 and ac2 being different. A workflow along the lines of:

User 1:
- generate data
- check sanity [all_close(), any_close(), etc]
- normalize [sort]
- send to User #2 [or persistent storage]

User 2:
- receive from User #1 [or persistent storage]
- check sanity [as above]
- process data

... is pretty reasonable IMO, and would be adversely affected by the behaviour described above. You might fix that by having any_close() work on a sorted copy of the iterable, or by normalizing before sanity-checking, but that's not necessarily obvious until you get bitten, and it's not that hard to imagine scenarios where neither of those fixes are practical. -[]z. -- Zero Piraeus: post scriptum http://etiol.net/pubkey.asc

On Tue, Jan 27, 2015 at 4:38 PM, Zero Piraeus <schesis@gmail.com> wrote: <snip>
OK -- that is pretty compelling --- I have been looking for use-cases for where having a symmetrical test would be a clear advantage, and this is one. But this is where I'm unclear -- is this "any_close" function something you think is a real use case -- i.e. you have needed such a thing, or think there is a strong chance that you will. Or just something that _could_ be done. Granted, I think your point is that if it _could_ be done, there is a good chance that _someone_ will do it -- and with that in mind we want as few surprising behaviors in the standard lib as possible. But it would be even more compelling if it were a real use case ;-) Anyway, it seems that Steven is going to write up something to clarify the issues, and I'll try to write the various options up in the PEP, and then we can suss out which options are acceptable to most folks. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

: On Tue, Jan 27, 2015 at 10:31:01PM -0800, Chris Barker wrote:
I have needed an any_close() function in the past, but used absolute tolerances, so it's not particularly relevant to the discussion (except maybe that if is_close() existed, I might have used it just because it was there without really thinking things through).
Yep. -[]z. -- Zero Piraeus: pons asinorum http://etiol.net/pubkey.asc

On 27 January 2015 at 15:56, Ron Adam <ron3200@gmail.com> wrote:
Would it help if it was consistent order-wise with other python functions such as isinstance?
/me tries to remember the order of arguments for isinstance and fails :-) Paul

On 27/01/2015 16:07, Paul Moore wrote:
Can you remember how to do this?
help(isinstance) Help on built-in function isinstance in module builtins:
isinstance(obj, class_or_tuple, /) Return whether an object is an instance of a class or of a subclass thereof. A tuple, as in ``isinstance(x, (A, B, ...))``, may be given as the target to check against. This is equivalent to ``isinstance(x, A) or isinstance(x, B) or ...`` etc. Or do you need help :) -- My fellow Pythonistas, ask not what our language can do for you, ask what you can do for our language. Mark Lawrence

On 27 January 2015 at 16:19, Mark Lawrence <breamoreboy@yahoo.co.uk> wrote:
Of course, but (a) it was a joke, and (b) I'm not always at a Python prompt when reading (or writing!) Python code. Apologies for being facetious, though. Paul

On 01/27/2015 10:07 AM, Paul Moore wrote:
/me tries to remember the order of arguments for isinstance and fails:-)
I saw where you said this was a bit of a joke. But yes, sometimes it's not all that obvious what order things should be in. Which was your point, and people new to python do have to look up what order the isinstance arguments are in. I'm just wondering if there is a rule of thumb that we can apply that works for other python functions. Cheers, Ron

On 27 January 2015 at 13:08, Steven D'Aprano <steve@pearwood.info> wrote:
Asymmetry is bad, because it is rather surprising and counter-intuitive
Time permitting, over the next day or so I'll draw up some diagrams to
show how each of these tactics change what counts as close or not close.
If you consider the comparison to be:

    abs(x-y) <= rel_tol * ref

where "ref" is your "reference" value, then all of these are questions about what "ref" is. Possibilities include:

* ref = abs(x) (asymmetric version, useful for comparing against a known figure)
* ref = max(abs(x),abs(y)) (symmetric version)
* ref = abs(x)+abs(y) or (abs(x)+abs(y))/2 (alternate symmetric version)
* ref = zero_tol / rel_tol (for comparisons against zero)
* ref = abs_tol / rel_tol (for completeness)

If you're saying:
your "reference" value is probably really "1.0" or "0.1" since those are the values you're working with, but neither of those values are derivable from the arguments provided to is_close(). Assuming x,y are non-negative and is_close(x,y,rel_tol=r): ref = x: -rx <= y-x <= rx ref = max(x,y): -rx <= y-x <= ry ref = (x+y)/2: -r*(x+y)/2 <= y-x <= r*(x+y)/2 If you set r and x as a constant, then the amounts y can be (below, above) x for the cases above are: rx, rx rx, rx/(1-r) rx/(1+r/2), rx/(1-r/2) Since r>0, 1-r != 1, and 1+r/2 != 1-r/2, so these each give slightly different ranges for a valid y. They're pretty trivial differences though; eg r=1e-8 and x=10 gives: rx = 1e-7 rx/(1-r) = 1.00000001e-07 rx/(1-r/2) = 1.000000005e-07 rx/(1+r/2) = 0.999999995e-07 If you're looking at 10% margins for a nominally 100 Ohm resistor (r=0.1, x=100), that'd translate to deltas of: rx = 10.0 rx/(1-r) = 11.11 rx/(1-r/2) = 10.526 rx/(1+r/2) = 9.524 Having an implementation like: def is_close(a, b=None, tol=1e-8, ref=None): assert (a != 0 and b != 0) or ref is not None if b is None: assert ref is not None b = ref if ref is None: ref = abs(a)+abs(b) return abs(a-b) <= tol*ref might give you the best of all worlds -- it would let you say things like:
is_close(1.0, sum([0.1]*10)) True
is_close(11, ref=10, tol=0.1) True
and get reasonable looking results, I think? (If you want to use an absolute tolerance, you just specify ref=1, tol=abs_tol). An alternative thought: rather than a single "is_close" function, maybe it would make sense for is_close to always be relative, and just provide a separate function for absolute comparisons, ie:

    def is_close(a, b, tol=1e-8):
        assert a != 0 and b != 0   # or assert (a==0) == (b==0)
        return abs(a-b) <= tol*(a+b)

    def is_close_abs(a, b, tol=1e-8):
        return abs(a-b) <= tol

    def is_near_zero(a, tol=1e-8):
        return abs(a) <= tol

Then you'd use is_close() when you wanted something symmetric and easy, and were more interested in rough accuracy than absolute precision, and if you wanted to do a 10% resistor check you'd either say:

    is_close_abs(r, 100, tol=10)

or

    is_near_zero(r-100, tol=10)

If you had a sequence of numbers and wanted to do both relative comparisons (first n significant digits match) and absolute comparisons you'd just have to say:

    for a in nums:
        assert is_close(a, b) or is_close_abs(a, b)

which doesn't seem that onerous. Cheers, aj -- Anthony Towns <aj@erisian.com.au>

This has been very thoroughly hashed out, so I'll comment on the bits that are new(ish): If you're saying:
this is simple -- don't do that! isclose(1.0, sum([0.1]*10)) is the right thing to do here, and it will work with any of the methods that were ever on the table. If you really want to check for something close to zero, you need an absolute tolerance, not really a reference value (they are kind of the same, but absolute tolerance is easier to reason about): z = 1.0 - sum([0.1]*10)
is_close(0.0, z, abs_tol=1e-12) True
def is_close(a, b=None, tol=1e-8, ref=None):
I think a b=None default would be really confusing to people! Setting an optional reference value (I'd probably call it 'scale_val' or something like that) might make sense:

    def isclose(a, b, rel_tol=1e-9, abs_tol=0.0, scale_val=None):

Then we use the scale_val if it's defined, and max(a,b) if it's not -- something like:

    if scale_val is not None:
        return abs(a-b) <= abs(rel_tol*scale_val) or abs(a-b) <= abs_tol
    else:
        return abs(a-b) <= abs(rel_tol*a) or abs(a-b) <= abs(rel_tol*b) or abs(a-b) <= abs_tol

This would let users do it pretty much any way they want, while still allowing the most common use case to use all defaults -- if you don't know what the heck scale_val means, then simply ignore it. However, I think that it would require enough thought to use scale_val that users might as well simply write that one line of code themselves. and get reasonable looking results, I think? (If you want to use an
absolute tolerance, you just specify ref=1, tol=abs_tol).
way too much thought required there -- see my version. An alternative thought: rather than a single "is_close" function, maybe it
would make sense for is_close to always be relative,
yes, that came up... If you had a sequence of numbers and wanted to do both relative comparisons
maybe not, but still a bit more onerous than: for a in nums: assert is_close(a, b, abs_tol=1e-100) (and note that your is_close_abs() would require a tolerance value). But more to the point, if you want to wrap that up in a function (which I'm hoping someone will do for unittest), then that function would need the relative and absolute tolerance levels anyway, and so would have a different API -- less than ideal. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

Is the PEP ready for pronouncement now? -- --Guido van Rossum (python.org/~guido)

On Fri, Feb 13, 2015 at 9:35 AM, Guido van Rossum <guido@python.org> wrote:
Is the PEP ready for pronouncement now?
I think so -- I've addressed (one way or another) everything brought up here. The proposed implementation needs a bit more work, but that's, well, implementation. ;-) -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

I see no roadblocks but have run out of time to review the PEP one more time. I'm going on vacation for a week or so, maybe I'll find time, if not I'll start reviewing this around Feb 23. On Fri, Feb 13, 2015 at 10:10 AM, Chris Barker <chris.barker@noaa.gov> wrote:
-- --Guido van Rossum (python.org/~guido)

On Fri, Feb 13, 2015 at 1:55 PM, Guido van Rossum <guido@python.org> wrote:
Sounds good. In the meantime, I welcome typo corrections, clarifying text, etc. -Chris
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Sun, Jan 25, 2015 at 10:39 PM, Steven D'Aprano <steve@pearwood.info> wrote:
indeed -- I did add the "to" to imply the asymmetric test -- so I say if we go with asymmetric test then IsClose and the IsCloseTo if we go with the asymmetric test.
so the attributes would not be there by default but users could add them if they want:

    class my_tst(unittest.TestCase):
        tol = 1e-8

That would work, but seems like a pretty unclear API to me -- is there a precedent in unittest for this already? But I'll leave further discussion on that to others -- I don't like the UnitTest API anyway ;-) I recommend using short names for the two error tolerances, tol and rel,
Isn't that why you set an attribute on your class? But if short, at least rel_tol and abs_tol -- a plain "tol" could be too confusing (even though I did that in my first draft...) My current draft has rel_tolerance and abs_tolerance -- perhaps a bit too long to type often, but a few people asked for longer, more descriptive names.
Is there? In this discussion, no one had any issue with the proposed approach: result = difference <= rel_tolerance*scaling_value or difference <= abs_tolerance The only issue brought up is that we might want to do it the numpy way for the sake of compatibility with numpy. That's why I didn't add it to my list of issues to resolve.
I was motivated by assertEqual and the various sequence/list methods.
yup -- good to keep that trend going.
Now that I think about it -- we could easily do both. Wrap math.is_close_to() in TestCase:

    @staticmethod
    def is_close_to(*args, **kwargs):
        return math.is_close_to(*args, **kwargs)

best of both worlds. I've got my stand-alone function outside unittest, and folks can still override TestCase.is_close_to if they want. I do think there are two distinct use-cases that should be included in
yes, but in an iterative solution you generally compute a solution, then use that to compute a new solution, and you want to know if the new one is significantly different from the previous -- so an asymmetric test does make some sense. But again, either would work, and pretty much the same way. Example forthcoming.... Like Nick, I think the first is the more important one. In the second
Exactly. I can see we're going to have to argue about the "Close To" versus
"Close" distinction :-)
I think we both understand and agree on the distinction. My take is:

- Either will work fine in most instances
- The asymmetric one is a bit clearer and maybe better for the testing use-case
- I'd be perfectly happy with either one in the standard library

Maybe not consensus, but the majority on this thread seem to prefer the asymmetric test. We could, of course, add a flag to turn on the symmetric test (probably the Boost "strong" case), but I'd rather not have more flags, and as you indicate above, the people for whom it matters will probably write their own comparison criteria anyway. It looks like we need to add a bunch of text to the PEP about incorporating this into unittest -- I'd love it if someone else wrote that -- I'm not much of a unittest user anyway. Pull requests accepted: https://github.com/PythonCHB/close_pep -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, Jan 26, 2015 at 9:33 AM, Chris Barker <chris.barker@noaa.gov> wrote:
OK -- here is a real-life application -- probably a bit too much (and too specialized) for the PEP, but for the record: Waves in the ocean have a relationship between the wave length, wave period, water depth (and the acceleration of gravity), known as the dispersion relationship. So-called because the wave speed is a function of the frequency and wave length, and so this relation also predicts the wave speed -- and waves of different frequencies move at different speeds -- so "disperse" as they move from the location they were generated -- but I digress.... The relationship is:

    omega**2 = g*k*tanh(k*h)

where omega is the wave frequency, g is the acceleration of gravity, k is the wave number (2*pi/wave_length), and h is the water depth. In the usual case, the frequency (omega) is known, and you want the wave number. As the wave number appears in two places, there is no direct solution, so a numerical method must be used. The simplest iterative solution is something like this: recast the equation as:

    k_2 = omega**2 / (g * tanh(k_1 * h))

- guess a k_1
- compute k_2
- check if k_2 is close to k_1
- if not, set k_1 to k_2 and repeat

Here is the code for that:

    def dispersion(omega, h, g=9.806):
        "compute the dispersion relation"
        k_1 = 10.0  # initial guess
        while True:
            k_2 = omega**2 / (g * math.tanh(k_1 * h))
            if is_close_to(k_2, k_1, tol=1e-5):
                break
            k_1 = k_2
        return k_1

note that I've provided g to only four significant figures, so I set the tolerance to 1e-5 -- no need for more precision than that. Granted, there are better ways to do this that run faster, and I'm sure there is code out there to do it already (well, I know there is, I wrote some of it...). In fact, I had code very much like this that I used in grad school (not Python, though). Since then, I've needed this a lot, and took the time to write up a faster-converging method using Newton's method in C. But frankly this works just fine, and any of the proposals on the table for is_close_to would work fine with it as well. If I were a student and needed this, and is_close_to was in the stdlib -- I'd probably use it. In fact, you can write a generic iteration routine:

    def iterate(func, x_initial, *args):
        """ iterate to find a solution to the function passed in

        func should be a function that takes x as a first argument,
        and computes an approximation to x as a result.

        x_initial is an initial guess for the unknown value
        """
        x_1 = x_initial
        while True:
            x_2 = func(x_1, *args)
            if is_close_to(x_2, x_1):
                break
            x_1 = x_2
        return x_2

where you can pass in the function you want to iterate over. Here it is for the above:

    def disp(k, omega, h, g=9.806):
        """ the linear wave dispersion relationship

        k as a function of k, omega, h, g
        """
        return omega**2 / (g * math.tanh(k * h))

and I can then call it like this:

    k2 = iterate(disp, 10, omega, h)

Code on gitHub if you care. https://github.com/PythonCHB/close_pep/blob/master/iteration_example.py -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 26 January 2015 at 19:42, Chris Barker <chris.barker@noaa.gov> wrote:
OK -- here is a real-life application
[details snipped]
[more details snipped]
The point I was trying to make earlier (which came across as "nobody should be coding their own Newton iterations") is essentially this - surely iterating a relation until the value you need converges is a common operation, and libraries exist that do this for you? And those libraries would already have their own "is the result close enough" calculation (possibly tunable via arguments to the routine). Do people actually write the low-level algorithms by hand, so that a building block like is_close is worthwhile? The lack of people coming up with use cases suggests to me that maybe it isn't. FWIW, a quick google search came up with scipy.optimize.fixed_point, which is, AFAICT, the equivalent of your iterate(). (You probably know better than I do if that's the case). Paul

On Mon, Jan 26, 2015 at 12:52 PM, Paul Moore <p.f.moore@gmail.com> wrote:
[details snipped]
The point I was trying to make earlier (which came across as "nobody should be coding their own Newton iterations")
well, some people should ;-)
sure --
Sure -- scipy.optimize is full of all sorts of optimizers suited to various problems such as these. However, sometimes you have your particular problem and you just want to get an answer without bringing the whole dependency of scipy (or whatever) to bear, and figuring out how to use the darn thing -- and which one to use. I really did write that function in grad school -- despite having MATLAB and its optimization package (not to mention the Fortran minpack) available. That being said... if we want to say the primary use case is testing, that's fine with me -- just please not buried in unittest.TestCase somewhere. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, Jan 26, 2015 at 05:10:44PM -0800, Chris Barker wrote:
Apart from doctest, which I think is completely inappropriate, where else would you put this for testing? For testing, I think it needs to be a TestCase.assert* method. Otherwise you have people writing self.assertTrue(close_enough(a, b)) I know this because that's exactly how my assertApproxEqual test assertion started. I'd write tests like that, they would fail, and I'd have no idea why. Fast forward past a few code iterations, and I had an assertion method which gave me some useful diagnostics when it failed, e.g.:

    AssertionError: 20.666666666666668 != 20.66666667
    values differ by more than tol=0 and rel=1e-12
    -> absolute error = 3.33333360913457e-09
    -> relative error = 1.612903358998517e-10

-- Steve
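A rough standalone sketch of the kind of assertion helper being described -- the point is the diagnostic output on failure rather than the exact comparison rule (the name and message format are illustrative, modelled on the output quoted above; in practice it would be a method on a TestCase subclass):

    def assert_approx_equal(actual, expected, tol=0.0, rel=1e-12):
        abs_err = abs(actual - expected)
        if abs_err <= tol or abs_err <= rel * abs(expected):
            return
        rel_err = abs_err / abs(expected) if expected else float('inf')
        raise AssertionError(
            "%r != %r\n"
            "values differ by more than tol=%r and rel=%r\n"
            "-> absolute error = %r\n"
            "-> relative error = %r" % (actual, expected, tol, rel, abs_err, rel_err))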

On 29 January 2015 at 13:34, Steven D'Aprano <steve@pearwood.info> wrote:
The only downside is that this doesn't consider other test frameworks like nose and py.test. For those frameworks, you need a standalone function. *If* the intention is to expose a function that people can use for testing, I think it needs to be a standalone close_enough function plus a unittest assert method that uses that function. But I'm fairly sure by now that Guido's right and this should just be a recipe. I can write my own close_enough function now, with the information from this thread. The only bit I'd struggle over is what tolerance to use, and it doesn't look like anyone has a particularly good universal answer for that :-) Paul

But I'm fairly sure by now that Guido's right and this should just be a recipe.
I think the long thread here is more a reflection that it's an interesting problem, and frankly, simple enough that almost anyone has something to say about it. But the fact that there is no one way to do it that is perfect for all cases doesn't mean there isn't one way to do it that is useful in the vast majority of cases. I'm quite sure that most of the options on the table fit that bill, in fact. And I'm optimistic that we can converge on an option that most everyone on this thread can live with.
I can write my own close_enough function now, with the information from this thread.
Sure -- but the idea is that you shouldn't have to. Particularly for unittest.TestCase or quick command line checks. -Chris

math would be a good place. An awful lot of testing is done outside of the unittest module, and indeed, outside of any formal testing at all: command line, quick if __name__ ... stanzas, etc.
For testing, I think it needs to be a TestCase.assert* method.
I do think that's a good idea, but it should call the math function. Pretty much like you have in the statistics tests. Though I'm not the one to write that part of the PEP -- I already don't like the unittest API ;-) -Chris

On Mon, Jan 26, 2015 at 10:01 AM, Ron Adam <ron3200@gmail.com> wrote:
Is there any reason why someone would not want to use an asymmetric version in a symmetric way?
You mean they pre-sort the arguments the way they want? Sure, you could do that. Though if it were me, I'd probably just re-write the test myself. But throughout all this -- I haven't thought of a single case where I would prefer a symmetric test and it would matter. It certainly appeals to me as clean, and more similar to equal, etc, but I haven't come up with a practical reason to prefer it. Can anyone? -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Mon, Jan 26, 2015 at 10:31 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
Yes -- see the example in the PEP and on this list. If you have a known value, and you want to know whether the value at hand is within some error of the known value, then you want an asymmetric test -- and if you have errors on the order of 10% -- it does make a difference. Granted, I don't expect this use case to be common, but it was brought up on this list.
Simplicity has its virtues.
But which is more simple? I came to the conclusion that the asymmetric was simpler to explain and reason about in any case. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 01/26/2015 01:37 PM, Antoine Pitrou wrote:
Correct, it uses either the closer or further from zero value. But it seems to me it's not better than just using the asymmetric one as if it was a symmetric function. The difference is you have both options with the asymmetric version, including alternating the order if you want. Or, a symmetric version can use an average, in which case it can't be used for anything other than relative distance comparisons. (Not relative value comparisons.) Ron

On 01/26/2015 11:37 AM, Antoine Pitrou wrote:
Yes, and nothing is simpler than getting wrong answers -- I can generate those all day long without even trying! ;) One example of when it matters: resistors should have a value of 10, but can be off by 2.5%. How would an easy symmetric test handle that? -- ~Ethan~

On Mon, 26 Jan 2015 12:14:04 -0800 Ethan Furman <ethan@stoneleaf.us> wrote:
This is the kind of test that's extremely easy to write by hand. I don't know why you would need the stdlib's help for that. Regards Antoine.
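For the record, the "by hand" version of that resistor check is a one-liner (the measured value here is made up for illustration; 10 and 2.5% are from the example above):

    nominal = 10.0
    measured = 10.2
    assert abs(measured - nominal) <= 0.025 * nominal  # within 2.5% of the nominal value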

On 26 January 2015 at 11:21, Chris Barker <chris.barker@noaa.gov> wrote:
Since Python 3.2, unittest.assertAlmostEqual has also supported a "delta=value" keyword argument to specify an absolute tolerance directly, rather than using a number of decimal places. It's mutually exclusive with the default "places" argument. I may be missing something, but is there a reason that couldn't be adjusted to also accept an "error_ratio" keyword argument that was mutually exclusive with the other two ("places" and "delta")? With the keyword arguments all being about different ways to specify the error tolerance, I think it would be reasonable to leave that implicit rather than mentioning it in each name. The "places=N" argument could also potentially be adjusted to be a shorthand for "delta=10e-N" rather than its current definition. The mutual exclusion between "error_ratio" and "delta" would require some adjustment to handle values that may be near zero, but it's not clear to me that there's a generally applicable answer to how best to handle that, so it seems advisable to avoid trying to guess. Regards, Nick. P.S. I considered suggesting just "ratio", but that's ambiguous in a way that "delta" isn't: "ratio" could be referring to the ratio of the two numbers, rather than the ratio of the error to the magnitude of the expected value. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 27 January 2015 at 00:08, Nick Coghlan <ncoghlan@gmail.com> wrote:
The "places=N" argument could also potentially be adjusted to be a shorthand for "delta=10e-N" rather than its current definition.
And for folks worried about the slight backwards compatibility break here: "the unittest.TestCase.assertAlmostEqual fuzzy equality function is now slightly less fuzzy if you're using the 'places' argument" is well within the scope of the kind of thing we're prepared to cover in the "Porting Notes" section of the What's New docs. The impact of such a change could also be mitigated by changing the default value so that the default behaviour of the new definition resulted in approximately the same absolute delta as the old definition. Cheers, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 01/22/2015 04:40 PM, Chris Barker wrote:
After much discussion on this list, I have written up a PEP, and it is ready for review (see below)
Thanks! Very nice.
It is using an asymmetric test
Good - Ron convinced me that was the better way
However, as this approach is not symmetric, a may be within 10% of b, but b is not within x% of a. Consider the case::
Instead of x%, how about 10% ? ;) -- ~Ethan~

Overall I like it, but I'm not sure the help on the tol parameter is clear enough for people who don't already know what they want--in other words, the very people this function should be helping. In my experience, novices understand relative tolerance immediately if you put it in terms of "within X% of expected", but don't always understand it if you put it in terms of "within X * expected" or, worse, "relative to the magnitude of the expected value". Just using % in there somewhere makes people get the concept. Unfortunately, since the API doesn't actually use a percentage--and shouldn't--I'm not sure how to get this across in a one-liner in the help. You can always add something like "(e.g., a relative tolerance of .005 means that the actual value must be within 0.5% of the expected value)", but that's way too verbose. (Also, I should note that the people I've explained this to have mostly been people with a US 1960-1990-style basic math education; I can't be sure that people who learned in another country, or in the post-post-new-math era in the US, etc. will respond the same way, although I do have a bit of anecdotal evidence from helping a few people on forums like StackOverflow that seems to imply they do.) Sent from a random iPhone On Jan 22, 2015, at 16:40, Chris Barker <chris.barker@noaa.gov> wrote:
is the relative tolerance -- it is the amount of error allowed, relative to the magnitude of the expected value.

Andrew, I totally agree that it's not going to be that clear to folks -- but I'm as stumped as you as to how to make it clear without getting really wordy. Also, I think the percent-error use case is infrequent; more likely, a relative tolerance of 1e-8 would be read as meaning the numbers are the same to within about 8 significant decimal figures. After all, not many people think in terms of 0.0000001% Suggestions gladly accepted! -Chris On Thu, Jan 22, 2015 at 7:30 PM, Andrew Barnert <abarnert@yahoo.com> wrote:
-- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov
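(To make the "significant figures" reading above concrete, a tiny hedged example; the comparison shown is just the bare relative test, not any final API:

    a = 1.23456789012345
    b = 1.23456790          # agrees with a to roughly 8 significant digits
    print(abs(a - b) <= 1e-8 * abs(b))   # True: within a relative tolerance of 1e-8
)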

I'd use an example with round numbers. "For example, to set a tolerance of 5%, pass tol=0.05. The default tolerance is 1e-8." On Thursday, January 22, 2015, Chris Barker <chris.barker@noaa.gov> wrote:
-- --Guido van Rossum (on iPad)

On Jan 22, 2015, at 21:54, Guido van Rossum <guido@python.org> wrote:
I'd use an example with round numbers. "For example, to set a tolerance of 5%, pass tol=0.05. The default tolerance is 1e-8."
Hard to beat that for simplicity. +1 on this wording or something similar instead of the current abstract version.

On Thu, Jan 22, 2015 at 04:40:14PM -0800, Chris Barker wrote:
I do not agree that it is ready for review. I think you have rushed to decide that this needs a PEP, rushed the preparation of the PEP, and now you have rushed the request for review. What's the hurry? As it stands with the decisions you have made, I cannot support this PEP even though I support the basic idea. -- Steve

On 01/23/2015 12:06 AM, Steven D'Aprano wrote:
On Thu, Jan 22, 2015 at 04:40:14PM -0800, Chris Barker wrote:
Why? If it has problems, how will he find out about them unless people read it and offer critiques? Or do you not refer to that process as reviewing?
I think you have rushed to decide that this needs a PEP,
He asked if a PEP was needed, and one is. Worst-case scenario we have something to point the next floating-point closeness requester to.
rushed the preparation of the PEP,
With over 100 messages to pull from, how was the preparation rushed? He should have taken a month to write it?
and now you have rushed the request for review.
Um, what? He should have just sat on it for a couple weeks before asking people to look it over? Asking for a review is not the same as asking for a pronouncement; it's not even on python-dev yet.
What's the hurry?
For one, Python 3.5 alpha one is just around the corner, and while there's still time after that the more eyeballs the better; for another, why wait? He has the information he needed, he collected it, made some decisions, and brought it back to the community. Ten days from the first floating point closeness message (14 if you count the float range class thread). A PEP also helps focus the conversation.
As it stands with the decisions you have made, I cannot support this PEP even though I support the basic idea.
Perhaps you feel rushed because you don't like it? -- ~Ethan~

On Fri, Jan 23, 2015 at 12:59:21AM -0800, Ethan Furman wrote:
Ethan, there are factors that you are unaware of because they took place off-list. Since they are private, I will say no more about them except to say that Chris has proceeded as if there is consensus when there actually is not. -- Steven

On Fri, Jan 23, 2015 at 1:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Steven, this appeal to things unmentionable is not an acceptable way to oppose a PEP. In the text you quoted I didn't see Chris claim consensus -- just that he has written up his version. It's ready for review because he wants feedback -- "ready for review" is *not* code for "this is the final word from the community, now the BDFL must speak." Your posts make me worried that we have turned into a political body rather than a group of technical enthusiasts trying to improve the language they all love. I don't think you can reasonably disagree that a PEP is needed -- not with so much discussion and apparently still no agreement. If you oppose the specific proposal, say what you think is wrong with it. If you think it needs more input from other experts, name those experts. If you think it needs more input from a community, name that community. I haven't actually read the PEP, so I don't have an opinion about it (my post last night was just an attempt to reword something quoted in the email thread). I just saw Antoine's response, and at least he talks about the proposal, not the politics around it. But he's awfully vague. We need a concrete counterproposal. Possibly a competing PEP. Anything but references to things that happened off-stage. If you have a personal beef with Chris, this is not the place. -- --Guido van Rossum (python.org/~guido)

On Fri, Jan 23, 2015 at 1:15 AM, Steven D'Aprano <steve@pearwood.info> wrote:
No need for mystery here -- I asked off-list for feedback from Steven and a couple others, then posted the PEP without having given them much time to respond. However, I posted the PEP because I wanted review, and we had had enough circular conversations that I thought it was time for a concrete proposal to bash on. I by no means intended to convey the impression that there was consensus reached among anyone in particular. The goal of posting the PEP was to determine if that was so, and if not, to change it to a point where that could happen. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Jan 23, 2015 at 3:36 AM, Steven D'Aprano <steve@pearwood.info> wrote:
Mmmm... That seemed kind of dogmatic... This thread has been going on for too long. I prefer the PEP because it is a concrete proposal. Even if it is rejected, the reasons for the rejection will be documented, so people can be referred to the document instead of spinning this wheel again. Cheers, -- Juancarlo *Añez*

On 23 January 2015 at 00:40, Chris Barker <chris.barker@noaa.gov> wrote:
I'm not sure I follow the specifics but this is saying that everything will be close to zero. Isn't that the wrong way round? I thought the comments in the discussion on the list were saying that the problem with relative tolerance is that *nothing* is close to zero? Paul

On Thu, 22 Jan 2015 16:40:14 -0800 Chris Barker <chris.barker@noaa.gov> wrote:
I don't think the proposal fits the bill. For testing you want a function that is both 1) quite rigorous (i.e. checks equality within a defined number of ulps) 2) handles all special cases in a useful way (i.e. zeros, including distinguishing between positive and negative zeros, infinities, NaNs etc.). As someone who wrote such a function for Numba, what you're proposing would not be a suitable replacement. Regards Antoine.

On Fri, Jan 23, 2015 at 7:36 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
It depends on what you are testing -- I tried to be explicit that this was not intended for testing the accuracy of numerical algorithms, for instance. Rather, its best use case is testing to see whether you have introduced a big 'ol bug that completely changed your result -- have you got in the ballpark. Something similar is in Boost, is in numpy, and in any number of other places. It is clearly useful. That doesn't mean it has to go in the stdlib, but it is useful in many cases. As for the ulps test -- can you suggest a way to do that, while also providing a simple definition of tolerance that casual users can understand and use (and have a reasonable default)? I know I can't. Note that some of the feedback on the PEP as is is that it's too hard to understand already! (without better docs, anyway)
zero, inf, -inf, NaN are all handled, I think correctly. And if -0.0 is not close to 0.0, I don't know what is ;-) (there is a test to make sure that's true actually) If you want to make the distinction between -0.0 and 0.0, then you don't want a "close" or "approximate" test.
As someone who wrote such a function for Numba, what you're proposing would not be a suitable replacement.
I never expected it would be a replacement for what is needed for a project like numba. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Jan 23, 2015 at 8:51 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Maybe the confusion here is around the use of "test". To some, that means "unit test" or some other way of testing software. But I hope that's not the main use case. Let's look at Newton's algorithm for computing a square root. It's something like

    def sqrt(x):
        new_guess = 1
        repeat:
            guess = new_guess
            new_guess = avg(guess, x/guess)  # Not sure if I've got this right
        until guess is close enough to new guess
        return guess

This seems a place where a decent "is close enough" definition would help. (Even though this particular algorithm usually converges so rapidly that you can get a result that's correct to within an ulp or so -- other approximations might not.)
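(A runnable take on the sketch above, using a stand-in is_close_to() with just the bare asymmetric relative test; the real proposal also takes an absolute tolerance, and none of these names or defaults are final:

    def is_close_to(actual, expected, tol=1e-8):
        # Simplified stand-in: relative tolerance measured against `expected`.
        return abs(actual - expected) <= tol * abs(expected)

    def my_sqrt(x):
        # Assumes x > 0; a purely relative test never converges for x == 0.
        new_guess = 1.0
        while True:
            guess = new_guess
            new_guess = (guess + x / guess) / 2   # average of guess and x/guess
            if is_close_to(new_guess, guess):
                return new_guess

    print(my_sqrt(2.0))   # 1.4142135623730951, give or take the tolerance
)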
Isn't an ulp just a base-2 way of specifying precision scaled so that 1 ulp is the low bit of the mantissa in IEEE fp?
-- --Guido van Rossum (python.org/~guido)

On Fri, Jan 23, 2015 at 5:41 PM, Guido van Rossum <guido@python.org> wrote:
Isn't an ulp just a base-2 way of specifying precision scaled so that 1 ulp is the low bit of the mantissa in IEEE fp?
Basically yes, but there are weird subtleties. E.g. 1 ulp remains the same absolute size between 1.0 and 2.0, so the same ulp threshold can vary by a factor of two in relative precision terms. And where you hit the boundary between exponents funny things happen: 2.0 +/- 1 ulp is [2.0 - 2.2e-16, 2.0 + 4.4e-16]. This can matter if you're looking for high precision -- if the value is supposed to be almost 2.0, then you don't want to get penalized for failing to get 2.0 + 2.2e-16, b/c there is no such number, but it might also be unacceptable to get 2 - 4.4e-16, which would be two values off. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On Fri, 23 Jan 2015 08:51:00 -0800 Chris Barker <chris.barker@noaa.gov> wrote:
My approach was roughly:

    delta = 2 ** (ulps - 53 - 1) * (abs(first) + abs(second))
    assertAlmostEqual(first, second, delta=delta)

I don't know if it's right in the case of denormals etc. (there's also special code surrounding that to care for zeros, infinities, and NaNs) Regards Antoine.
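(A hedged sketch of how that delta calculation might be wrapped as a TestCase helper; the method name is invented here, and the special-case handling mentioned above for zeros, infinities and NaNs is omitted:

    import unittest

    class ApproxCase(unittest.TestCase):
        def assertCloseUlps(self, first, second, ulps=4):
            # Allowed absolute difference scales with the magnitudes of the inputs.
            delta = 2 ** (ulps - 53 - 1) * (abs(first) + abs(second))
            self.assertAlmostEqual(first, second, delta=delta)
)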

On 01/23/2015 07:36 AM, Antoine Pitrou wrote:
I disagree -- this function is not meant for mathematicians, but for the non-maths person who needs something that works. Will there be situations where it doesn't work? Certainly. Will they be surprising? Possibly. On the other hand, I was very surprised the first time a bytes object gave me an integer and not a byte.
As someone who wrote such a function for Numba, what you're proposing would not be a suitable replacement.
This isn't for Numba, SciPy, or NumPy. It's to help those who don't use/need those products, but still have some light floating point work to do. -- ~Ethan~

On Fri, 23 Jan 2015 09:12:26 -0800 Ethan Furman <ethan@stoneleaf.us> wrote:
In which use case would a "non-maths person" (what exactly does that mean?) need "something that works"? I haven't seen any serious analysis of use cases. Guido talks about the Newton algorithm but I can't understand why a "non-maths person" would want to write one implementation of that - apart from recreation or educational purposes, that is. Regards Antoine.

On Fri, Jan 23, 2015 at 2:42 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I'll give you a real life example. I never thought of myself as a "maths person," so I guess that makes me a "non-maths person." I am a software engineer. I leave the math to people with PhDs in mathematics, statistics, and engineering. In my day job at a trading firm, I work on automated trading systems. Most models do all their internal calculations using floating point math. At some point though, the desired order prices calculated by the model's indicators need to be converted to actual prices acceptable to the exchange. Floating point numbers being what they are, a computed value will almost never correspond to a valid order price. If your computed price is very close to, but not exactly on, a tick boundary and you're not careful, you might erroneously price your order too aggressively or too passively. In these situations you need to recognize when the floating point value you have is within some small tolerance of a price on an exact tick boundary. Furthermore, these comparisons need to take into account the different tick sizes of different contracts. The CME's Yen/USD futures contract (6Y) has a tick size (minimum change between two valid prices) of $.000001 while their Euro/USD futures contract (6E) has a tick size of $.0001. In my world, this is done in Python, though the problem arises independent of the language used. It also has nothing to do with the relative sophistication of the math used internal to the model. It is more-or-less just a case of format conversion on output. Skip
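(A rough sketch of the tick-boundary check described here; the tick size comes from the example above, but the function and the tolerance are illustrative only, not anyone's actual trading code:

    def on_tick(price, tick_size, rel_tol=1e-9):
        # Is the computed price close enough to the nearest valid exchange
        # price to be treated as exactly on a tick boundary?
        nearest = round(price / tick_size) * tick_size
        return abs(price - nearest) <= rel_tol * abs(nearest)

    # 6E (Euro/USD) futures tick in increments of 0.0001
    print(on_tick(1.1324999999999999, 0.0001))   # True -- treat as 1.1325
)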

On Fri, 23 Jan 2015 15:15:39 -0600 Skip Montanaro <skip.montanaro@gmail.com> wrote:
If you have such a precise requirement (the given tick size), you have to roll your own function, there's no point in a stdlib function, right? Regards Antoine.

On Fri, Jan 23, 2015 at 3:23 PM, Antoine Pitrou <solipsis@pitrou.net> wrote:
If you have such a precise requirement (the given tick size), you have to roll your own function, there's no point in a stdlib function, right?
No, I think Chris's is_close_to would probably do the trick, as the relative tolerance would be some fractional multiple of the tick size. In any case, whether or not I would choose to use this function is beside the point. (It's actually a real, though solved problem in my environment, so modifying code to use it wouldn't be worth the effort or potential sources of bugs at this point.) I was only pointing out that there are valid reasons where such a function might be useful to "non-math people," outside the realm of software testing. Knowing when you need something like this is often only discovered after mistakes are made though. Is a numerical analysis course still commonly taught in Computer Science departments? Skip

On 24 January 2015 at 03:12, Ethan Furman <ethan@stoneleaf.us> wrote:
Note that the key requirement here should be "provide a binary float comparison function that is significantly less wrong than the current 'a == b'". "a == b" is the competition here, not the more correct versions available in other libraries. As far as semantics go, I would expect the new function to be a near drop-in replacement for https://docs.python.org/3/library/unittest.html#unittest.TestCase.assertAlmo... in a testing context. The reason I view the proposal in the PEP as problematic is because it is approaching the problem *like a scientist*, rather than as someone who last studied math in high school. The unittest module definition relies on a very simple set of assumptions: 1. The user understands how arithmetic subtraction works 2. The user understands how decimal rounding works 3. The user understands how absolute deltas work This is a "good enough" answer that handles a wide variety of real world use cases, and is very easy to understand. Most importantly, it provides a hint that when working with floating point numbers, "==" is likely to cause you grief. This simple definition *isn't* really good enough for statistical or scientific use cases, but in those cases you should be using a statistical or scientific computation library with a more sophisticated definition of near equality. Regards, Nick. -- Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 23 January 2015 at 00:40, Chris Barker <chris.barker@noaa.gov> wrote:
This section is very weak. As someone who doesn't do numerically intensive computing I would start with the assumption that people who do would have the appropriate tools in packages like numpy, and they would have the knowledge and understanding to use them properly. So my expectation is that this function is intended specifically for non-specialists like me. Based on that, I can't imagine when I'd use this function. You mention testing, but unittest has a function to do this already. Sure, it's tied tightly to unittest, so it's not useful for something like py.test, but that's because unittest is the stdlib testing framework. If you wanted to make that check more widely available, why not simply make it into a full-fledged function rather than an assertion? And if it's not suitable for that purpose, why does this PEP not propose updating the unittest assertion to use the new function? It can't be right to have 2 *different* "nearly equal" functions in the stdlib. Outside of testing, there seems to be no obvious use for the new function. You mention measured values, but what does that mean? "Measure the length of the line and type in the result, and I'll confirm if it matches the value calculated"? That seems a bit silly. I'd like to see a couple of substantial, properly explained examples that aren't testing and aren't specialist. My worry is that what this function will *actually* be used for is to allow naive users to gloss over their lack of understanding of floating point:

    n = 0.0
    while not is_close_to(n, 1.0):   # Because I don't understand floating point
        do_something_with(n)
        n += 0.1

BTW, when writing that I had to keep scrolling up to see which order actual and expected went in. I'd imagine plenty of naive users will assume "it's symmetrical so it shouldn't matter" and get the order wrong. In summary - it looks too much like an attractive nuisance to me, and I don't see enough value in it to counteract that. Paul

On Fri, Jan 23, 2015 at 8:05 AM, Paul Moore <p.f.moore@gmail.com> wrote:
I'll see what I can do to strengthen it.
Indeed that is the idea (though there are plenty of specialists using numpy as well ;-) ) Based on that, I can't imagine when I'd use this function. You mention
That would be an option, but I don't think the one in unittest is the right test anyway -- its focus on the number of decimal digits after the decimal place is not generally useful. (that would make some sense for the Decimal type...) And if
it's not suitable for that purpose, why does this PEP not propose updating the unittest assertion to use the new function?
well, for backward compatibility reasons, I had just assumed it was off the table -- or a long, painful road anyway. And unittest is very vested in its OO structure -- would we want to add free-form functions to it?
It can't be right to have 2 *different* "nearly equal" functions in the stdlib.
Well, they do have a different functionality -- maybe some people really do want the decimal digits thing. I'm not sure we'd want one function with a whole bunch of different ways to call it -- maybe we would, but having different functions seems fine to me.
This came up in examples in the discussion thread -- I don't think I would use it that way myself, so I'm going to leave it to others to suggest better examples or wording. Otherwise, I'll probably take it out. I'd like to see a couple of substantial, properly explained examples
that aren't testing and aren't specialist.
In practice, I think testing is the biggest use case, but not necessarily formal unit testing. That's certainly how I would use it (and the use case that prompted me to start this whole thread to begin with...). I'll look in my code to see if I use it in other ways, and I'm open to any other examples anyone might have. But maybe it should be with testing code in that case -- but I don't see any free-form testing utility functions in there now. Maybe it should go in unittest.util? I'd rather not, but it's just a different import line.
Is that necessarily worse? It would at least terminate ;-) Floating point is a bit of an attractive nuisance anyway.
Well, I think the biggest real issue about this (other than should it be in the stdlib at all) is the question of a symmetric vs. asymmetric test. I decided to go (for this draft, anyway) with the asymmetric test, as it is better defined and easier to reason about, and more appropriate for some cases. And the biggest argument for a symmetric test is that it is what people would expect. So I tried to choose parameter names that would make it clear (rather than a,b or x,y) -- I think I failed on that, however -- anyone have a better suggestion for names? It turns out "actual" is far too similar in meaning to "expected". In summary - it looks too much like an attractive nuisance to me, If it's not there, folks will cobble something up themselves (and I'm sure do, all the time). If they know what they are doing, and take care, then great, but if not then they may get something with worse behavior than this. Maybe they will at least understand it better, but I suspect the pitfalls will all still be there in a typical case. And in any case, they have to take the time to write it. That's my logic anyway. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Fri, Jan 23, 2015 at 9:21 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Indeed that is the idea (though there are plenty of specialists using numpy as well ;-) )
uhm, non-specialists, that is. In fact, the one in numpy is more susceptible to misuse. On the other hand, it's there, and it's useful, and works most of the time. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On Sat, Jan 24, 2015 at 4:21 AM, Chris Barker <chris.barker@noaa.gov> wrote:
Updating the assertion to use the new function would be a matter of tweaking the implementation of unittest's assertAlmostEqual() to now call this function and assert that it returns True. The OO structure of unittest wouldn't be affected; just the exact definition of one particular assertion. I'd say that's a point worth mentioning in the PEP. Conceptually, this is going to do the same thing; yes, it's a change of definition, but obviously this won't be done in a point release anyway. It would make reasonable sense to sync them up. Alternatively, if you choose not to have that as part of the proposal, it would be worth adding a word or two of docs to unittest stating that assertAlmostEqual is not the same as is_close_to (and/or add "assertCloseTo" which would use it), as the existing implementation is all about absolute difference. ChrisA

On Fri, Jan 23, 2015 at 5:45 PM, Chris Angelico <rosuav@gmail.com> wrote:
Yeah, having just taken a quick look at the source, I'd go so far as to say assertAlmostEqual is almost totally broken. I had to read the docs three times to work out that while it sorta sounds like it provides relative tolerances, it actually doesn't at all -- places=3 means something like abs_tol=10**-3. Not really appropriate for numerical work. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org
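(A small hedged demonstration of that point, run under unittest; the values are chosen only to show the behaviour of the places argument:

    import unittest

    class PlacesIsAbsolute(unittest.TestCase):
        def test_small_values(self):
            # Passes even though the relative error is 400%: the absolute
            # difference rounds to zero at 3 decimal places.
            self.assertAlmostEqual(1e-8, 5e-8, places=3)

        def test_large_values(self):
            # A 0.05% relative error on a large value is rejected, because
            # the absolute difference is huge compared to 10**-3.
            with self.assertRaises(AssertionError):
                self.assertAlmostEqual(1e6, 1.0005e6, places=3)

    if __name__ == "__main__":
        unittest.main()
)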

On Fri, Jan 23, 2015 at 9:45 AM, Chris Angelico <rosuav@gmail.com> wrote:
sure -- that's not quite what I meant. I was really addressing the "where would this sit" question. unittest does not currently have any stand-alone utility functions for testing in it. If we put this there, would anyone think to look for it there?
I'd say that's a point worth mentioning in the PEP.
well, whether to change a TestCase assertion or add a new one is a brand new question -- we could add that to this PEP if people think that's a good idea. For my part, I find unittest painful, and use py.test (and sometimes nose) anyway....
probably a good idea, yes. I really don't think we want to change assertAlmostEqual -- certainly not anytime soon. It seems like gratuitous backward incompatibility. -Chris -- Christopher Barker, Ph.D. Oceanographer Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception Chris.Barker@noaa.gov

On 23 January 2015 at 17:21, Chris Barker <chris.barker@noaa.gov> wrote:
Than understanding what you're doing? Yes. But it's sort of my point that fp is prone to people misunderstanding, and it's a shame to give people more opportunities.
Your parameter names and documentation are fine - it's very obvious how to use the function when you look. It's just that you *need* to look because an asymmetric check isn't immediately intuitive. I say "immediately" because when I think about it yes, the question "is a close enough to b?" is actually asymmetric.
Yeah, you have a point. And TBH, I can ignore this function just as easily as I currently ignore cmath.sin, so it's no big deal. Guido's example of Newton iteration is a good use case (although most of the time I'd expect to use a prebuilt function from a module, rather than build it myself with Newton iteration, but maybe that just reflects the fact that I don't do numerical programming). Paul

On Fri, Jan 23, 2015 at 9:59 AM, Paul Moore <p.f.moore@gmail.com> wrote:
Well duh. Any algorithm that isn't already in the math module would require way too much code. The point of the example is that most people have probably seen that algorithm before, and it's only one simple step, really, so they won't be distracted by trying to understand the algorithm when the point of the example is to show how you would use is_close_to(). (And it's one of the simplest algorithms that gives an *approximation*, not an exact answer, at least not in the mathematical sense, which is also important in this case -- if the algorithm was exact there would be no need to use is_close_to().) -- --Guido van Rossum (python.org/~guido)

On 23 January 2015 at 18:10, Guido van Rossum <guido@python.org> wrote:
Sorry. What I was trying to say is that if I had a need for say a Bessel function, or numerical integration, or a zero of a function, I'd go hunting for a package that implemented it (something like mpmath, maybe) rather than rolling my own numerical algorithm using is_close_to(). But I do agree, that implementing numerical algorithms is a good use of is_close_to. And your example was fine, it'd make a good addition to use cases in the PEP. (But I wonder - wouldn't it work better with a "symmetrical" close-to function? That's probably a question for Chris.) Paul

Well, you usually use Newton's algorithm to find the zero of a function, so in that case, you'd want an absolute comparison. But it's pretty common to do a simple iterative solution where you check convergence by seeing if the new solution is close to the previous solution, in which case, a symmetric test would probably be better, but the asymmetric one would be fine -- you'd be asking the question -- is the new solution close to the previous one? -Chris

Guido van Rossum writes:
the point of the [Newton's method] example is to show how you would use is_close_to().
Except that this clearly is a Cauchy test, the algorithm doesn't know the limit. In principle, the appropriate computation would be symmetric. I don't think this is a problem in practice[1], but Skip's "straddling the tick" example is much stronger for an asymmetric comparison function. On the other hand, Skip's case requires an absolute comparison, not a relative one. The whole discussion has been really fast and loose about use cases. People with strong preferences can't seem to wrap their heads around others' use cases, examples poorly matched to the proposals are common, the expertise of the numerical experts seems irrelevant because we *don't* want accuracy even in corner cases, we just want to make it easier for naive users to avoid writing "x == y". ISTM that this PEP can be reduced to We need a floating comparison function that's good enough for government work, to help naive users avoid writing "x == y" for floating point comparisons. There are use cases where one of the values is a known accurate value, so the comparison function is asymmetric. This generally won't get things "too wrong" for symmetric comparisons, except where a relative comparison involves true values near zero. Unfortunately, not much can be done in that case because it requires enough domain knowledge to realize that true values near zero occur and that this is a problem, so use of this function is covered by "consenting adults".[2] And oh yeah, IANAEINA.[3] But for this PEP, I don't need to be. <wink/> Footnotes: [1] I've reconsidered. A priori, I still like symmetric errors better in general, but the target audience for this function isn't going to be reasoning about equivalence classes of IEEE 754 floats. [2] As is all use of floating point. [3] I am not an expert in numerical analysis. Which IIUC applies to the PEP author as well as to this poster.

On Fri, Jan 23, 2015 at 12:40 AM, Chris Barker <chris.barker@noaa.gov> wrote:
I might phrase this a bit more strongly -- assertAlmostEqual is confusing and broken-by-default for common cases like comparing two small values, or comparing two large values.
So for reference, it looks like the differences from numpy are:

1) kwarg names: "tol" and "abs_tol" versus "atol", "rtol". Numpy's names seem fine to me, but if you want the longer ones then probably "rel_tol", "abs_tol" would be better?

2) use of max() instead of + to combine the relative and absolute tolerance. I understand that you find the + conceptually offensive, but I'm not really sure why -- max() is maybe a bit better, but it seems like much of a muchness to me in practice. (Sure, like you say further down, the total error using + might end up being higher by a factor of two or so -- but either people are specifying the tolerances they want, in which case they can say what they mean either way, or else they're just accepting the defaults, in which case they don't care.) It might be worth switching to + just for compatibility.

3) The default tolerances. Numpy is inconsistent with itself on this point though (allclose vs. assert_allclose), so I wouldn't worry about it too much :-). However, a lot of the benefit of numpy.allclose is that it will do something mostly-reasonable out-of-the-box even if the users haven't thought things through at all. 99% of the benefit of having something like this available is that it makes it easy to write tests, and 99% of the benefit of a test is that it exists and makes sure that your values are not wildly incorrect. So that's nice. BUT if you want that kind of out-of-the-box utility then you need to have some kind of sensible default for comparisons to zero. (I just did a quick look at python code uses of assertAlmostEqual on github, and in my unscientific survey of reading the first page of results, 30.4% of the calls were comparisons against zero. IMO asking all these people to specify tolerances by hand on every call is not very nice.)

One option would be to add a zero_tol argument, which is an absolute tolerance that is only applied if expected == 0. [And a nice possible side-effect of this is that numpy could conceivably then add such an argument as well "for compatibility with the stdlib", and possibly use this as a lever to fix its weird allclose/assert_allclose discrepancy. The main blocker to making them consistent is that there is lots of code in the wild that assumes allclose handles comparisons-to-zeros right, and also lots of code that assumes that assert_allclose is strict with very-small non-zero numbers, and with only rtol and atol you can't get both of these behaviours simultaneously.]
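(For concreteness, a side-by-side sketch of the two combination strategies being discussed; the function and parameter names are illustrative only, not anyone's proposed API:

    def is_close_max(a, b, rel_tol=1e-8, abs_tol=0.0):
        # "or"-style: close if within EITHER the relative OR the absolute tolerance.
        return abs(a - b) <= max(rel_tol * abs(b), abs_tol)

    def is_close_sum(a, b, rel_tol=1e-8, abs_tol=0.0):
        # numpy.allclose-style: the two tolerances are simply added together.
        return abs(a - b) <= rel_tol * abs(b) + abs_tol

The two agree except near the crossover between the relative and absolute regimes, where the summed form is at most about a factor of two more permissive.)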
I'd strongly consider expanding the scope of this PEP a bit so that it's proposing both a relative/absolute-error-based function *and* a ULP-difference function. There was a plausible-looking one using struct posted in the other thread, it would cover a wider variety of cases, and having both functions next to each other in the docs would provide a good opportunity to explain why the differences and which might be preferred in which situation. -n -- Nathaniel J. Smith Postdoctoral researcher - Informatics - University of Edinburgh http://vorpus.org

On Sat, Jan 24, 2015 at 8:30 AM, Nathaniel Smith <njs@pobox.com> wrote:
Longer names preferable. It was quite a long way into the original thread before I understood what "atol" meant - my brain kept wanting it to be related to the atoi family of functions from C (ASCII to Integer (atoi), ASCII to Long (atol), etc, converting strings to integers). ChrisA

Longer names preferable.
I had a suggestion on github for the same thing -- how about: rel_tolerance and abs_tolerance ?
Not all of us are as contaminated by C ;-) in fact, when I see the C functions I first think of tolerances... Long clear names are good. -Chris

On Fri, Jan 23, 2015 at 4:30 PM, Nathaniel Smith <njs@pobox.com> wrote:
Many style guides recommend against using _ to separate abbreviated words in variable names, so either relative_/absolute_tolerance or reltol/abstol. OTOH, I don't see any problem with numpy's atol/rtol.

On 01/23/2015 01:30 PM, Nathaniel Smith wrote:
On Fri, Jan 23, 2015 at 12:40 AM, Chris Barker wrote:
Longer names are good for us non-maths folks. ;) rel_tol and abs_tol look good to me.
That makes no sense to me. I'm not sure taking the max does either, though, as phrases like "you can be off by 5% or 30 units, whichever is [smaller | greater]" come to mind.
One option would be to add a zero_tol argument, which is an absolute tolerance that is only applied if expected == 0.
Seems reasonable.
Also seems reasonable. So, in the interest of keeping things insane ;) how about this signature?

    def close_to(noi, target, min_tol, max_tol, rel_tol, zero_tol):
        """
        returns True if noi is within tolerance of target

        noi: Number Of Interest - result of calculations
        target: the number we are trying to get
        min_tol: used with rel_tol to determine actual tolerance
        max_tol: used with rel_tol to determine actual tolerance
        zero_tol: an absolute tolerance if target == 0 (otherwise rel_tol is used as zero_tol)
        """

-- ~Ethan~

On Friday, January 23, 2015 1:31 PM, Nathaniel Smith <njs@pobox.com> wrote:
If you're thinking about the post I think you are (mine), I wouldn't suggest using that. The main issue is that was a bits-difference function, not an ulps-difference function--in other words, bits_difference(x, y) is the number of times you have to do y = nexttoward(y, x) to get y == x. In the case where the difference is <= 2, or where x and y are finite numbers with the same sign and exponent, they happen to be the same, but otherwise, they don't. For example, consider x as the 5th largest number with one exponent, and y as the 5th smallest number with the next. They're 10 bits away, but 12.5 ulp(x) away and 7.5 ulp(y) away. Most algorithms that you want to test for ulp difference are specified to be within 0.5, 1, or 2 ulp, and for C lib functions it's always 0.5 or 1 (except pow in certain cases), or to no more than double the ulp difference--but definitely not _all_, so it would be misleading to offer a bits-difference function as an ulps-difference function. Secondarily, even as a bit-difference function, what I posted isn't complete (but I think the version in https://github.com/abarnert/floatextras is), and makes various decisions and assumptions that aren't necessarily the only option. Also, there's nothing else in the stdlib that directly accesses the bits of a float in Python, which seems a little weird. Finally, neither Python nor the C89 standard that CPython implies require that float actually be an IEEE 754-1985 double (much less an IEEE 754-2005 binary64, the later standard I actually have a copy of...). In particular, sys.float_info doesn't assume it. I think if we wanted this, we'd want to implement nexttoward in C (by calling the C99/POSIX2001 function if present, and maybe our own bit-twiddling-IEEE-doubles-in-C implementation for Windows, but it's not there otherwise), then define ulp (in C or Python) in terms of nexttoward, then define ulp_difference(x, y) (ditto) in terms of ulp(y). This does require a bit of care to make sure that, e.g., ulp_difference(float_info.max, inf) comes out as 1 or as an error, whichever one you want, and so on. (That means it also requires deciding what to do for each edge case, since they're not standardized by IEEE 754-1985, IEEE 754-2008, C99, or POSIX2001.) This would work correctly and consistently on almost every *nix platform (even some that don't use IEEE double) and on Windows, and wouldn't exist on platforms where it won't work correctly. Of course other implementations would have to come up with some other compatible implementation, but at least Java has an ulp function, and if .NET doesn't, it can probably make assumptions about the underlying platform. If we also want a bits_difference function in the stdlib (and I'm not sure we do), I'd suggest also writing that in C, by pointer-casting from double to int64_t and using the information in C99 math.h/limits.h (and again maybe special-casing Windows), rather than twiddling IEEE bits in Python.
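(With all of those caveats in mind, a minimal sketch of the "bits-difference" idea -- not a true ulps-difference, per the discussion above -- assuming IEEE 754 binary64 floats and finite, non-NaN inputs; the names are made up for illustration:

    import struct

    def _ordinal(x):
        # Map the double's bit pattern to an integer ordering in which
        # adjacent representable floats differ by exactly 1.
        u = struct.unpack("<Q", struct.pack("<d", x))[0]
        return u if u < (1 << 63) else (1 << 63) - u

    def bits_difference(a, b):
        # How many representable doubles you must step through to get from b to a.
        return abs(_ordinal(a) - _ordinal(b))

    print(bits_difference(1.0, 1.0 + 2 ** -52))   # 1 (adjacent doubles)
)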

The next response makes it clear why I think that's out of scope for this proposal -- it is considerably harder for casual users to wrap their brains around, so I think if such a thing exists, it should probably be a different function. Not that two functions can't be in the same PEP. But in any case, I'm not qualified to write it (certainly not the code, but not really the PEP either). If someone else wants to champion that part, I'm happy to work together however makes sense. -Chris

2) use of max() instead of + to combine the relative and absolute tolerance.
In fact, the code uses "or", but it amounts to the same thing -- if the difference is within either the relative or absolute tolerance, it's "close".
Actually I agree with you here -- I think I've said elsewhere that I expect in practice people will set their tolerance to an order of magnitude, so even a factor of two doesn't much matter. But I see no advantage to doing it that way (except perhaps as a vectorized computation, which this is not)
It might be worth switching to + just for compatibility.
Well, the other difference is that numpy's version sets a default non-zero absolute tolerance. I think this is fatally wrong. Way too easy to get something really wrong for small values. Once we've done something incompatible, why not make it cleaner? And I see little reason for compatibility for its own sake.
I spent some time thinking about this, and my first version did have a default abs_tol to cover the near-zero case. But it would be absolutely the wrong thing for comparing small values. If you can think of defaults and an algorithm that would work well for large and small values and also comparison to zero, I'd be all for it.
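(A tiny illustration of the pitfall with a non-zero default abs_tol; the tolerances below are the sort of defaults numpy uses, but the exact values don't matter for the point:

    rel_tol, abs_tol = 1e-5, 1e-8
    a, b = 1e-12, 1e-9            # differ by a factor of 1000
    print(abs(a - b) <= abs_tol + rel_tol * abs(b))   # True -- "close" only because of abs_tol
)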
Hmm -- my thinking is that at least those tests would immediately not work, but agreed, nicer for defaults to work for common cases.
One option would be to add a zero_tol argument, which is an absolute tolerance that is only applied if expected == 0.
Here is where I'm not sure: is there only an issue with comparing to exactly zero? Or can very small numbers underflow and cause the same problem?
I'm not sure it's much of an incentive for the stdlib, but sure, that would be nice.
I responded to this elsewhere. Thanks for your input. -Chris
participants (21)
- Alexander Belopolsky
- Andrew Barnert
- Anthony Towns
- Antoine Pitrou
- Chris Angelico
- Chris Barker
- Chris Barker - NOAA Federal
- Emile van Sebille
- Eric V. Smith
- Ethan Furman
- Guido van Rossum
- Juancarlo Añez
- Mark Lawrence
- Nathaniel Smith
- Nick Coghlan
- Paul Moore
- Ron Adam
- Skip Montanaro
- Stephen J. Turnbull
- Steven D'Aprano
- Zero Piraeus