Is there any reason why the defaults for `allclose` and `assert_allclose` differ? This makes debugging a broken test much more difficult. More importantly, using an absolute tolerance of 0 causes failures for some common cases. For example, if two values are very close to zero, a test will fail:

    np.testing.assert_allclose(0, 1e-14)

Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior.

https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6...

It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal.

Thanks,
-Tony
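A minimal sketch of the discrepancy, using the defaults quoted later in the thread (allclose: rtol=1e-5, atol=1e-8; assert_allclose: rtol=1e-7, atol=0):

    import numpy as np

    # allclose has a nonzero default atol, so a value near zero passes:
    np.allclose(0, 1e-14)                 # True (1e-14 <= atol=1e-8)

    # assert_allclose defaults to atol=0, so the same comparison fails:
    np.testing.assert_allclose(0, 1e-14)  # raises AssertionError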
On 16 Jul 2014 10:26, "Tony Yu" <tsyu80@gmail.com> wrote:
What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test at, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-)

-n
On Wed, Jul 16, 2014 at 9:52 AM, Nathaniel Smith <njs@pobox.com> wrote:
What you say makes sense to me, and loosening the default tolerances won't break any existing tests. (And I'm not too worried about people who were counting on getting 1e-7 instead of 1e-5 or whatever... if it matters that much to you exactly what tolerance you test, you should be setting the tolerance explicitly!) I vote that unless someone comes up with some terrible objection in the next few days then you should submit a PR :-)
If you mean by this to add atol=1e-8 as the default, then I'm against it. At least it will change the meaning of many of our tests in statsmodels.

I'm using rtol to check that values around 1e-15 or 1e-30 are correct, which would be completely swamped if you change the default from atol=0. Adding atol=0 to every assert_allclose that currently uses only rtol is a lot of work. I think I almost never use the default rtol, but I often leave atol at the default = 0.

If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20 or 1e-8.

Josef
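A sketch of how a default atol=1e-8 would swamp this kind of relative check (using the comparison rule |actual - desired| <= atol + rtol * |desired| quoted from the docstring later in the thread; the values are illustrative):

    import numpy as np

    desired = 1e-30
    actual = 5e-30   # 400% relative error

    # with atol=0 the relative check bites:
    np.allclose(actual, desired, rtol=1e-10, atol=0)     # False
    # with atol=1e-8 both values are far below atol, so it passes:
    np.allclose(actual, desired, rtol=1e-10, atol=1e-8)  # True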
On Thu, Jul 17, 2014 at 4:07 PM, <josef.pktd@gmail.com> wrote:
Just to explain: p-values and the sf (survival function) of the distributions are usually accurate at 1e-30 or 1e-50 or something like that. And when we test the tails of the distributions, we rely on the relative error being small and the absolute error being "tiny".

We would need to do a grep to see how many cases there actually are in scipy and statsmodels before we change it, because for some use cases we only get atol 1e-5 or 1e-7 (e.g. nonlinear optimization). Linear algebra is usually atol or rtol 1e-11 to 1e-14 in my cases, AFAIR.

Josef
On Thu, Jul 17, 2014 at 4:21 PM, <josef.pktd@gmail.com> wrote:
And one more comment: I debug "broken tests" pretty often. My favorites in pdb are

    np.max(np.abs(x - y))
    np.max(np.abs(x / y - 1))

to see how much I would have to adjust atol and rtol in assert_allclose in the tests to make them pass, and to decide whether this is an acceptable numerical difference or a bug. allclose doesn't tell me anything, and I almost never use it.

Josef
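Wrapped up as a sketch (the helper name is hypothetical, not an existing API; it just packages the two pdb one-liners above):

    import numpy as np

    def report_diffs(x, y):
        """Print the max absolute and max relative difference between x and y."""
        x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
        print("max abs diff:", np.max(np.abs(x - y)))
        print("max rel diff:", np.max(np.abs(x / y - 1)))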
On Thu, Jul 17, 2014 at 9:07 PM, <josef.pktd@gmail.com> wrote:
If you mean by this to add atol=1e-8 as default, then I'm against it.
At least it will change the meaning of many of our tests in statsmodels.
I'm using rtol to check for correct 1e-15 or 1e-30, which would be completely swamped if you change the default atol=0. Adding atol=0 to all assert_allclose that currently use only rtol is a lot of work. I think I almost never use a default rtol, but I often leave atol at the default = 0.
If we have zeros, then I don't think it's too much work to decide whether this should be atol=1e-20, or 1e-8.
This is a compelling use-case, but there are also lots of compelling use cases that want some non-zero atol (i.e., comparing stuff to 0). Saying that allclose is for one of those use cases and assert_allclose is for the other is... not a very felicitous API design, I think. So we really should do *something*.

Are there really any cases where you want non-zero atol= that don't involve comparing something against a 'desired' value of zero? It's a little wacky, but I'm wondering if we ought to change the rule (for all versions of allclose) to

    if desired == 0:
        tol = atol
    else:
        tol = rtol * desired

In particular, this means that np.allclose(x, 1e-30) would reject x values of 0 or 2e-30, but np.allclose(x, 0) would accept x == 1e-30 or 2e-30.

-n

--
Nathaniel J. Smith
Postdoctoral researcher - Informatics - University of Edinburgh
http://vorpus.org
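A runnable sketch of that proposed rule (the function name is hypothetical; abs() is added on the assumption that desired may be negative):

    import numpy as np

    def proposed_close(actual, desired, rtol=1e-7, atol=1e-8):
        """Sketch: atol only applies where desired is exactly zero."""
        actual = np.asarray(actual, dtype=float)
        desired = np.asarray(desired, dtype=float)
        tol = np.where(desired == 0, atol, rtol * np.abs(desired))
        return bool(np.all(np.abs(actual - desired) <= tol))

    proposed_close(0, 1e-30)  # False: desired is nonzero, so rtol governs
    proposed_close(1e-30, 0)  # True: desired is zero, so atol governs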
On Thu, Jul 17, 2014 at 4:07 PM, <josef.pktd@gmail.com> wrote:
copied from http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html since I didn't get any messages here:

Are there really any cases where you want non-zero atol= that don't involve comparing something against a 'desired' value of zero? It's a little wacky, but I'm wondering if we ought to change the rule (for all versions of allclose) to: if desired == 0: tol = atol, else: tol = rtol * desired. In particular, this means that np.allclose(x, 1e-30) would reject x values of 0 or 2e-30, but np.allclose(x, 0) would accept x == 1e-30 or 2e-30. -n

That's much too confusing. I don't know what the use cases for np.allclose are, since I don't have any. assert_allclose is one of our (statsmodels) most frequently used numpy functions.

This is not informative:

    np.allclose(x, 1e-30)

since there are keywords; it's either

    np.assert_allclose(x, atol=1e-30)

if I want to be "close" to zero, or

    np.assert_allclose(x, rtol=1e-11, atol=1e-25)

if we have a mix of large numbers and "zeros" in an array.

Making the behavior of assert_allclose depend on whether desired is exactly zero or 1e-20 looks too difficult to remember, and which desired I use would depend on what I get out of R or Stata. atol=1e-8 is not close to zero in most cases in my experience.

The numpy.testing assert functions are some of the most useful functions in numpy, and heavily used "code".

Josef
On Fri, Jul 18, 2014 at 12:38 PM, <josef.pktd@gmail.com> wrote:
That's much too confusing. I don't know what the usecases for np.allclose are since I don't have any.
I wrote allclose because it's shorter, but my point is that assert_allclose and allclose should use the same criterion, and I was making a suggestion for what that shared criterion might be.
assert_allclose is one of our (statsmodels) most frequently used numpy function
this is not informative:
`np.allclose(x, 1e-30)`
since there are keywords either np.assert_allclose(x, atol=1e-30)
I think we might be talking past each other here -- 1e-30 here is my "gold" p-value that I'm hoping x will match, not a tolerance argument.
if I want to be "close" to zero or
np.assert_allclose(x, rtol=1e-11, atol=1e-25)
if we have a mix of large numbers and "zeros" in an array.
Making the behavior of assert_allclose depending on whether desired is exactly zero or 1e-20 looks too difficult to remember, and which desired I use would depend on what I get out of R or Stata.
I thought your whole point here was that 1e-20 and zero are qualitatively different values that you would not want to accidentally confuse? Surely R and Stata aren't returning exact zeros for small non-zero values like probability tails?
atol=1e-8 is not close to zero in most cases in my experience.
If I understand correctly (Tony?), the problem here is that another common use case for assert_allclose is in cases like

    assert_allclose(np.sin(some * complex ** calculation / (that - should - be * zero)), 0)

For cases like this, you need *some* non-zero atol or the thing just doesn't work, and one could quibble over the exact value as long as it's larger than "normal" floating point error. These calculations usually involve "normal" sized numbers, so atol should be comparable to eps * these values. eps is 2e-16, so atol=1e-8 works for values up to around 1e8, which is a plausible upper bound for where people might expect assert_allclose to just work.

I'm trying to figure out some way to support your use cases while also supporting other use cases.

-n
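A concrete instance of that pattern (a sketch; sin(pi) should be exactly zero mathematically, but isn't in floating point):

    import numpy as np
    from numpy.testing import assert_allclose

    x = np.sin(np.pi)                 # ~1.2e-16, not exactly 0.0
    assert_allclose(x, 0, atol=1e-8)  # passes: atol absorbs the rounding error
    assert_allclose(x, 0)             # raises: with atol=0, rtol * |0| is 0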
On Fri, Jul 18, 2014 at 9:53 AM, Nathaniel Smith <njs@pobox.com> wrote:
I don't know what the usecases for np.allclose are since I don't have any.
I use it all the time -- sometimes you want to check something, but not raise an assertion -- and I use it like:

    assert np.allclose()

with pytest, because it does some nice failure reporting that way (though maybe that's just because I landed on that). Though I have to say I'm very surprised that assert_allclose() doesn't simply call allclose() to do its work, and having different defaults is really really bad. But that cat's out of the bag.

If we don't normalize these, we should put nice strong notes in the docs for both that they are NOT the same.

-CHB

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA 98115        (206) 526-6317   main reception

Chris.Barker@noaa.gov
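A sketch of that pytest pattern (the test name and values are illustrative; pytest's assertion rewriting reports the failing expression on its own):

    import numpy as np

    def test_values_close():
        result = np.array([1.0, 2.0, 3.0])
        expected = np.array([1.0, 2.0, 3.0 + 1e-9])
        # a plain assert on allclose; no AssertionError machinery from numpy.testing
        assert np.allclose(result, expected, rtol=1e-7, atol=0)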
On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith <njs@pobox.com> wrote:
I think we might be talking past each other here -- 1e-30 here is my "gold" p-value that I'm hoping x will match, not a tolerance argument.
my mistake
For cases like this, you need *some* non-zero atol or the thing just doesn't work ... I'm trying to figure out some way to support your use cases while also supporting other use cases.
my problem is that there is no "normal" floating point error. Whether I have units in 1000 or units in 0.0001 depends on the example and dataset that we use for testing.

this tests two different functions/methods that calculate the same thing:

    (Pdb) pval
    array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
    (Pdb) res2.pvalues
    array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
    (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25)

I don't care about errors that are smaller than 1e-25.

for example, testing p-values against Stata:

    (Pdb) tt.pvalue
    array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
    (Pdb) res2.pvalues
    array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
    (Pdb) tt.pvalue - res2.pvalues
    array([  2.16612016e-40,   2.51187959e-15,   4.30027936e-21])
    (Pdb) tt.pvalue / res2.pvalues - 1
    array([  3.79811738e-11,   4.01900735e-14,   7.33806349e-11])
    (Pdb) rtol
    1e-10
    (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol)

I could find a lot more and maybe nicer examples, since I spend quite a bit of time fine-tuning unit tests.

Of course you can change it. But the testing functions are code, and very popular code. And if you break backwards compatibility, then I wouldn't mind reviewing a pull request for statsmodels that adds 300 to 400 `atol=0` to the unit tests. :)

Josef
On Fri, Jul 18, 2014 at 2:03 PM, <josef.pktd@gmail.com> wrote:
And if you break backwards compatibility, then I wouldn't mind reviewing a pull request for statsmodels that adds 300 to 400 `atol=0` to the unit tests. :)
scipy (not current master) doesn't look "so" bad: I find 400 instances of "assert_allclose(" and maybe a third to half use atol. As expected, optimize uses only atol, because of the convergence criteria. scipy.stats uses mostly rtol or the default.

Josef
On Fri, Jul 18, 2014 at 7:03 PM, <josef.pktd@gmail.com> wrote:
my problem is that there is no "normal" floating point error. Whether I have units in 1000 or units in 0.0001 depends on the example and dataset that we use for testing.
...these are all cases where there are not exact zeros, so my proposal would not affect them?

I can see the argument that we shouldn't provide any default rtol/atol at all, because there is no good default, but... I don't think putting that big of a barrier in front of newbies writing their first tests is a good idea.

-n
On Fri, Jul 18, 2014 at 2:29 PM, Nathaniel Smith <njs@pobox.com> wrote:
...these are all cases where there are not exact zeros, so my proposal would not affect them?
I can see the argument that we shouldn't provide any default rtol/atol at all because there is no good default, but... I don't think putting that big of a barrier in front of newbies writing their first tests is a good idea.
I think atol=0 is **very** good for newbies, and for everyone else. If expected is really zero or very small, then it immediately causes a test failure, and it's relatively obvious how to fix it.

I worry a lot more about unit tests that don't "bite", written by newbies (or not-so-newbies) who just use a default. That's one of the problems we had with assert_almost_equal, and why I was very happy to switch to assert_allclose with its emphasis on relative tolerance.

Josef
18.07.2014 21:03, josef.pktd@gmail.com wrote: [clip]
Of course you can change it.
But the testing functions are code and very popular code.
And if you break backwards compatibility, then I wouldn't mind reviewing a pull request for statsmodels that adds 300 to 400 `atol=0` to the unit tests. :)
10c: Scipy has 960 of those, and atol ~ 0 is required in some cases (difficult to say how big a percentage without review). The default of atol=1e-8 is pretty large.

There are ~60 instances of allclose(), most of which are in tests. About half of those don't have atol=, whereas most have rtol=. Using allclose in non-test code without specifying both tolerances explicitly is IMHO a sign of sloppiness, as the default tolerances are both pretty big (and atol != 0 is not scale-free).

***

Consistency would be nice, especially in not having traps like

    assert_allclose(a, b, eps)  ->  assert_(not np.allclose(a, b, eps))

Bumping the tolerances in assert_allclose() up to match allclose() will probably not break code, but it can render some tests ineffective. If the change is made, it needs to be noted in the release notes. I think the number of project authors who relied on the default being atol=0 is not so big.

(In other news, we should discourage use of assert_almost_equal by telling people to use assert_allclose instead, in the docstring at the least. It has only atol=, and it specifies it in a very cumbersome log10 basis...)

--
Pauli Virtanen
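A concrete instance of that trap (a sketch; the third positional argument is rtol in both functions, but the atol defaults differ, so the two calls disagree):

    import numpy as np
    from numpy.testing import assert_allclose

    a, b = 0.0, 1e-10
    eps = 1e-7

    np.allclose(a, b, eps)      # True: the default atol=1e-8 absorbs the difference
    assert_allclose(a, b, eps)  # raises: default atol=0, and rtol * |b| is only 1e-17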
On Fri, Jul 18, 2014 at 11:47 AM, Pauli Virtanen <pav@iki.fi> wrote:
Using allclose in non-test code without specifying both tolerances explicitly is IMHO a sign of sloppiness, as the default tolerances are both pretty big (and atol != 0 is not scale-free).
using it without specifying tolerances is sloppy in ANY use case.

Bumping the tolerances in assert_allclose() up to match allclose() will probably not break code, but it can render some tests ineffective.
being a bit pedantic here, but rendering a test ineffective IS breaking code. And I'd rather a change break my tests than render them ineffective -- if they break, I'll go look at them. If they are rendered ineffective, I'll never notice.

Curious here -- is atol necessary for anything OTHER than near zero? I can see that in a given case you may know exactly what range of values to expect (and everything in the array is of the same order of magnitude), but an appropriate rtol would work there too. If only zero testing is needed, then atol=0 makes sense as a default. (or maybe atol=eps)

Note:

"""
The relative difference (`rtol` * abs(`b`)) and the absolute difference `atol` are added together to compare against the absolute difference between `a` and `b`.
"""

Which points to setting atol=0 for the default as well, or it can totally mess up a test on very small numbers. I'll bet there is a LOT of sloppy use of these out in the wild (I know I've been sloppy), and I'm starting to think that atol=0 is the ONLY appropriate default for the sloppy among us. For instance:

    In [40]: a1 = np.array([1e-100])
    In [41]: a2 = np.array([1.00000001e-100])
    In [42]: np.allclose(a1, a2, rtol=1e-10)
    Out[42]: True
    In [43]: np.allclose(a1, a2, rtol=1e-10, atol=0)
    Out[43]: False

That's really not good.

By the way:

    Definition: np.allclose(a, b, rtol=1e-05, atol=1e-08)

Really? Those are HUGE defaults for double-precision math. I can't believe I haven't looked more closely at this before!

-Chris
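The documented criterion written out against that example (the rule |a - b| <= atol + rtol * |b| is the one quoted from the docstring above):

    import numpy as np

    a1 = np.array([1e-100])
    a2 = np.array([1.00000001e-100])
    rtol = 1e-10

    # with the default atol=1e-8, atol dominates and everything "passes":
    print(np.abs(a1 - a2) <= 1e-8 + rtol * np.abs(a2))  # [ True]
    # with atol=0, the relative check actually bites:
    print(np.abs(a1 - a2) <= 0.0  + rtol * np.abs(a2))  # [False]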
18.07.2014 22:13, Chris Barker wrote: [clip]
but an appropriate rtol would work there too. If only zero testing is needed, then atol=0 makes sense as a default. (or maybe atol=eps)
There's plenty of room below eps, but finfo(float).tiny ~ 3e-308 (or some big multiple) is also reasonable in the scale-freeness sense.
On Fri, Jul 18, 2014 at 12:43 PM, Pauli Virtanen <pav@iki.fi> wrote:
18.07.2014 22:13, Chris Barker wrote: [clip]
but an appropriate rtol would work there too. If only zero testing is needed, then atol=0 makes sense as a default. (or maybe atol=eps)
There's plenty of room below eps, but finfo(float).tiny ~ 3e-308 (or some big multiple) is also reasonable in the scale-freeness sense.
right! brain blip -- eps is the difference between 1 and the next larger representable number, yes? So a long way away from the smallest representable number. So yes, zero or [something]e-308 -- making zero seem like a good idea again....

Is it totally ridiculous to have the default be dependent on dtype? float32 vs float64?

-Chris
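A sketch of what a dtype-dependent default might look like (the helper and the factor of 100 are purely hypothetical, not an existing API):

    import numpy as np

    def default_tols(dtype):
        """Hypothetical dtype-aware defaults: scale rtol to the type's precision."""
        eps = np.finfo(dtype).eps  # ~2.2e-16 for float64, ~1.2e-7 for float32
        return {"rtol": 100 * eps, "atol": 0.0}

    print(default_tols(np.float64))  # rtol ~2.2e-14
    print(default_tols(np.float32))  # rtol ~1.2e-05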
On Fri, Jul 18, 2014 at 2:32 PM, Chris Barker <chris.barker@noaa.gov> wrote:
is it totally ridiculous to have the default be dependent on dtype? float32 vs float64?
Whatever the final decision is, if the defaults change we should start with a FutureWarning. How we can make that work is uncertain, because I don't know of any reliable way to detect if we are using the default value or if a value was passed in. Maybe just warn if `atol == 0`?

Chuck
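One common way to detect whether a default was used is a sentinel default; a sketch (the wrapper name and warning text are hypothetical):

    import warnings
    import numpy as np

    _NOT_SET = object()  # sentinel: distinguishes "not passed" from any real value

    def assert_allclose_with_warning(actual, desired, rtol=1e-7, atol=_NOT_SET):
        if atol is _NOT_SET:
            warnings.warn("the default atol may change in a future release; "
                          "pass atol explicitly to keep the current behavior",
                          FutureWarning, stacklevel=2)
            atol = 0.0
        np.testing.assert_allclose(actual, desired, rtol=rtol, atol=atol)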
Making the behavior of assert_allclose depending on whether desired is exactly zero or 1e-20 looks too difficult to remember, and which desired I use would depend on what I get out of R or Stata.
I thought your whole point here was that 1e-20 and zero are qualitatively different values that you would not want to accidentally confuse? Surely R and Stata aren't returning exact zeros for small non-zero values like probability tails?
I was thinking of the case when we only see "pvalue < 1e-16" or something like this, and we replace this by an assert close to zero, which would translate to `assert_allclose(pvalue, 0, atol=1e-16)`, with maybe an additional rtol=1e-11 if we have an array of pvalues where some are "large" (>0.5).

It's not a very frequent case, mainly when we don't have access to the underlying float numbers and only have the print representation.

Josef
On 18 Jul 2014 19:31, <josef.pktd@gmail.com> wrote:

which would translate to `assert_allclose(pvalue, 0, atol=1e-16)` with maybe an additional rtol=1e-11 if we have an array of pvalues where some are "large" (>0.5).
This example is also handled correctly by my proposal :-)

-n
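Checking that against the proposed_close sketch from earlier in the thread (hypothetical code, same assumptions as before):

    proposed_close(1e-30, 0, atol=1e-16)  # True: desired is exactly 0, atol governs
    proposed_close(0, 1e-30, atol=1e-16)  # False: desired is nonzero, rtol governs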
On Fri, Jul 18, 2014 at 2:44 PM, Nathaniel Smith <njs@pobox.com> wrote:
This example is also handled correctly by my proposal :-)
depends on the details of your proposal

alternative: desired exactly zero means assert_equal

    (Pdb) self.res_reg.params[m:]
    array([ 0.,  0.,  0.])
    (Pdb) assert_allclose(0, self.res_reg.params[m:])
    (Pdb) assert_allclose(0, self.res_reg.params[m:], rtol=0, atol=0)
    (Pdb)

This test currently uses assert_almost_equal with decimal=4 :(

regularized estimation with hard thresholding: the first m values are estimates not equal to zero, the elements from m to the end are "exactly zero". This is discrete models' fit_regularized, which predates numpy's assert_allclose. I haven't checked what the unit tests of Kerby's current additions for fit_regularized look like.

unit testing is serious business: I'd rather have good unit tests in SciPy-related packages than convince a few more newbies that they can use the defaults for everything.

Josef
On Wed, Jul 16, 2014 at 6:37 AM, Tony Yu <tsyu80@gmail.com> wrote:
Git blame suggests the change was made in the following commit, but I guess that change only reverted to the original behavior.
https://github.com/numpy/numpy/commit/f43223479f917e404e724e6a3df27aa701e6d6...
Indeed, it was reverting a change that crept into https://github.com/numpy/numpy/commit/f527b49a
It seems like the defaults for `allclose` and `assert_allclose` should match, and an absolute tolerance of 0 is probably not ideal. I guess this is a pretty big behavioral change, but the current default for `assert_allclose` doesn't seem ideal.
I agree, the current behavior is quite annoying. It would make sense to change the atol default to 1e-8, but technically it's a backwards compatibility break. It would probably have a very minor impact, though. Changing the default for rtol in one of the functions may be much more painful, though; I don't think that should be done.

Ralf
On Wed, Jul 16, 2014 at 7:47 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I agree, current behavior quite annoying. It would make sense to change the atol default to 1e-8, but technically it's a backwards compatibility break. Would probably have a very minor impact though. Changing the default for rtol in one of the functions may be much more painful though, I don't think that should be done.
Currently we have:

    allclose:        rtol=1e-5, atol=1e-8
    assert_allclose: rtol=1e-7, atol=0

Why would it be painful to change assert_allclose to match allclose? It would weaken some tests, but no code would break.

-n
On Thu, Jul 17, 2014 at 11:37 AM, Nathaniel Smith <njs@pobox.com> wrote:
Why would it be painful to change assert_allclose to match allclose? It would weaken some tests, but no code would break.
We might break our code if suddenly our test suite doesn't do what it is supposed to do. (Rough guess: 40% of the statsmodels code is unit tests.)

Josef
On Wed, Jul 16, 2014 at 1:47 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
I agree, current behavior quite annoying. It would make sense to change the atol default to 1e-8, but technically it's a backwards compatibility break. Would probably have a very minor impact though. Changing the default for rtol in one of the functions may be much more painful though, I don't think that should be done.
Thanks for the feedback. I've opened up a PR here: https://github.com/numpy/numpy/pull/4880

Best,
-Tony
participants (7):
- Charles R Harris
- Chris Barker
- josef.pktd@gmail.com
- Nathaniel Smith
- Pauli Virtanen
- Ralf Gommers
- Tony Yu