[Numpy-discussion] `allclose` vs `assert_allclose`

josef.pktd at gmail.com
Fri Jul 18 14:20:54 EDT 2014


On Fri, Jul 18, 2014 at 2:03 PM, <josef.pktd at gmail.com> wrote:

>
>
>
> On Fri, Jul 18, 2014 at 12:53 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Fri, Jul 18, 2014 at 12:38 PM,  <josef.pktd at gmail.com> wrote:
>> >
>> > On Thu, Jul 17, 2014 at 4:07 PM, <josef.pktd at gmail.com> wrote:
>> >
>> >> If you mean by this to add atol=1e-8 as default, then I'm against it.
>> >>
>> >> At least it will change the meaning of many of our tests in
>> >> statsmodels.
>> >>
>> >> I'm using rtol to check that values like 1e-15 or 1e-30 are correct,
>> >> which would be completely swamped if you change the default away from
>> >> atol=0. Adding atol=0 to every assert_allclose that currently uses
>> >> only rtol is a lot of work.
>> >> I think I almost never use the default rtol, but I often leave atol
>> >> at the default = 0.
>> >>
>> >> If we have zeros, then I don't think it's too much work to decide
>> >> whether this should be atol=1e-20 or 1e-8.
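>> >>
>> >> A minimal sketch of the concern, with made-up numbers (not from our
>> >> test suite):
>> >>
>> >>     import numpy as np
>> >>     from numpy.testing import assert_allclose
>> >>
>> >>     computed = np.array([1.0e-30])
>> >>     desired = np.array([3.0e-30])   # wrong by a factor of 3
>> >>
>> >>     # current default atol=0: the relative check does its job
>> >>     # assert_allclose(computed, desired, rtol=1e-7)  # raises AssertionError
>> >>
>> >>     # with a default atol=1e-8 (the np.allclose default), the absolute
>> >>     # term swamps the relative one and the bad value passes silently
>> >>     assert_allclose(computed, desired, rtol=1e-7, atol=1e-8)  # passes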
>> >
>> >
>> > copied from
>> > http://mail.scipy.org/pipermail/numpy-discussion/2014-July/070639.html
>> > since I didn't get any messages here
>> >
>> > This is a compelling use case, but there are also lots of compelling
>> > use cases that want some non-zero atol (i.e., comparing stuff to 0).
>> > Saying that allclose is for one of those use cases and assert_allclose
>> > is for the other is... not a very felicitous API design, I think. So
>> > we really should do *something*.
>> >
>> > Are there really any cases where you want non-zero atol= that don't
>> > involve comparing something against a 'desired' value of zero? It's a
>> > little wacky, but I'm wondering if we ought to change the rule (for
>> > all versions of allclose) to
>> >
>> > if desired == 0:
>> >     tol = atol
>> > else:
>> >     tol = rtol * desired
>> >
>> > In particular, this means that np.allclose(x, 1e-30) would reject x
>> > values of 0 or 2e-30, but np.allclose(x, 0) would accept x == 1e-30
>> > or 2e-30.
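>> >
>> > A runnable sketch of that proposed rule, elementwise (illustration
>> > only, not an implementation proposal for numpy itself):
>> >
>> >     import numpy as np
>> >
>> >     def proposed_close(actual, desired, rtol=1e-7, atol=1e-8):
>> >         actual = np.asarray(actual, dtype=float)
>> >         desired = np.asarray(desired, dtype=float)
>> >         # atol applies only where desired is exactly zero,
>> >         # rtol everywhere else
>> >         tol = np.where(desired == 0, atol, rtol * np.abs(desired))
>> >         return bool(np.all(np.abs(actual - desired) <= tol))
>> >
>> >     proposed_close(2e-30, 1e-30)   # False: 100% relative error
>> >     proposed_close(1e-30, 0.0)     # True: within atol of zero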
>> >
>> > -n
>> >
>> >
>> > That's much too confusing.
>> > I don't know what the use cases for np.allclose are since I don't
>> > have any.
>>
>> I wrote allclose because it's shorter, but my point is that
>> assert_allclose and allclose should use the same criterion, and I was
>> making a suggestion for what that shared criterion might be.
>>
>> > assert_allclose is one of our (statsmodels) most frequently used
>> > numpy functions
>> >
>> > this is not informative:
>> >
>> > `np.allclose(x, 1e-30)`
>> >
>> >
>> > since there are keywords
>> > either assert_allclose(x, 0, atol=1e-30)
>>
>> I think we might be talking past each other here -- 1e-30 here is my
>> "gold" p-value that I'm hoping x will match, not a tolerance argument.
>>
>
> my mistake
>
>
>
>>
>> > if I want to be "close" to zero
>> > or
>> >
>> > assert_allclose(x, desired, rtol=1e-11, atol=1e-25)
>> >
>> > if we have a mix of large numbers and "zeros" in an array.
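>> >
>> > Sketch of that mixed case, with made-up numbers:
>> >
>> >     import numpy as np
>> >     from numpy.testing import assert_allclose
>> >
>> >     desired = np.array([1.5e+03, 2.0e-01, 0.0])   # large, moderate, "zero"
>> >     computed = desired + np.array([1.0e-09, 1.0e-13, 3.0e-26])
>> >
>> >     # rtol covers the non-zero entries; the tiny atol covers the
>> >     # entry that should be zero without masking errors elsewhere
>> >     assert_allclose(computed, desired, rtol=1e-11, atol=1e-25)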
>> >
>> > Making the behavior of assert_allclose depend on whether desired is
>> > exactly zero or 1e-20 looks too difficult to remember, and which
>> > desired I use would depend on what I get out of R or Stata.
>>
>> I thought your whole point here was that 1e-20 and zero are
>> qualitatively different values that you would not want to accidentally
>> confuse? Surely R and Stata aren't returning exact zeros for small
>> non-zero values like probability tails?
>>
>> > atol=1e-8 is not close to zero in most cases in my experience.
>>
>> If I understand correctly (Tony?) the problem here is that another
>> common use case for assert_allclose is in cases like
>>
>> assert_allclose(np.sin(some * complex ** calculation / (that - should
>> - be * zero)), 0)
>>
>> For cases like this, you need *some* non-zero atol or the thing just
>> doesn't work, and one could quibble over the exact value as long as
>> it's larger than "normal" floating point error. These calculations
>> usually involve "normal" sized numbers, so atol should be comparable
>> to eps * these values.  eps is 2e-16, so atol=1e-8 works for values up
>> to around 1e8, which is a plausible upper bound for where people might
>> expect assert_allclose to just work. I'm trying to figure out some way
>> to support your use cases while also supporting other use cases.
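>>
>> In runnable form, with a stand-in for the long calculation:
>>
>>     import numpy as np
>>     from numpy.testing import assert_allclose
>>
>>     result = (0.1 + 0.2) - 0.3   # mathematically zero, actually ~5.6e-17
>>
>>     # against a desired value of exactly 0, rtol contributes nothing,
>>     # so some non-zero atol is required for this ever to pass
>>     assert_allclose(result, 0, atol=1e-8)   # passes
>>     # assert_allclose(result, 0)            # raises: default atol is 0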
>>
>
> my problem is that there is no "normal" floating point error.
> Whether I have units of 1000 or units of 0.0001 depends on the example
> and dataset that we use for testing.
>
> this tests two different functions/methods that calculate the same thing:
>
> (Pdb) pval
> array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
> (Pdb) res2.pvalues
> array([  3.01270184e-42,   5.90847367e-02,   3.00066946e-12])
> (Pdb) assert_allclose(pval, res2.pvalues, rtol=5 * rtol, atol=1e-25)
>
> I don't care about errors that are smaller than 1e-25.
>
> for example, testing p-values against Stata:
>
> (Pdb) tt.pvalue
> array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
> (Pdb) res2.pvalues
> array([  5.70315140e-30,   6.24662551e-02,   5.86024090e-11])
> (Pdb) tt.pvalue - res2.pvalues
> array([  2.16612016e-40,   2.51187959e-15,   4.30027936e-21])
> (Pdb) tt.pvalue / res2.pvalues - 1
> array([  3.79811738e-11,   4.01900735e-14,   7.33806349e-11])
> (Pdb) rtol
> 1e-10
> (Pdb) assert_allclose(tt.pvalue, res2.pvalues, rtol=5 * rtol)
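>
> The same check as a self-contained snippet (values copied from the pdb
> session above, at display precision):
>
>     import numpy as np
>     from numpy.testing import assert_allclose
>
>     rtol = 1e-10
>     stata = np.array([5.70315140e-30, 6.24662551e-02, 5.86024090e-11])
>     relerr = np.array([3.79811738e-11, 4.01900735e-14, 7.33806349e-11])
>     ours = stata * (1 + relerr)
>
>     # rtol-only comparison: atol stays at its default of 0, so the
>     # 1e-30 entry is genuinely being checked, not swamped
>     assert_allclose(ours, stata, rtol=5 * rtol)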
>
>
> I could find a lot more and maybe nicer examples, since I spend quite a
> bit of time fine-tuning unit tests.
>
> Of course you can change it.
>
> But the testing functions are code, and very popular code.
>
> And if you break backwards compatibility, then I wouldn't mind reviewing a
> pull request for statsmodels that adds 300 to 400 `atol=0` to the unit
> tests. :)
>


scipy (not current master) doesn't look "so" bad: I find about 400
occurrences of "assert_allclose(", and maybe a third to half of them
use atol.
As expected, optimize uses only atol because of the convergence criteria.
scipy.stats uses mostly rtol or the default.

Josef



>
> Josef
>
>
>>
>> -n
>>
>> --
>> Nathaniel J. Smith
>> Postdoctoral researcher - Informatics - University of Edinburgh
>> http://vorpus.org
>
>