On 1 April 2010 13:38, Charles R Harris <charlesr.harris@gmail.com> wrote:

On Thu, Apr 1, 2010 at 8:37 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:

On Thu, Apr 1, 2010 at 12:46 AM, Anne Archibald <peridot.faceted@gmail.com> wrote:

On 1 April 2010 02:24, Charles R Harris <charlesr.harris@gmail.com> wrote:

On Thu, Apr 1, 2010 at 12:04 AM, Anne Archibald <peridot.faceted@gmail.com> wrote:

On 1 April 2010 01:59, Charles R Harris <charlesr.harris@gmail.com> wrote:

On Wed, Mar 31, 2010 at 11:46 PM, Anne Archibald <peridot.faceted@gmail.com> wrote: > > On 1 April 2010 01:40, Charles R Harris <charlesr.harris@gmail.com> > wrote: > > > > > > On Wed, Mar 31, 2010 at 11:25 PM, <josef.pktd@gmail.com> wrote: > >> > >> On Thu, Apr 1, 2010 at 1:22 AM, <josef.pktd@gmail.com> wrote: > >> > On Thu, Apr 1, 2010 at 1:17 AM, Charles R Harris > >> > <charlesr.harris@gmail.com> wrote: > >> >> > >> >> > >> >> On Wed, Mar 31, 2010 at 6:08 PM, <josef.pktd@gmail.com> > >> >> wrote: > >> >>> > >> >>> On Wed, Mar 31, 2010 at 7:37 PM, Warren Weckesser > >> >>> <warren.weckesser@enthought.com> wrote: > >> >>> > T J wrote: > >> >>> >> On Wed, Mar 31, 2010 at 1:21 PM, Charles R Harris > >> >>> >> <charlesr.harris@gmail.com> wrote: > >> >>> >> > >> >>> >>> Looks like roundoff error. > >> >>> >>> > >> >>> >>> > >> >>> >> > >> >>> >> So this is "expected" behavior? > >> >>> >> > >> >>> >> In [1]: np.logaddexp2(-1.5849625007211563, > >> >>> >> -53.584962500721154) > >> >>> >> Out[1]: -1.5849625007211561 > >> >>> >> > >> >>> >> In [2]: np.logaddexp2(-0.5849625007211563, > >> >>> >> -53.584962500721154) > >> >>> >> Out[2]: nan > >> >>> >> > >> >>> > > >> >>> > Is any able to reproduce this? I don't get 'nan' in > >> >>> > either > >> >>> > 1.4.0 > >> >>> > or > >> >>> > 2.0.0.dev8313 (32 bit Mac OSX). In an earlier email T J > >> >>> > reported > >> >>> > using > >> >>> > 1.5.0.dev8106. > >> >>> > >> >>> > >> >>> > >> >>> >>> np.logaddexp2(-0.5849625007211563, -53.584962500721154) > >> >>> nan > >> >>> >>> np.logaddexp2(-1.5849625007211563, -53.584962500721154) > >> >>> -1.5849625007211561 > >> >>> > >> >>> >>> np.version.version > >> >>> '1.4.0' > >> >>> > >> >>> WindowsXP 32 > >> >>> > >> >> > >> >> What compiler? Mingw? > >> > > >> > yes, mingw 3.4.5. , official binaries release 1.4.0 by David > >> > >> sse2 Pentium M > >> > > > > Can you try the exp2/log2 functions with the problem data and see > > if > > something goes wrong? > > Works fine for me. > > If it helps clarify things, the difference between the two problem > input values is exactly 53 (and that's what logaddexp2 does an exp2 > of); so I can provide a simpler example: > > In [23]: np.logaddexp2(0, -53) > Out[23]: nan > > Of course, for me it fails in both orders. >

Ah, that's progress then ;) The effective number of bits in a double is 53 (52 + implicit bit). That shouldn't cause problems but it sure looks suspicious.

Indeed, that's what led me to the totally wrong suspicion that denormals have something to do with the problem. More data points:

In [38]: np.logaddexp2(63.999, 0) Out[38]: nan

In [39]: np.logaddexp2(64, 0) Out[39]: 64.0

In [42]: np.logaddexp2(52.999, 0) Out[42]: 52.999000000000002

In [43]: np.logaddexp2(53, 0) Out[43]: nan

It looks to me like perhaps the NaNs are appearing when the smaller term affects only the "extra" bits provided by the FPU's internal larger-than-double representation. Some such issue would explain why the problem seems to be hardware- and compiler-dependent.

Hmm, that 63.999 is kinda strange. Here is a bit of code that might confuse a compiler working with different size mantissas:

@type@ npy_log2_1p@c@(@type@ x) { @type@ u = 1 + x; if (u == 1) { return LOG2E*x; } else { return npy_log2@c@(u) * x / (u - 1); } }

It might be that u != 1 does not imply u-1 != 0.

That does indeed look highly suspicious. I'm not entirely sure how to work around it. GSL uses a volatile declaration:

http://www.google.ca/codesearch/p?hl=en#p9nGS4eQGUI/gnu/gsl/gsl-1.8.tar.gz%7C8VCQSLJ5jR8/gsl-1.8/sys/log1p.c&q=log1p On the other hand boost declares itself defeated by optimizing compilers and uses a Taylor series:

http://www.google.ca/codesearch/p?hl=en#sdP2GRSfgKo/dcplusplus/trunk/boost/boost/math/special_functions/log1p.hpp&q=log1p&sa=N&cd=7&ct=rc While R makes no mention of the corrected formula or optimizing compilers but takes the same approach, only with Chebyshev series:

Since, at least on my machine, ordinary log1p appears to work fine, is there any reason not to have log2_1p call it and scale the result? Or does the compiler make a hash of our log1p too?

Calling log1p and scaling looks like the right thing to do here. And our log1p needs improvement.

Tinkering a bit, I think we should implement the auxiliary function f(p) = log((1+p)/(1 - p)), which is antisymmetric and has the expansion 2p*(1 + p^2/3 + p^4/5 + ...). The series in the parens is increasing, so it is easy to terminate. Note that for p = +/- 1 it goes over to the harmonic series, so convergence is slow near the ends, but they can be handled using normal logs. Given 1 + x = (1+p)/(1-p) and solving for p gives p = x/(2 + x), so when x ranges from -1/2 -> 1/2, p ranges from -1/3 -> 1/5, hence achieving double precision should involve no more than about 17 terms. I think this is better than the expansion in R.

First I guess we should check which systems don't have log1p; if glibc has it as an intrinsic, that should cover Linux (though I suppose we should check its quality). Do Windows and Mac have log1p? For testing I suppose we should make our own implementation somehow available even on systems where it's unnecessary. Power series are certainly easy, but would some of the other available tricks - Chebyshev series or rational function approximations - be better? I notice R uses Chebyshev series, although maybe that's just because they have a good evaluator handy. Anne

Chuck

_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion