[Numpy-discussion] Odd numerical difference between Numpy 1.5.1 and Numpy > 1.5.1

Mon Apr 11 23:48:49 EDT 2011

On Apr 11, 2011, at 3:55 PM, Charles R Harris wrote:

> 
> 
> On Mon, Apr 11, 2011 at 2:31 PM, Mark Wiebe <mwwiebe at gmail.com> wrote:
> On Mon, Apr 11, 2011 at 12:37 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Mon, Apr 11, 2011 at 13:54, Skipper Seabold <jsseabold at gmail.com> wrote:
> > All,
> >
> > We noticed some failing tests for statsmodels between numpy 1.5.1 and
> > numpy >= 1.6.0. These are the versions where I noticed the change. It
> > seems that when you divide a float array and multiply by a boolean
> > array the answers are different (unless the others are also off by
> > some floating point). Can anyone replicate the following using this
> > script or point out what I'm missing?
> >
> > import numpy as np
> > print np.__version__
> > np.random.seed(12345)
> >
> > test = np.random.randint(0,2,size=10).astype(bool)
> > t = 1.345
> > arr = np.random.random(size=10)
> >
> > arr # okay
> > t/arr # okay
> > (1-test)*arr # okay
> > (1-test)*t/arr # huh?
> 
> [~]
> |12> 1-test
> array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1], dtype=int8)
> 
> [~]
> |13> (1-test)*t
> array([ 1.34472656,  0.        ,  0.        ,  0.        ,
> 1.34472656,  0.        ,
>        1.34472656,  1.34472656,  0.        ,  1.34472656], dtype=float16)
> 
> Some of the recent ufunc changes or the introduction of the float16
> type must have changed the way the dtype is chosen for the
> "int8-array*float64-scalar" case. Previously, the result was a float64
> array, as expected.
> 
> Mark, I expect this is a result of one of your changes. Can you take a
> look at this?
> 
> It's the type promotion rules change that is causing this, and it's definitely a 1.6.0 release blocker. Good catch!
> 
> I can think of an easy, reasonable way to fix it in the result_type function, but since the ufuncs themselves use a poor method of resolving the types, it will take some thought to apply the same idea there. Maybe detecting that the ufunc is a binary op at its creation time, setting a flag, and using the result_type function in that case. This would have the added bonus of being faster, too.
> 
> Going back to the 1.5 code isn't an option, since adding the new data types while maintaining ABI compatibility required major surgery to this part of the system.
> 
> 
> There isn't a big rush here, so take the time work it through.

I agree with Charles.   Let's take the needed time and work this through.   This is the sort of thing I was a bit nervous about with the changes made to the casting rules.    Right now, I'm not confident that the scalar-array casting rules are being handled correctly. 

From your explanation, I don't understand how you intend to solve the problem.   Surely there is a better approach than special casing the binary op and this work-around flag (and why is the result_type even part of the discussion)? 

It would be good to see a simple test case and understand why the boolean multiplied by the scalar double is becoming a float16.     In other words,  why does

(1-test)*t 

return a float16 array

This does not sound right at all and it would be good to understand why this occurs, now.   How are you handling scalars multiplied by arrays in general?  

-Travis

-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20110411/a3a7fda5/attachment.html>