On Mon, Apr 11, 2011 at 12:37 PM, Robert Kern <robert.kern@gmail.com> wrote:
On Mon, Apr 11, 2011 at 13:54, Skipper Seabold <jsseabold@gmail.com> wrote:
> All,
> We noticed some failing tests for statsmodels between numpy 1.5.1 and
> numpy >= 1.6.0. These are the versions where I noticed the change. It
> seems that when you divide a float array and multiply by a boolean
> array the answers are different (unless the others are also off by
> some floating point). Can anyone replicate the following using this
> script or point out what I'm missing?
> import numpy as np
> print np.__version__
> np.random.seed(12345)
> test = np.random.randint(0,2,size=10).astype(bool)
> t = 1.345
> arr = np.random.random(size=10)
> arr # okay
> t/arr # okay
> (1-test)*arr # okay
> (1-test)*t/arr # huh?

|12> 1-test
array([1, 0, 0, 0, 1, 0, 1, 1, 0, 1], dtype=int8)

|13> (1-test)*t
array([ 1.34472656,  0.        ,  0.        ,  0.        ,
1.34472656,  0.        ,
       1.34472656,  1.34472656,  0.        ,  1.34472656], dtype=float16)

Some of the recent ufunc changes or the introduction of the float16
type must have changed the way the dtype is chosen for the
"int8-array*float64-scalar" case. Previously, the result was a float64
array, as expected.

Mark, I expect this is a result of one of your changes. Can you take a
look at this?

It's the type promotion rules change that is causing this, and it's definitely a 1.6.0 release blocker. Good catch!

I can think of an easy, reasonable way to fix it in the result_type function, but since the ufuncs themselves use a poor method of resolving the types, it will take some thought to apply the same idea there. Maybe detecting that the ufunc is a binary op at its creation time, setting a flag, and using the result_type function in that case. This would have the added bonus of being faster, too.

Going back to the 1.5 code isn't an option, since adding the new data types while maintaining ABI compatibility required major surgery to this part of the system.