[Numpy-discussion] numpy 1.10.1 reduce operation on recarrays
Allan Haldane
allanhaldane at gmail.com
Fri Oct 16 21:31:11 EDT 2015
On 10/16/2015 09:17 PM, josef.pktd at gmail.com wrote:
>
>
> On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane
> <allanhaldane at gmail.com> wrote:
>
> On 10/16/2015 05:31 PM, josef.pktd at gmail.com wrote:
> >
> >
> > On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >
> >
> >
> > On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris
> > <charlesr.harris at gmail.com> wrote:
> >
> >
> >
> > On Fri, Oct 16, 2015 at 11:58 AM, <josef.pktd at gmail.com> wrote:
> >
> > Was there a change with reduce operations on recarrays in
> > 1.10 or 1.10.1?
> >
> > Travis shows a new test failure in the statsmodels test suite
> > with 1.10.1:
> >
> > ERROR: test suite for <class
> > 'statsmodels.base.tests.test_data.TestRecarrays'>
> >
> > File "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py", line 131, in _handle_constant
> >     const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
> > TypeError: cannot perform reduce with flexible type
> >
> >
> > Sorry for asking so late.
> > (statsmodels is short on maintainers, and I'm distracted)
> >
> >
> > statsmodels still has code to support recarrays and
> > structured dtypes from the time before pandas became
> > popular, but I don't think anyone is using them together
> > with statsmodels anymore.
> >
> >
> > > There were several commits dealing with both recarrays and ufuncs,
> > > so this might well be a regression.
> >
> >
> > A bisection would be helpful. Also, open an issue.
> >
> >
> >
> > The reason for the test failure might be somewhere else, hiding behind
> > several layers of statsmodels, and only started to show up with
> > numpy 1.10.1.
> >
> > I already have the reduce exception with my currently installed numpy
> > '1.9.2rc1'
> >
> > >>> x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'),
> > ...     ('x_2', 'f8')]).view(np.recarray)
> >
> > >>> np.ptp(x, axis=0)
> > Traceback (most recent call last):
> > File "<stdin>", line 1, in <module>
> > File "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py", line 2047, in ptp
> >     return ptp(axis, out)
> > TypeError: cannot perform reduce with flexible type
> >
> >
> > Sounds like fun, and I don't even know how to automatically bisect.
> >
> > Josef
>
> That example isn't the problem (ptp should definitely fail on structured
> arrays), but I've tracked down what is - it has to do with views of
> record arrays.
>
> The fix looks simple, I'll get it in for the next release.
>
>
> Thanks,
>
> I realized that at that point in the statsmodels code we should have
> only regular ndarrays, so the array conversion fails somewhere.
>
> AFAICS, the main helper function to convert is
>
> def struct_to_ndarray(arr):
>     return arr.view((float, len(arr.dtype.names)))
>
> which doesn't look like it will handle other dtypes than float64. Nobody
> ever complained, so maybe our test suite is the only user of this.
>
> What is now the recommended way of converting structured
> dtypes/recarrays to ndarrays?
>
> Josef
Yes, that's the code I narrowed it down to as well. I think the code in
statsmodels is fine; the problem is a bug that, I must admit, I
introduced when changing the way views of recarrays work.
If you are curious, the bug is in this line:
https://github.com/numpy/numpy/blob/master/numpy/core/records.py#L467
This line was intended to fix the problem that accessing a nested record
array field would lose the 'np.record' dtype. I only considered structured
(void) dtypes with fields, and forgot about sub-array dtypes, which
statsmodels uses.
I think the fix is to replace `issubclass(val.type, nt.void)` with
`val.names` or something similar. I'll take a closer look soon.
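
To illustrate why the `issubclass(val.type, nt.void)` test is too broad,
here is a minimal sketch. It is not the actual records.py code: the two
dtypes are made up for illustration, and it assumes `nt.void` is the same
type as `np.void`.

```python
import numpy as np

# Hypothetical dtypes, for illustration only.
structured = np.dtype([('const', 'f8'), ('x_1', 'f8')])  # named fields
subarray = np.dtype((float, 3))                          # sub-array, no fields

# Both dtypes report np.void as their scalar type, so a check on
# the type alone cannot tell them apart.
both_void = (issubclass(structured.type, np.void),
             issubclass(subarray.type, np.void))

# Only the structured dtype has field names; the sub-array dtype has none.
has_names = (structured.names is not None, subarray.names is not None)

print(both_void)
print(has_names)
```

A check based on `names` would treat only the first dtype as a record
type, which is presumably the behaviour the line was meant to have.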
Allan
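
On the question of converting structured arrays to plain ndarrays, the
following is a rough sketch rather than an official recommendation from
this thread. It assumes a NumPy release that provides
`numpy.lib.recfunctions.structured_to_unstructured` (1.16 or later, so
not the 1.10 series discussed here); the arrays `x` and `mixed` are
hypothetical test data.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

def struct_to_ndarray(arr):
    # The statsmodels helper quoted above: reinterprets the raw bytes,
    # so it is only meaningful when every field is float64.
    return arr.view((float, len(arr.dtype.names)))

# All-float64 structured array: the view gives one column per field.
x = np.ones(4, dtype=[('const', 'f8'), ('x_1', 'f8'), ('x_2', 'f8')])
x['x_1'] = [0., 1., 2., 3.]
plain = struct_to_ndarray(x)
const_idx = np.where(np.ptp(plain, axis=0) == 0)[0]  # constant columns

# Mixed field types: a raw view would reinterpret the integer bytes as
# floats, whereas structured_to_unstructured casts each field.
mixed = np.zeros(4, dtype=[('const', 'i8'), ('x_1', 'f8')])
mixed['const'] = 1
mixed['x_1'] = [0., 1., 2., 3.]
converted = rfn.structured_to_unstructured(mixed, dtype=float)

print(plain.shape, const_idx)
print(converted)
```

The view avoids a copy but depends on all fields sharing one dtype; the
recfunctions call is the safer choice when the field types may differ.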