[Numpy-discussion] numpy 1.10.1 reduce operation on recarrays

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Oct 19 21:56:59 EDT 2015


On Fri, Oct 16, 2015 at 9:31 PM, Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 10/16/2015 09:17 PM, josef.pktd at gmail.com wrote:
>
>>
>>
>> On Fri, Oct 16, 2015 at 8:56 PM, Allan Haldane <allanhaldane at gmail.com> wrote:
>>
>>     On 10/16/2015 05:31 PM, josef.pktd at gmail.com wrote:
>>     >
>>     >
>>     > On Fri, Oct 16, 2015 at 2:21 PM, Charles R Harris
>>     > <charlesr.harris at gmail.com> wrote:
>>     >
>>     >
>>     >
>>     >     On Fri, Oct 16, 2015 at 12:20 PM, Charles R Harris
>>     >     <charlesr.harris at gmail.com> wrote:
>>     >
>>     >
>>     >
>>     >         On Fri, Oct 16, 2015 at 11:58 AM, <josef.pktd at gmail.com> wrote:
>>      >
>>      >             Was there a change to reduce operations on recarrays in
>>      >             1.10 or 1.10.1?
>>      >
>>      >             Travis shows a new test failure in the statsmodels test suite
>>      >             with 1.10.1:
>>      >
>>      >             ERROR: test suite for <class
>>      >             'statsmodels.base.tests.test_data.TestRecarrays'>
>>      >
>>      >               File "/home/travis/miniconda/envs/statsmodels-test/lib/python2.7/site-packages/statsmodels-0.8.0-py2.7-linux-x86_64.egg/statsmodels/base/data.py", line 131, in _handle_constant
>>      >                 const_idx = np.where(self.exog.ptp(axis=0) == 0)[0].squeeze()
>>      >             TypeError: cannot perform reduce with flexible type
>>      >
>>      >
>>      >             Sorry for asking so late.
>>      >             (statsmodels is short on maintainers, and I'm distracted)
>>      >
>>      >
>>      >             statsmodels still has code to support recarrays and
>>      >             structured dtypes from the time before pandas became
>>      >             popular, but I don't think anyone is using them together
>>      >             with statsmodels anymore.
>>      >
>>      >
>>      >         There were several commits dealing with both recarrays and
>>      >         ufuncs, so this might well be a regression.
>>      >
>>      >
>>      >     A bisection would be helpful. Also, open an issue.
>>      >
>>      >
>>      >
>>      > The reason for the test failure might be somewhere else, hiding behind
>>      > several layers of statsmodels, and only started to show up with
>>      > numpy 1.10.1.
>>      >
>>      > I already get the reduce exception with my currently installed numpy
>>      > '1.9.2rc1':
>>      >
>>      >>>> x = np.random.random(9*3).view([('const', 'f8'),('x_1', 'f8'),
>>      > ('x_2', 'f8')]).view(np.recarray)
>>      >
>>      >>>> np.ptp(x, axis=0)
>>      > Traceback (most recent call last):
>>      >   File "<stdin>", line 1, in <module>
>>      >   File "C:\programs\WinPython-64bit-3.4.3.1\python-3.4.3.amd64\lib\site-packages\numpy\core\fromnumeric.py", line 2047, in ptp
>>      >     return ptp(axis, out)
>>      > TypeError: cannot perform reduce with flexible type
>>      >
>>      >
>>      > Sounds like fun, and I don't even know how to automatically bisect.
>>      >
>>      > Josef
>>
>>     That example isn't the problem (ptp should definitely fail on
>>     structured arrays), but I've tracked down what is - it has to do with
>>     views of record arrays.
>>
>>     The fix looks simple, I'll get it in for the next release.
>>
>>
>> Thanks,
>>
>> I realized that at that point in the statsmodels code we should have
>> only regular ndarrays, so the array conversion fails somewhere.
>>
>> AFAICS, the main helper function to convert is
>>
>> def struct_to_ndarray(arr):
>>      return arr.view((float, len(arr.dtype.names)))
>>
>> which doesn't look like it will handle other dtypes than float64. Nobody
>> ever complained, so maybe our test suite is the only user of this.
>>
>> What is now the recommended way of converting structured
>> dtypes/recarrays to ndarrays?
>>
>> Josef
>>
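On the float-only limitation of the view-based helper: here is a minimal sketch of a field-by-field conversion, offered as an illustration rather than a recommended NumPy API. The helper name `struct_to_ndarray_by_field` and the sample data are made up for this example. The view trick reinterprets the raw bytes, so an integer field comes out as nonsense floats, while stacking the fields lets NumPy upcast the values to a common dtype.

```python
import numpy as np


def struct_to_ndarray_by_field(arr):
    """Hypothetical helper: one column per field, upcast to a common dtype."""
    # np.asarray drops any recarray subclass from the per-field views.
    return np.column_stack([np.asarray(arr[name]) for name in arr.dtype.names])


# Made-up sample data with one integer and one float field.
mixed = np.array([(1, 2.0), (3, 4.0)], dtype=[('a', 'i8'), ('b', 'f8')])

# View-based conversion: same bytes, reinterpreted as float64.
reinterpreted = mixed.view((float, len(mixed.dtype.names)))

# Field-based conversion: values are converted, not reinterpreted.
stacked = struct_to_ndarray_by_field(mixed)

print(reinterpreted)
print(stacked)
```

The field-based version copies the data, which the view does not, so it is only a drop-in replacement where the copy is acceptable.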
>
> Yes, that's the code I narrowed it down to as well. I think the code in
> statsmodels is fine; the problem is actually a bug that I must admit I
> introduced in changes to the way views of recarrays work.
>
> If you are curious, the bug is in this line:
>
> https://github.com/numpy/numpy/blob/master/numpy/core/records.py#L467
>
> This line was intended to fix the problem that accessing a nested record
> array field would lose the 'np.record' dtype. I only considered void
> structured arrays, and had forgotten about sub-arrays which statsmodels
> uses.
>
> I think the fix is to replace `issubclass(val.type, nt.void)` with
> `val.names` or something similar. I'll take a closer look soon.
>
>
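To make the difference between the two checks concrete, here is a small sketch, assuming that `nt.void` in records.py is the same class as `np.void`. It only compares the two dtype properties mentioned above and is not the actual patch. Both a structured dtype and a sub-array dtype report a void scalar type, but only the structured one has field names.

```python
import numpy as np

# A structured dtype with named fields.
structured = np.dtype([('a', 'f8'), ('b', 'f8')])

# A sub-array dtype, like the one passed to arr.view((float, n)).
subarray = np.dtype((np.float64, 3))

for label, dt in [('structured', structured), ('subarray', subarray)]:
    # The void check cannot tell the two apart; the names check can.
    print(label, issubclass(dt.type, np.void), dt.names)
```

So a test on `names` would skip the sub-array case, which is the one the statsmodels helper goes through.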
Another example, fresh from Travis, that might have the same source.

I didn't even know statsmodels uses recarrays in the models.

AssertionError:
Arrays are not almost equal to 7 decimals
(shapes (6,), (6, 3) mismatch)
 x: recarray([... raw record bytes, unprintable in this archive ...
 y: array([[ 1.       ,  0.       ,  0.       ],
       [-0.2794347, -0.100468 , -1.9709737],
       [-0.0469873, -0.1728197,  0.0436493],...
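
The (6,) versus (6, 3) mismatch is what I would expect if the conversion to a plain ndarray did not happen: a structured array has one element per record, not one per field. A small sketch with made-up random data, reusing the field names from the earlier example:

```python
import numpy as np

# Six records with three float64 fields each.
dt = [('const', 'f8'), ('x_1', 'f8'), ('x_2', 'f8')]
x = np.random.random(6 * 3).view(dt)
print(x.shape)  # (6,)

# Viewing the same memory as a sub-array of floats adds the column axis.
y = x.view((float, len(x.dtype.names)))
print(y.shape)  # (6, 3)
```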


Josef


>
> Allan
>
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> https://mail.scipy.org/mailman/listinfo/numpy-discussion
>