[Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+

josef.pktd at gmail.com josef.pktd at gmail.com
Mon Jan 22 11:24:30 EST 2018


On Mon, Jan 22, 2018 at 11:13 AM, Allan Haldane <allanhaldane at gmail.com>
wrote:

> On 01/22/2018 10:53 AM, josef.pktd at gmail.com wrote:
>
>>
>> This is similar to the above example
>> a[['a', 'c']].view('i8')
>> but it doesn't try to combine fields.
>>
>> In many examples where I used structured dtypes a long time ago, I
>> switched between consistent views, either as a standard array of a subset
>> of fields or as structured dtypes.
>> For this use case it wouldn't matter whether a[['a', 'c']] returns a view
>> or a copy, as long as we can get a second view that is consistent with the
>> selected part of the memory. This would also be independent of whether
>> numpy pads internally and adjusts the strides if possible or not.
>>
>> >>> np.__version__
>> '1.11.2'
>>
>> >>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])
>> >>> a
>> array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0),
>>        (1, 1.0, 1.0)],
>>       dtype=[('a', '<i8'), ('b', '<f8'), ('c', '<f8')])
>>
>> >>> a[['b', 'c']].view(('f8', 2)).mean(0)
>> array([ 1.,  1.])
>>
>> >>> a[['b', 'c']].view(('f8', 2)).dtype
>> dtype('float64')
>>
>
> Hmm, this did not raise a FutureWarning in 1.11.2, so I was not quite right
> in my message. It looks like this particular line only started raising
> FutureWarnings in 1.12.0.
>
>> Aside: The plan is that statsmodels will drop all usage of and support for
>> recarrays/structured dtypes in the following release (0.10).
>> Then structured dtypes are free (from our perspective) to provide low
>> level struct support instead of pretending to be dataframe_like.
>>
>
> Your use of structured arrays is "pandas-like", i.e. you are using them for
> tabular data manipulation. In numpy 1.13 we updated the structured docs to
> discourage this. Of course users can do what they want, but here is what
> the new docs say:
>
>     Structured arrays are designed for low-level
>     manipulation of structured data, for example, for
>     interpreting binary blobs. Structured datatypes are
>     designed to mimic 'structs' in the C language, making
>     them also useful for interfacing with C code. For these
>     purposes, numpy supports specialized features such as
>     subarrays and nested datatypes, and allows manual
>     control over the memory layout of the structure.
>
>     For simple manipulation of tabular data other pydata
>     projects, such as pandas, xarray, or DataArray, provide
>     higher-level interfaces that may be more suitable. These
>     projects may also give better performance for tabular
>     data analysis because the C-struct-like memory layout of
>     structured arrays can lead to poor cache behavior.
>
>
Once upon a time ....

The test code was written in June 2010.
In Oct/Nov 2017 we switched to pandas for loading the data, but not for the
reference `results`, to avoid numpy recarray warnings.
In Jan 2018 we switched to pandas for the reference results as well.

statsmodels has a lot of "legacy" code, especially in the datasets and unit
tests, written when recarrays were still the appropriate precursor to pandas.
recarrays are built on structured dtypes, and were not just supposed to be
low level C-structs.
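
For illustration only, here is a minimal sketch of one way to get a plain
float array from a subset of fields without relying on how a[['b', 'c']]
lays out its memory. It assumes a copy is acceptable. It is not what the old
test code did, and the names `names`, `sub` and `col_means` are made up for
the example.

```python
import numpy as np

# Same structured array as in the quoted session above.
a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])

# Pull each selected field out on its own and stack the columns.
# The result is a copy with shape (5, 2), not a view of `a`, so it
# does not depend on whether multi-field indexing returns a view
# or a copy, or on any padding in the structured dtype.
names = ['b', 'c']
sub = np.stack([a[name] for name in names], axis=-1)

# Column means, the same quantity as .view(('f8', 2)).mean(0) above.
col_means = sub.mean(0)
```

Because `sub` is a copy, writing to it does not change `a`. That is the
trade-off compared with the view-based version.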


Josef





>
> Allan
>
>