[Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+

Mon Jan 22 11:13:17 EST 2018

On 01/22/2018 10:53 AM, josef.pktd at gmail.com wrote:
> 
> This is similar to the above example
> a[['a', 'c']].view('i8')
> but it doesn't try to combine fields.
> 
> In many examples where I used structured dtypes a long time ago, I 
> switched between consistent views, either as a standard array of subsets 
> or as structured dtypes.
> For this use case it wouldn't matter whether a[['a', 'c']] returns a view 
> or copy, as long as we can get the second view that is consistent with 
> the selected part of the memory. This would also be independent of 
> whether numpy pads internally and adjusts the strides if possible or not.
> 
>>>> np.__version__
> '1.11.2'
> 
>>>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])
>>>> a
> array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0),
>         (1, 1.0, 1.0)],
>        dtype=[('a', '<i8'), ('b', '<f8'), ('c', '<f8')])
> 
>>>> a[['b', 'c']].view(('f8', 2)).mean(0)
> array([ 1.,  1.])
>>>> a[['b', 'c']].view(('f8', 2)).dtype
> dtype('float64')

Hmm, this did not raise a FutureWarning in 1.11.2, so I was not quite 
right in my message. It looks like this particular line only started 
raising FutureWarnings in 1.12.0.
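
For the use case above (a plain float array built from a subset of 
fields), here is a minimal sketch that does not depend on whether 
multi-field indexing returns a view or a copy. It copies the fields 
explicitly with np.stack. The optional second path uses 
numpy.lib.recfunctions.structured_to_unstructured, which only exists in 
later NumPy releases (1.16 and newer), so it is guarded with hasattr. 
The names bc and bc2 are only for illustration.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])

# Explicit copy of the selected fields into a plain (5, 2) float64 array.
# Single-field indexing a[name] is unaffected by the multi-field change.
bc = np.stack([a[name] for name in ('b', 'c')], axis=-1)
print(bc.mean(0))

# Optional: on NumPy versions that provide it, the recfunctions helper
# performs the same conversion from a multi-field selection.
if hasattr(rfn, 'structured_to_unstructured'):
    bc2 = rfn.structured_to_unstructured(a[['b', 'c']])
    assert bc2.shape == bc.shape
```

The np.stack version always copies, so writes to bc do not propagate 
back to a.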

> Aside: The plan is that statsmodels will drop all usage and support for 
> rec_arrays/structured dtypes
> in the following release (0.10).
> Then structured dtypes are free (from our perspective) to provide low 
> level struct support
> instead of pretending to be dataframe_like.

Your use of structured arrays is "pandas-like", i.e. you are using them 
for tabular data manipulation. In numpy 1.13 we updated the structured 
docs to discourage this. Of course users can do what they want, but here 
is what the new docs say:

     Structured arrays are designed for low-level
     manipulation of structured data, for example, for
     interpreting binary blobs. Structured datatypes are
     designed to mimic 'structs' in the C language, making
     them also useful for interfacing with C code. For these
     purposes, numpy supports specialized features such as
     subarrays and nested datatypes, and allows manual
     control over the memory layout of the structure.

     For simple manipulation of tabular data other pydata
     projects, such as pandas, xarray, or DataArray, provide
     higher-level interfaces that may be more suitable. These
     projects may also give better performance for tabular
     data analysis because the C-struct-like memory layout of
     structured arrays can lead to poor cache behavior.
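
As a small illustration of the "binary blob" use the docs describe, 
here is a sketch that reads packed C-like records with a structured 
dtype. The record layout (a little-endian int32 followed by a float64) 
and the names rec, blob and arr are made up for the example.

```python
import struct
import numpy as np

# Hypothetical record layout: little-endian int32 id, then float64 value,
# with no padding between fields (12 bytes per record).
rec = np.dtype([('id', '<i4'), ('value', '<f8')])

# Build two records as raw bytes; '<' in the struct format disables padding,
# so the byte layout matches the packed dtype above.
blob = struct.pack('<id', 1, 2.5) + struct.pack('<id', 2, 4.0)

# Interpret the bytes in place as an array of records (read-only view).
arr = np.frombuffer(blob, dtype=rec)
print(arr['id'], arr['value'])
```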

Allan