[Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+
allanhaldane at gmail.com
Mon Jan 22 11:13:17 EST 2018
On 01/22/2018 10:53 AM, josef.pktd at gmail.com wrote:
> This is similar to the above example
> a[['a', 'c']].view('i8')
> but it doesn't try to combine fields.
> In many examples where I used structured dtypes a long time ago, I
> switched between consistent views as either a standard array of subsets
> or as structured dtypes.
> For this use case it wouldn't matter whether a[['a', 'c']] returns a view
> or copy, as long as we can get the second view that is consistent with
> the selected part of the memory. This would also be independent of
> whether numpy pads internally and adjusts the strides if possible or not.
> >>> a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])
> array([(1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0), (1, 1.0, 1.0),
>        (1, 1.0, 1.0)],
>       dtype=[('a', '<i8'), ('b', '<f8'), ('c', '<f8')])
> >>> a[['b', 'c']].view(('f8', 2)).mean(0)
> array([ 1., 1.])
> >>> a[['b', 'c']].view(('f8', 2)).dtype
Hmm, this did not raise a FutureWarning in 1.11.2, so I was not quite
right in my message. It looks like this particular line only started
raising FutureWarnings in 1.12.0.
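
For code that only needs the values of the selected fields as a plain
float array, one way to avoid depending on the view-vs-copy behavior at
all is to assemble the array from the individual fields. The following
is a minimal sketch, not the approach discussed above: it always makes a
copy, and the name `bc` is only illustrative.

```python
import numpy as np

a = np.ones(5, dtype=[('a', 'i8'), ('b', 'f8'), ('c', 'f8')])

# Single-field indexing returns a strided view of each column.
# np.stack copies them into a new contiguous (5, 2) float64 array,
# so the result does not depend on whether a[['b', 'c']] is a
# padded view or a packed copy.
bc = np.stack([a['b'], a['c']], axis=-1)

print(bc.shape)
print(bc.mean(0))
```

Because the result is a copy, writes to `bc` do not propagate back to
`a`, which matters only if the original code relied on view semantics.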
> Aside: The plan is that statsmodels will drop all usage and support for
> rec_arrays/structured dtypes
> in the following release (0.10).
> Then structured dtypes are free (from our perspective) to provide low
> level struct support
> instead of pretending to be dataframe_like.
Your use of structured arrays is "pandas-like", i.e. you are using them for
tabular data manipulation. In numpy 1.13 we updated the structured docs
to discourage this. Of course users can do what they want, but here is
what the new docs say:
Structured arrays are designed for low-level
manipulation of structured data, for example, for
interpreting binary blobs. Structured datatypes are
designed to mimic 'structs' in the C language, making
them also useful for interfacing with C code. For these
purposes, numpy supports specialized features such as
subarrays and nested datatypes, and allows manual
control over the memory layout of the structure.
For simple manipulation of tabular data other pydata
projects, such as pandas, xarray, or DataArray, provide
higher-level interfaces that may be more suitable. These
projects may also give better performance for tabular
data analysis because the C-struct-like memory layout of
structured arrays can lead to poor cache behavior.
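
As an illustration of the "binary blob" use case the docs describe, here
is a small sketch. The record layout (a little-endian int32 `id` followed
by a float64 `value`) and all names are hypothetical, chosen only for the
example.

```python
import struct
import numpy as np

# Hypothetical packed record: int32 id, float64 value, no padding.
record = np.dtype([('id', '<i4'), ('value', '<f8')])

# Two records serialized with the standard-library struct module.
blob = struct.pack('<id', 1, 2.5) + struct.pack('<id', 2, 7.0)

# Interpret the bytes in place as an array of structs (read-only view).
recs = np.frombuffer(blob, dtype=record)
print(recs['id'], recs['value'])

# Manual control over the memory layout: the same fields, but with
# 'value' placed at byte offset 8 and the struct padded to 16 bytes.
padded = np.dtype({'names': ['id', 'value'],
                   'formats': ['<i4', '<f8'],
                   'offsets': [0, 8],
                   'itemsize': 16})
print(record.itemsize, padded.itemsize)
```

The packed dtype matches the 12-byte records produced by `struct.pack`
with the `<` prefix, while the padded dtype shows how explicit offsets
and itemsize can reproduce a layout with padding bytes.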