[Numpy-discussion] Multiple-field indexing: view vs copy in 1.14+

Allan Haldane allanhaldane at gmail.com
Sat Jan 27 23:48:34 EST 2018


On 01/25/2018 03:56 PM, josef.pktd at gmail.com wrote:
> 
> 
> On Thu, Jan 25, 2018 at 1:49 PM, Marten van Kerkwijk 
> <m.h.vankerkwijk at gmail.com> wrote:
> 
>     On Thu, Jan 25, 2018 at 1:16 PM, Stefan van der Walt
>     <stefanv at berkeley.edu> wrote:
>     > On Mon, 22 Jan 2018 10:11:08 -0500, Marten van Kerkwijk wrote:
>     >>
>     >> I think the consistency argument is perhaps the most important:
>     >> views are very powerful and in many ways one *counts* on them
>     >> happening, especially in working with large arrays.
>     >
>     >
>     > I had the same gut feeling, but the fancy indexing example made me
>     > pause:
>     >
>     > In [9]: x = np.arange(12, dtype=float).reshape((3, 4))
>     >
>     > In [10]: p = x[[0, 1]]  # copy of data
>     >
>     > Then:
>     >
>     > In [11]: x = np.array([(0, 1), (2, 3)], dtype=[('a', int), ('b', int)])
>     >
>     > In [12]: p = x[['a', 'b']]  # copy of data, but proposal will change that
> 
> 
> What does this do?
> p = x[['a', 'b']].copy()

In 1.14.0 this creates an exact copy of what was returned by
`x[['a', 'b']]`, including any padding bytes.
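A minimal sketch of the copy/view distinction discussed here. It assumes NumPy >= 1.16 (the release in which multi-field indexing began returning a view) and uses `numpy.lib.recfunctions.repack_fields`, which this thread does not mention, to show how a copy without padding can be obtained:

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Assumes NumPy >= 1.16, where multi-field indexing returns a view.
x = np.zeros(3, dtype='u1,u1,u1,u1,f4')   # fields f0..f4, itemsize 8

# Integer ("fancy") indexing of rows always copies the data.
rows = x[[0, 1]]
print(np.shares_memory(rows, x))          # False

# Multi-field indexing returns a view; its dtype keeps the original
# offsets and itemsize, so the bytes of f1..f3 become padding.
v = x[['f0', 'f4']]
print(np.shares_memory(v, x))             # True
print(v.dtype.itemsize)                   # 8

# .copy() duplicates the padded layout exactly, padding bytes included.
c = v.copy()
print(np.shares_memory(c, x), c.dtype.itemsize)   # False 8

# repack_fields returns a copy with the padding bytes removed.
packed = rfn.repack_fields(v)
print(packed.dtype.itemsize)              # 5
```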

> My impression is that the problems with the view are because the padded 
> view doesn't behave like a "standard" dtype or array, i.e. the follow-up 
> behavior is the problematic part.

I think the padded view is a "standard array" in the sense that you can 
easily create structured arrays with padding bytes, for example by using 
the `align=True` option.

     >>> np.zeros(3, dtype=np.dtype('u1,f4', align=True))
     array([(0, 0.), (0, 0.), (0, 0.)],
           dtype={'names':['f0','f1'], 'formats':['u1','<f4'], 'offsets':[0,4], 'itemsize':8, 'aligned':True})

Compare to

     >>> np.zeros(3, dtype='u1,u1,u1,u1,f4')[['f0', 'f4']]
     array([(0, 0.), (0, 0.), (0, 0.)],
           dtype={'names':['f0','f4'], 'formats':['u1','<f4'], 'offsets':[0,4], 'itemsize':8})
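The two reprs above describe the same byte layout. A small sketch that checks this by comparing field offsets and itemsize, assuming NumPy >= 1.16 so that the multi-field index yields the padded dtype shown; the helper `layout` is a hypothetical name introduced only for illustration:

```python
import numpy as np

# An explicitly aligned dtype: u1 at offset 0, 3 padding bytes, f4 at offset 4.
aligned = np.dtype('u1,f4', align=True)

# Dtype of a multi-field view (NumPy >= 1.16) that skips the middle u1 fields.
padded = np.zeros(3, dtype='u1,u1,u1,u1,f4')[['f0', 'f4']].dtype

def layout(dt):
    """Hypothetical helper: return (field offsets, itemsize) of a structured dtype."""
    return [dt.fields[name][1] for name in dt.names], dt.itemsize

print(layout(aligned))   # ([0, 4], 8)
print(layout(padded))    # ([0, 4], 8)

# Only the aligned dtype carries the flag shown as 'aligned':True in its repr.
print(aligned.isalignedstruct, padded.isalignedstruct)   # True False
```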


There are still bugs in numpy that occur for arrays with padding.
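Separately from those bugs, one follow-up operation where the padding is visible by design is reinterpreting the result with `.view()`. A sketch, assuming NumPy >= 1.16 and using `numpy.lib.recfunctions.repack_fields` (not part of this thread) to drop the padding first:

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

a = np.zeros(3, dtype=[('a', 'i4'), ('b', 'i4'), ('c', 'f4')])

# a[['a', 'c']] keeps a 12-byte itemsize (field 'b' becomes padding),
# so the 36-byte buffer cannot be reinterpreted as 8-byte integers.
try:
    a[['a', 'c']].view('i8')
    view_failed = False
except ValueError as exc:
    view_failed = True
    print("view failed:", exc)

# Packing first yields 8-byte items, which can be viewed as 'i8'.
ints = repack_fields(a[['a', 'c']]).view('i8')
print(ints)   # [0 0 0]
```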

Allan

> Josef
> 
>     >
>     > We're not doing the same kind of indexing here exactly (in one case we
>     > grab elements, in the other parts of elements), but the view behavior
>     > may still break the "mental expectation".
> 
>     A bit off-topic, but maybe this is another argument to just allow
>     `x['a', 'b']` -- I never understood why a tuple was not the
>     appropriate iterable for getting multiple items from a record.
> 
>     -- Marten
>     _______________________________________________
>     NumPy-Discussion mailing list
>     NumPy-Discussion at python.org <mailto:NumPy-Discussion at python.org>
>     https://mail.python.org/mailman/listinfo/numpy-discussion
>     <https://mail.python.org/mailman/listinfo/numpy-discussion>
