[Numpy-discussion] Behaviour of copy for structured dtypes with gaps

Thu Apr 11 09:51:10 EDT 2019

Hi All,

An issue [1] about the copying of arrays with structured dtype raised a
question about what the expected behaviour is: does copy always preserve
the dtype as is, or should it remove padding?

Specifically, consider an array with a structure with many fields, say 'a'
to 'z'. Since numpy 1.16, if one does `a[['a', 'z']]`, a view will be
returned. In this case, its dtype will include a large offset. Now, if we
copy this view, should the result have exactly the same dtype, including
the large offset (i.e., the copy takes as much memory as the original full
array), or should the padding be removed? From the discussion so far, it
seems the logic has boiled down to a choice between:

(1) Copy is a contract that the dtype will not vary (e.g., we also do not
change endianness);

(2) Copy is a contract that any access to the data in the array will return
exactly the same result, without wasting memory and possibly optimized for
access with different strides. E.g., `array[::10].copy()` also compacts the
result.
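
To make the two options concrete, here is a minimal sketch of the situation.
The dtype is hypothetical: a single 24-element field 'mid' stands in for the
fields between 'a' and 'z', and the names `dt`, `arr`, `view`, `plain` and
`compact` are chosen only for illustration. It assumes numpy 1.16 or later,
where multi-field indexing returns a view.

```python
import numpy as np

# Hypothetical layout: 'mid' stands in for the fields 'b' to 'y'.
dt = np.dtype([('a', 'f8'), ('mid', 'f8', (24,)), ('z', 'f8')])
arr = np.zeros(5, dtype=dt)

# Multi-field indexing gives a view (numpy >= 1.16): the itemsize is that of
# the full record, and 'z' keeps its original offset.
view = arr[['a', 'z']]
print(view.dtype.itemsize, dt.itemsize)
print(view.dtype.fields['z'][1])  # byte offset of 'z' within each record

# Under option (1) the copy keeps the padded itemsize; under option (2) it
# would shrink to the 16 bytes actually occupied by 'a' and 'z'.
print(view.copy().dtype.itemsize)

# For comparison, copying a strided slice of a plain array already compacts:
plain = np.arange(100.0)
strided = plain[::10]
compact = strided.copy()
print(strided.strides, compact.strides)
```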

An argument in favour of (2) is that, before numpy 1.16, `a[['a',
'z']].copy()` did return an array without padding. Of course, this relied
on `a[['a', 'z']]` already returning a copy without padding, but still this
is a regression.

More generally, there should at least be a clear way to get the compact
copy. Also, it would make sense for things like `np.save` to remove any
padding (it currently does not).
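
As a sketch of one way to obtain a compact copy explicitly today, one can use
`repack_fields` from `numpy.lib.recfunctions` (available since numpy 1.16).
This is only an illustration of an existing helper, not a proposal for what
`copy` itself should do; the dtype and variable names are again hypothetical.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

# Same hypothetical layout as before: 'mid' stands in for 'b' to 'y'.
dt = np.dtype([('a', 'f8'), ('mid', 'f8', (24,)), ('z', 'f8')])
arr = np.ones(5, dtype=dt)
view = arr[['a', 'z']]

# repack_fields builds a dtype without the gaps and converts the array to it,
# which yields a new, compact array holding only 'a' and 'z'.
packed = rfn.repack_fields(view)
print(packed.dtype)
print(packed.dtype.itemsize)
```

The packed result no longer shares memory with the original array, so it
could be saved or passed on without carrying the unused bytes along.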

What do people think? All the best,

Marten

[1] https://github.com/numpy/numpy/issues/13299