[Numpy-discussion] Behaviour of copy for structured dtypes with gaps

Stefan van der Walt stefanv at berkeley.edu
Thu Apr 11 18:59:54 EDT 2019


Hi Marten,

On Thu, 11 Apr 2019 09:51:10 -0400, Marten van Kerkwijk wrote:
> From the discussion so far, it
> seems the logic has boiled down to a choice between:
> 
> (1) Copy is a contract that the dtype will not vary (e.g., we also do not
> change endianness);
> 
> (2) Copy is a contract that any access to the data in the array will return
> exactly the same result, without wasting memory and possibly optimized for
> access with different strides. E.g., `array[::10].copy()` also compacts the
> result.

I think you'll get different answers, depending on whom you ask—those
interested in low-level memory layout, vs those who use the higher-level
API.  Given that higher-level API use is much more common, I would lean
in the direction of option (2).

From that perspective, we already don't make consistency guarantees about memory
layout and other flags.  E.g.,

In [16]: x = np.arange(12).reshape((3, 4))
In [17]: x.strides
Out[17]: (32, 8)

In [18]: x[::2, 1::2].strides
Out[18]: (64, 16)

In [19]: np.copy(x[::2, 1::2]).strides
Out[19]: (16, 8)
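
As a self-contained sketch of the same point (the dtype is pinned to int64 here only so that the strides do not depend on the platform; the names `view` and `dup` are just for illustration), the copy has a different layout but the same values, and it does not share memory with the original:

```python
import numpy as np

# Pin the dtype so the strides do not depend on the platform's default integer.
x = np.arange(12, dtype=np.int64).reshape((3, 4))
view = x[::2, 1::2]    # strided view into x's buffer
dup = np.copy(view)    # compacted copy

print(view.strides, dup.strides)    # the layouts differ
print(np.array_equal(view, dup))    # the values agree
print(np.shares_memory(x, dup))     # the copy has its own buffer

dup[0, 0] = -1    # writing to the copy ...
print(x[0, 1])    # ... leaves the original untouched
```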

Not to mention this odd copy contract:

>>> x = np.array([[1,2,3],[4,5,6]], order='F')
>>> print(np.copy(x).flags['C_CONTIGUOUS'])
False
>>> print(x.copy().flags['C_CONTIGUOUS'])
True
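
The difference comes from the default `order` argument: `np.copy` defaults to `order='K'`, while `ndarray.copy` defaults to `order='C'`. A short sketch (variable names are illustrative only) showing that passing `order` explicitly makes the two agree:

```python
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], order='F')

# np.copy defaults to order='K' (match the input layout as closely as possible),
# whereas ndarray.copy defaults to order='C'.
keep_f = np.copy(x)
to_c = x.copy()

# Passing the order explicitly removes the discrepancy.
also_c = np.copy(x, order='C')
also_f = x.copy(order='K')

for name, arr in [('np.copy(x)', keep_f), ('x.copy()', to_c),
                  ("np.copy(x, order='C')", also_c), ("x.copy(order='K')", also_f)]:
    print(name, arr.flags['C_CONTIGUOUS'], arr.flags['F_CONTIGUOUS'])
```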


The objection about arrays that don't behave identically in [0] feels
somewhat arbitrary to me.  As shown above, you can always find attributes
that differ between a copied array and the original.

The user's expectation is that they'll get an array that behaves the
same way as the original, not one that is byte-for-byte compatible.  The
most common use case is to make sure that the original array doesn't get
overwritten.
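
For the structured-dtype case in the subject line, here is a rough sketch of "behaves the same, but is not byte-for-byte compatible". The field names and dtype are made up for illustration, and it assumes NumPy >= 1.16, where multi-field indexing returns a view that keeps the original offsets. It uses `repack_fields` to produce the compacted layout explicitly:

```python
import numpy as np
from numpy.lib.recfunctions import repack_fields

# Illustrative dtype: dropping field 'b' leaves a gap between 'a' and 'c'.
arr = np.zeros(3, dtype=[('a', 'i4'), ('b', 'f8'), ('c', 'i4')])
arr['a'] = [1, 2, 3]
arr['c'] = [4, 5, 6]

# The multi-field view keeps the original itemsize and field offsets.
sub = arr[['a', 'c']]
# repack_fields returns the same fields with the gaps removed.
packed = repack_fields(sub)

print(sub.dtype.itemsize, packed.dtype.itemsize)
# What copy() does with the gaps is the behaviour under discussion;
# this line just reports it for the installed NumPy version.
print(sub.copy().dtype.itemsize)
# Field access gives the same values either way.
print(np.array_equal(sub['a'], packed['a']), np.array_equal(sub['c'], packed['c']))
```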

Just to play devil's advocate with myself: if you do choose option (2),
how would you go about making an identical memory copy of the original array?
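
One possible answer, purely as a sketch and not an existing NumPy API: allocate a fresh buffer that spans the same byte extent and wrap it with the original shape, dtype, and strides. The helper name `copy_keep_layout` is hypothetical, and the sketch assumes a non-empty array with non-negative strides and a plain (non-object) dtype. Bytes in the gaps are zeroed rather than copied, so this preserves the layout, not the raw memory contents.

```python
import numpy as np

def copy_keep_layout(a):
    """Hypothetical helper: copy `a` into new memory, keeping dtype and strides."""
    if a.size == 0 or any(s < 0 for s in a.strides):
        raise ValueError("sketch handles only non-empty arrays with non-negative strides")
    # Bytes spanned from the first element to the end of the last element.
    extent = sum((n - 1) * s for n, s in zip(a.shape, a.strides)) + a.itemsize
    buf = np.zeros(extent, dtype=np.uint8)
    # Wrap the new buffer with the same shape, dtype, and strides as the input.
    out = np.ndarray(a.shape, dtype=a.dtype, buffer=buf, strides=a.strides)
    out[...] = a  # element-wise copy; gap bytes stay zero
    return out

x = np.arange(12, dtype=np.int64).reshape((3, 4))
v = x[::2, 1::2]
c = copy_keep_layout(v)
print(v.strides, c.strides)
print(np.array_equal(v, c), np.shares_memory(x, c))
```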

Best regards,
Stéfan


[0] https://github.com/numpy/numpy/issues/13299#issuecomment-481912827

