On Wed, Feb 25, 2015 at 1:56 PM, Stephan Hoyer <shoyer@gmail.com> wrote:


On Wed, Feb 25, 2015 at 1:24 PM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
1. When converting these objects to arrays using PyArray_Converter, if the array returned by any of the array interfaces is not C contiguous, aligned, and writeable, a copy that meets all three conditions is made. Proper arrays and subclasses are passed through unchanged. This is the source of the error reported above.
 
 

I'm not entirely sure I understand this -- how is PyArray_Converter used in numpy? For example, if I pass a non-contiguous array to your class Foo, np.asarray does not make a copy:

It is used by many (all?) C functions that take an array as input. That is a different path from the one np.asarray and np.asanyarray follow: both call np.array, which maps to the C function _array_fromobject, found here:

https://github.com/numpy/numpy/blob/maintenance/1.9.x/numpy/core/src/multiarray/multiarraymodule.c#L1592

And ufuncs have their own conversion code, which doesn't really help either. Not sure it would be possible to have them all use a common code base, but it is certainly well worth trying.
 

In [25]: orig = np.zeros((3, 4))[:2, :3]

In [26]: orig.flags
Out[26]:
  C_CONTIGUOUS : False
  F_CONTIGUOUS : False
  OWNDATA : False
  WRITEABLE : True
  ALIGNED : True
  UPDATEIFCOPY : False

In [27]: subclass = Foo(orig)

In [28]: np.asarray(subclass)
Out[28]:
array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

In [29]: np.asarray(subclass)[:] = 1

In [30]: np.asarray(subclass)
Out[30]:
array([[ 1.,  1.,  1.],
       [ 1.,  1.,  1.]])


But yes, this is probably a bug.
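
The no-copy behavior of np.asarray in the session above can be reproduced with a small self-contained sketch. The definition of Foo is not part of this message, so the InterfaceOnly class below is a hypothetical stand-in that assumes the wrapper simply forwards __array_interface__ from the array it holds.

```python
import numpy as np


class InterfaceOnly:
    """Hypothetical stand-in for Foo: exposes only __array_interface__."""

    def __init__(self, arr):
        # Keep a reference so the underlying buffer stays alive.
        self._arr = arr

    @property
    def __array_interface__(self):
        # Forward the wrapped array's interface dict (shape, strides, data pointer).
        return self._arr.__array_interface__


# Non-contiguous slice, as in the session above.
orig = np.zeros((3, 4))[:2, :3]
wrapped = InterfaceOnly(orig)

# np.asarray goes through np.array, which here builds a view on the same buffer.
view = np.asarray(wrapped)
shares = np.shares_memory(view, orig)

# Writing through the view is visible in the original array.
view[:] = 1
```

If the conversion had copied the data, the assignment would leave orig full of zeros.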

2. When converting these objects using PyArray_OutputConverter, as well as in similar code in the ufunc machinery, anything other than a proper array or subclass raises an error. This means that, contrary to what the docs on subclassing say (see below), you cannot use an object exposing the array interface as an output parameter to a ufunc.
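
A minimal sketch of point 2, assuming a hypothetical Exposer class that offers only __array_interface__ and is not an ndarray subclass. The ufunc refuses it as an output argument even though the same object converts fine as an input.

```python
import numpy as np


class Exposer:
    """Hypothetical wrapper: exposes __array_interface__, is not an ndarray."""

    def __init__(self, arr):
        self._arr = arr

    @property
    def __array_interface__(self):
        return self._arr.__array_interface__


target = Exposer(np.zeros(3))

# As an input, the object is converted through its array interface.
as_input = np.add(target, 1)

# As an output, the ufunc machinery only accepts real arrays or subclasses.
try:
    np.add(np.ones(3), np.ones(3), out=target)
    rejected = False
except TypeError:
    rejected = True
```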

Here it might be a good idea to distinguish between objects that define __array__ vs __array_interface__/__array_struct__. A class that defines __array__ might not be very ndarray-like at all, but rather be something that can be *converted* to an ndarray. For example, objects in pandas define __array__, but updating the return value of df.__array__() in-place will not necessarily update the DataFrame (e.g., if the frame had inhomogeneous dtypes).
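
To illustrate the distinction without depending on pandas, here is a sketch with a hypothetical Record class whose __array__ has to build a new array from columns of different types. This is an assumption about how such a container might behave, not a description of pandas internals.

```python
import numpy as np


class Record:
    """Hypothetical container with inhomogeneous columns."""

    def __init__(self):
        self.ints = [1, 2, 3]
        self.floats = [0.5, 1.5, 2.5]

    def __array__(self, dtype=None, copy=None):
        # The columns cannot share one buffer, so a fresh array is built each time.
        out = np.column_stack([self.ints, self.floats])
        return out if dtype is None else out.astype(dtype)


rec = Record()
converted = np.asarray(rec)

# In-place writes land in the temporary array, not in the container.
converted[:] = -1
container_unchanged = rec.ints == [1, 2, 3] and rec.floats == [0.5, 1.5, 2.5]
```

An object like this can be converted to an ndarray, but it cannot meaningfully serve as an output target.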

I am not really sure what the behavior of __array__ should be. The link to the subclassing docs I gave before indicates that it should be possible to write to it if it is writeable (and probably pandas should set the writeable flag to False if it cannot be reliably written to), but the obscure comment I mentioned seems to point to the opposite, that it should never be written to. This is probably a good moment in time to figure out what the proper behavior should be and document it.
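
One way the writeable-flag suggestion could look in practice, as a sketch only: the hypothetical ReadOnlyExport class below returns an array from __array__ with the writeable flag cleared, so accidental in-place writes fail loudly instead of being silently lost.

```python
import numpy as np


class ReadOnlyExport:
    """Hypothetical object whose __array__ result must not be written to."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        out = np.array(self._values, dtype=dtype)
        # Mark the converted array read-only before handing it out.
        out.flags.writeable = False
        return out


exported = np.asarray(ReadOnlyExport([1.0, 2.0, 3.0]))

try:
    exported[:] = 0
    write_failed = False
except ValueError:
    # NumPy raises ValueError when assigning into a read-only array.
    write_failed = True
```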

Jaime

--
(\__/)
( O.o)
( > <) This is Bunny. Copy Bunny into your signature and help him with his plans for world domination.