[Numpy-discussion] Baffling error: ndarray += csc_matrix -> "ValueError: setting an array element with a sequence"

Fri Sep 27 15:15:37 EDT 2013

On Fri, Sep 27, 2013 at 7:34 PM, Pauli Virtanen <pav at iki.fi> wrote:
> 27.09.2013 19:33, Nathaniel Smith kirjoitti:
> [clip]
>> I really don't understand what arcane magic is used to make ndarray +=
>> csc_matrix work at all, but my question is, is it going to break when
>> we complete the casting transition described above? It was just
>> supposed to catch things like int += float.
>
> This maybe clarifies it:
>
> >>> import numpy
> >>> import scipy.sparse
> >>> x = numpy.ones((2,2))
> >>> y = scipy.sparse.csr_matrix(x)
> >>> z = x
> >>> z += y
> >>> x
> array([[ 1.,  1.],
>        [ 1.,  1.]])
> >>> z
> matrix([[ 2.,  2.],
>         [ 2.,  2.]])
>
> The execution flows like this:
>
> ndarray.__iadd__(arr, sparr)
>     np.add(arr, sparr, out=???)
>         return NotImplemented  # wtf
>     return NotImplemented
> Python does arr = sparr.__radd__(arr)
>
> Since Scipy master sparse matrices now have __numpy_ufunc__, but it
> doesn't handle out= arguments, the second step currently raises a
> TypeError (for Numpy master + Scipy master).
>
> And this is actually the correct thing to do, as having np.add return
> NotImplemented is just broken. Only ndarray.__iadd__ has the authority
> to return the NotImplemented.
>
> To make the in-place ops work again, it seems Numpy needs some
> additional fixes in its binary op machinery, before __numpy_ufunc__
> business works fully as intended. Namely, the binary op routines will
> need to catch TypeErrors and convert them to NotImplemented.
>
> The code paths where Numpy ufuncs currently return NotImplemented could
> also be changed to raise TypeErrors, but I'm not sure if someone
> somewhere relies on this behavior (I hope not).

Okay, so I see three separate issues:
1) My original concern, that the upcoming casting change for in-place
operations will cause some horrible interaction. Tentatively this
seems like it might be okay since even after the "cast" succeeds,
np.add is still just refusing to do the operation, so hopefully we can
set it up so that it will continue to fail once the casting rule
becomes more strict.
2) The issue that ufuncs return NotImplemented and it makes baby Guido
cry. This is completely broken, agreed. Not sure when someone will get
around to clearing this stuff up.
3) The issue of how to make an in-place operation like ndarray += sparse
continue to work in the brave new __numpy_ufunc__ world.
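
To make the fallback in the quoted trace concrete, here is a minimal
pure-Python sketch. Dense and Sparse are hypothetical stand-ins, not the
real ndarray or scipy.sparse classes; the only thing they are meant to
show is what Python does when __iadd__ returns NotImplemented.

```python
class Dense:
    """Hypothetical stand-in for ndarray: a flat list of floats."""

    def __init__(self, values):
        self.values = list(values)

    def __iadd__(self, other):
        # Only Dense += Dense is handled in place; anything else is declined.
        if isinstance(other, Dense):
            for i, v in enumerate(other.values):
                self.values[i] += v
            return self
        return NotImplemented


class Sparse:
    """Hypothetical stand-in for a sparse matrix: {index: value}."""

    def __init__(self, entries):
        self.entries = dict(entries)

    def __radd__(self, other):
        # Builds a brand-new Dense, copying every element of `other`.
        result = Dense(other.values)
        for i, v in self.entries.items():
            result.values[i] += v
        return result


x = Dense([1.0, 1.0, 1.0, 1.0])
y = Sparse({0: 1.0, 3: 1.0})
z = x
z += y  # Dense.__iadd__ declines, so Python evaluates z = y.__radd__(z)
```

After this runs, z is a new object and x is untouched, which is the
same rebinding visible in the quoted session: the "in-place" add was
really a full copy made by the reflected method.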

For this last issue, I think we disagree. It seems to me that the
right answer is that csc_matrix.__numpy_ufunc__ needs to step up and
start supporting out=! If I have a large dense ndarray and I += a
sparse matrix into it, the operation should need no temporary memory
and only O(nnz) time, where nnz is the number of stored nonzeros.
Right now it sounds like it actually copies the large dense ndarray,
which takes time and space proportional to its size.
AFAICT the only way to avoid that is for scipy.sparse to implement
out=. It shouldn't be that hard...?
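
As a rough sketch of the kind of update I mean (this is not how
scipy.sparse does or would implement out=; add_coo_inplace is a
hypothetical helper, and the sparse operand is represented by plain
COO-style row/col/data arrays so that only numpy is needed):

```python
import numpy as np


def add_coo_inplace(dense, rows, cols, data):
    """Hypothetical helper: dense[rows[k], cols[k]] += data[k] for every k.

    Touches only the listed entries and writes straight into `dense`,
    so no dense-sized temporary is allocated.
    """
    # np.add.at is unbuffered, so repeated (row, col) pairs accumulate
    # instead of overwriting each other.
    np.add.at(dense, (rows, cols), data)
    return dense


x = np.ones((3, 3))
z = x  # second name for the same array

rows = np.array([0, 2, 2])
cols = np.array([1, 0, 0])  # entry (2, 0) is listed twice on purpose
data = np.array([5.0, 1.0, 2.0])

add_coo_inplace(x, rows, cols, data)
```

Here z and x still refer to the same array afterwards, and the work
done scales with the number of stored entries rather than with the size
of the dense array.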

-n