Re: [Numpydiscussion] Baffling error: ndarray += csc_matrix > "ValueError: setting an array element with a sequence"
27.09.2013 19:33, Nathaniel Smith kirjoitti: [clip]
I really don't understand what arcane magic is used to make ndarray += csc_matrix work at all, but my question is, is it going to break when we complete the casting transition described above? It was just supposed to catch things like int += float.
This maybe clarifies it:
import numpy import scipy.sparse x = numpy.ones((2,2)) y = scipy.sparse.csr_matrix(x) z = x z += y x
array([[ 1., 1.], [ 1., 1.]])
z
matrix([[ 2., 2.], [ 2., 2.]])
The execution flows like this:
ndarray.__iadd__(arr, sparr) np.add(arr, sparr, out=???) return NotImplemented # wtf return NotImplemented Python does arr = sparr.__radd__(arr)
Since Scipy master sparse matrices now have __numpy_ufunc__, but it doesn't handle out= arguments, the second step currently raises a TypeError (for Numpy master + Scipy master).
And this is actually the correct thing to do, as having np.add return NotImplemented is just broken. Only ndarray.__iadd__ has the authority to return the NotImplemented.
To make the inplace ops work again, it seems Numpy needs some additional fixes in its binary op machinery, before __numpy_ufunc__ business works fully as intended. Namely, the binary op routines will need to catch TypeErrors and convert them to NotImplemented.
The code paths where Numpy ufuncs currently return NotImplemented could also be changed to raise TypeErrors, but I'm not sure if someone somewhere relies on this behavior (I hope not).
On Fri, Sep 27, 2013 at 7:34 PM, Pauli Virtanen pav@iki.fi wrote:
27.09.2013 19:33, Nathaniel Smith kirjoitti: [clip]
I really don't understand what arcane magic is used to make ndarray += csc_matrix work at all, but my question is, is it going to break when we complete the casting transition described above? It was just supposed to catch things like int += float.
This maybe clarifies it:
import numpy import scipy.sparse x = numpy.ones((2,2)) y = scipy.sparse.csr_matrix(x) z = x z += y x
array([[ 1., 1.], [ 1., 1.]])
z
matrix([[ 2., 2.], [ 2., 2.]])
The execution flows like this:
ndarray.__iadd__(arr, sparr) np.add(arr, sparr, out=???) return NotImplemented # wtf return NotImplemented Python does arr = sparr.__radd__(arr)
Since Scipy master sparse matrices now have __numpy_ufunc__, but it doesn't handle out= arguments, the second step currently raises a TypeError (for Numpy master + Scipy master).
And this is actually the correct thing to do, as having np.add return NotImplemented is just broken. Only ndarray.__iadd__ has the authority to return the NotImplemented.
To make the inplace ops work again, it seems Numpy needs some additional fixes in its binary op machinery, before __numpy_ufunc__ business works fully as intended. Namely, the binary op routines will need to catch TypeErrors and convert them to NotImplemented.
The code paths where Numpy ufuncs currently return NotImplemented could also be changed to raise TypeErrors, but I'm not sure if someone somewhere relies on this behavior (I hope not).
Okay, so I see three separate issues: 1) My original concern, that the upcoming casting change for inplace operations will cause some horrible interaction. Tentatively this seems like it might be okay since even after the "cast" succeeds, np.add is still just refusing to do the operation, so hopefully we can set it up so that it will continue to fail once the casting rule becomes more strict. 2) The issue that ufuncs return NotImplemented and it makes baby Guido cry. This is completely broken, agreed. Not sure when someone will get around to clearing this stuff up. 3) The issue of how to make an inplace like ndarray += sparse continue to work in the brave new __numpy_ufunc__ world.
For this last issue, I think we disagree. It seems to me that the right answer is that csc_matrix.__numpy_ufunc__ needs to step up and start supporting out=! If I have a large dense ndarray and I try to += a sparse array to it, this operation should take no temporary memory and nnz time. Right now it sounds like it actually copies the large dense ndarray, which takes time and space proportional to its size. AFAICT the only way to avoid that is for scipy.sparse to implement out=. It shouldn't be that hard...?
n
27.09.2013 22:15, Nathaniel Smith kirjoitti: [clip]
 The issue of how to make an inplace like ndarray += sparse
continue to work in the brave new __numpy_ufunc__ world.
For this last issue, I think we disagree. It seems to me that the right answer is that csc_matrix.__numpy_ufunc__ needs to step up and start supporting out=! If I have a large dense ndarray and I try to += a sparse array to it, this operation should take no temporary memory and nnz time. Right now it sounds like it actually copies the large dense ndarray, which takes time and space proportional to its size. AFAICT the only way to avoid that is for scipy.sparse to implement out=. It shouldn't be that hard...?
Sure, scipy.sparse can easily support also the output argument.
But I still maintain that the implementation of __iadd__ in Numpy is wrong. What it does now is:
def __iadd__(self, other): return np.add(self, other, out=self)
But since we know this raises a TypeError if the input is of a type that cannot be dealt with, it should be:
def __iadd__(self, other): try: return np.add(self, other, out=self) except TypeError: return NotImplemented
Of course, it's written in C so it's a bit more painful to write this.
I think this will have little performance impact, since the check would be only a NULL check in the inplace op methods + subsequent handling. I can take a look at some point at this...
On Fri, Sep 27, 2013 at 8:27 PM, Pauli Virtanen pav@iki.fi wrote:
27.09.2013 22:15, Nathaniel Smith kirjoitti: [clip]
 The issue of how to make an inplace like ndarray += sparse
continue to work in the brave new __numpy_ufunc__ world.
For this last issue, I think we disagree. It seems to me that the right answer is that csc_matrix.__numpy_ufunc__ needs to step up and start supporting out=! If I have a large dense ndarray and I try to += a sparse array to it, this operation should take no temporary memory and nnz time. Right now it sounds like it actually copies the large dense ndarray, which takes time and space proportional to its size. AFAICT the only way to avoid that is for scipy.sparse to implement out=. It shouldn't be that hard...?
Sure, scipy.sparse can easily support also the output argument.
Great! I guess solving this somehow will be releasecritical, to avoid a regression in this case when __numpy_ufunc__ gets released. If the solution should be in scipy, I guess we should file the bug there?
But I still maintain that the implementation of __iadd__ in Numpy is wrong.
Oh yeah totally.
What it does now is:
def __iadd__(self, other): return np.add(self, other, out=self)
But since we know this raises a TypeError if the input is of a type that cannot be dealt with, it should be:
def __iadd__(self, other): try: return np.add(self, other, out=self) except TypeError: return NotImplemented
Of course, it's written in C so it's a bit more painful to write this.
I think this will have little performance impact, since the check would be only a NULL check in the inplace op methods + subsequent handling. I can take a look at some point at this...
I'm a little uncertain about the "swallow all TypeErrors" aspect of this  e.g. this could have really weird effects for object arrays, where ufuncs may raise arbitrary user exceptions.
One possibility in the long run is to just say, if you want to override ndarray __iadd__ or whatever, then you have to use __numpy_ufunc__. Not really much point in having *two* sets of implementations of the NotImplemented dance for the same operation.
n
participants (2)

Nathaniel Smith

Pauli Virtanen