Re: [SciPy-Dev] [Numpy-discussion] Baffling error: ndarray += csc_matrix -> "ValueError: setting an array element with a sequence"
On Fri, Sep 27, 2013 at 8:27 PM, Pauli Virtanen <pav@iki.fi> wrote:
27.09.2013 22:15, Nathaniel Smith wrote: [clip]
3) The issue of how to make an in-place operation like ndarray += sparse continue to work in the brave new __numpy_ufunc__ world.
For this last issue, I think we disagree. It seems to me that the right answer is that csc_matrix.__numpy_ufunc__ needs to step up and start supporting out=! If I have a large dense ndarray and I try to += a sparse array to it, this operation should take no temporary memory and nnz time. Right now it sounds like it actually copies the large dense ndarray, which takes time and space proportional to its size. AFAICT the only way to avoid that is for scipy.sparse to implement out=. It shouldn't be that hard...?
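To make the cost argument concrete, here is a minimal pure-Python sketch, not scipy.sparse code. It assumes a hypothetical sparse vector stored as an index-to-value dict and a hypothetical helper named add_sparse, and it shows how honoring out= lets the update touch only the nnz stored entries instead of copying the dense operand.

```python
def add_sparse(dense, sparse, out=None):
    """Add a sparse vector (dict: index -> value) to a dense list.

    Hypothetical helper for illustration only; not a scipy.sparse API.
    """
    if out is None:
        # No output buffer given: allocate a full copy, O(len(dense)).
        out = list(dense)
    elif out is not dense:
        # A separate output buffer was given: fill it from the input first.
        out[:] = dense
    # Touch only the stored entries: O(nnz) work.
    for index, value in sparse.items():
        out[index] += value
    return out


dense = [1.0, 2.0, 3.0, 4.0]
sparse = {1: 10.0, 3: -4.0}  # two stored entries, so nnz == 2

# With out=dense, no temporary of the dense size is created.
result = add_sparse(dense, sparse, out=dense)
```

When out is the dense operand itself, the only work done is the loop over the stored entries; without out, the full copy dominates.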
Sure, scipy.sparse can easily support the output argument as well.
Great! I guess solving this somehow will be release-critical, to avoid a regression in this case when __numpy_ufunc__ gets released. If the solution should be in scipy, I guess we should file the bug there?
But I still maintain that the implementation of __iadd__ in Numpy is wrong.
Oh yeah totally.
What it does now is:
    def __iadd__(self, other):
        return np.add(self, other, out=self)
But since we know this raises a TypeError if the input is of a type that cannot be dealt with, it should be:
    def __iadd__(self, other):
        try:
            return np.add(self, other, out=self)
        except TypeError:
            return NotImplemented
Of course, it's written in C so it's a bit more painful to write this.
I think this will have little performance impact, since the check would be only a NULL check in the inplace op methods + subsequent handling. I can take a look at some point at this...
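The effect of returning NotImplemented from __iadd__ can be sketched in pure Python. The Dense and Sparse classes below are hypothetical toys, not NumPy or SciPy types, and _add_into_self merely stands in for np.add(self, other, out=self). The sketch shows that once __iadd__ returns NotImplemented, Python falls back to the other operand's __radd__.

```python
class Dense:
    """Toy stand-in for ndarray (hypothetical, for illustration)."""

    def __init__(self, values):
        self.values = list(values)

    def _add_into_self(self, other):
        # Stand-in for np.add(self, other, out=self): only knows Dense.
        if not isinstance(other, Dense):
            raise TypeError("cannot handle %s" % type(other).__name__)
        for i, v in enumerate(other.values):
            self.values[i] += v
        return self

    def __iadd__(self, other):
        try:
            return self._add_into_self(other)
        except TypeError:
            # Let Python try the other operand's reflected method.
            return NotImplemented


class Sparse:
    """Toy stand-in for a sparse matrix (hypothetical)."""

    def __init__(self, entries):
        self.entries = dict(entries)  # index -> value

    def __radd__(self, dense):
        # Reflected add: builds a new Dense, i.e. a full copy.
        result = Dense(dense.values)
        for i, v in self.entries.items():
            result.values[i] += v
        return result


d = Dense([1, 2, 3])
original = d
d += Sparse({1: 10})  # __iadd__ gives NotImplemented, so Sparse.__radd__ runs
```

Note that after the fallback the name d is rebound to a new object built by __radd__; the original buffer is left untouched, so this path is not truly in-place.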
I'm a little uncertain about the "swallow all TypeErrors" aspect of this -- e.g. this could have really weird effects for object arrays, where ufuncs may raise arbitrary user exceptions. One possibility in the long run is to just say, if you want to override ndarray __iadd__ or whatever, then you have to use __numpy_ufunc__. Not really much point in having *two* sets of implementations of the NotImplemented dance for the same operation. -n
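A minimal pure-Python sketch of the object-array concern, using hypothetical toy classes rather than NumPy: if __iadd__ turns every TypeError into NotImplemented, a TypeError raised by a user-defined element is replaced by Python's generic "unsupported operand" error, and the original message is lost.

```python
class Picky:
    """Element type whose addition fails with a user-level TypeError."""

    def __add__(self, other):
        raise TypeError("user bug: Picky cannot be added")


class ObjectBox:
    """Toy object container whose __iadd__ swallows every TypeError."""

    def __init__(self, items):
        self.items = list(items)

    def __iadd__(self, other):
        try:
            self.items = [item + other for item in self.items]
            return self
        except TypeError:
            # Also hides TypeErrors raised by the elements themselves.
            return NotImplemented


box = ObjectBox([Picky()])
try:
    box += 1
    message = None
except TypeError as exc:
    # Python raises its own TypeError here; the "user bug" text is gone.
    message = str(exc)
```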
27.09.2013 23:33, Nathaniel Smith wrote: [clip]
Great! I guess solving this somehow will be release-critical, to avoid a regression in this case when __numpy_ufunc__ gets released. If the solution should be in scipy, I guess we should file the bug there?
It's release-critical, but the feature is added only in Numpy 1.9 and Scipy 0.14.0, so there are several months to iron out the bugs here.
Scipy bug: https://github.com/scipy/scipy/issues/2938
Numpy bug: https://github.com/numpy/numpy/issues/3812
[clip]
I'm a little uncertain about the "swallow all TypeErrors" aspect of this -- e.g. this could have really weird effects for object arrays, where ufuncs may raise arbitrary user exceptions.
A second alternative here would be to pass an additional internal-use keyword argument to the generic ufunc that instructs it to return NotImplemented rather than raising errors. This could also make the ufuncs better Python citizens by stopping them from littering NotImplemented around.
One possibility in the long run is to just say, if you want to override ndarray __iadd__ or whatever, then you have to use __numpy_ufunc__. Not really much point in having *two* sets of implementations of the NotImplemented dance for the same operation.
I think __numpy_ufunc__ breaking the default Python __*__ binary op system is a bit nasty, and would be better avoided if possible. -- Pauli Virtanen