GSoC proposal -- Numpy SciPy
![](https://secure.gravatar.com/avatar/97760e811af951a6b383a18865affefa.jpg?s=120&d=mm&r=g)
Hello, I'm writing a GSoC proposal, mostly concerning SciPy, but it involves a few changes to NumPy. The proposal is titled "Improvements to the sparse package of Scipy: support for bool dtype and better interaction with NumPy" and can be found on my GitHub: https://github.com/cowlicks/GSoC-proposal/blob/master/proposal.markdown#nump...

Basically, I want to change the ufunc class to be aware of SciPy's sparse matrices, so that when a ufunc is passed a sparse matrix as an argument, it dispatches to a function in the sparse matrix package, which then decides what to do.

I just wanted to ping NumPy to make sure this is reasonable and that I'm not totally off track. Suggestions, feedback, and criticism are welcome. Thanks!
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Tue, Apr 30, 2013 at 3:19 PM, Blake Griffith <blake.a.griffith@gmail.com> wrote:
How do you plan to go about this? The obvious option of just calling scipy.sparse.issparse() on ufunc entry raises some problems, since numpy can't depend on or even import scipy, and we might be reluctant to add such a special case for what's a rather more general problem. OTOH it might be possible to solve the problem in general, e.g., see the prototyped _ufunc_override_ special method in: https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py but I don't know if you want to get into such a debate within the scope of your GSoC. What were you thinking? -n
![](https://secure.gravatar.com/avatar/da3a0a1942fbdc5ee9a9b8115ac5dae7.jpg?s=120&d=mm&r=g)
30.04.2013 22:37, Nathaniel Smith wrote: [clip]
To me it seems that the right thing to do here is the general solution. Do you see immediate problems in, e.g., just enabling something like your _ufunc_override_? The easy part is that there are no backward compatibility problems here: if the magic attribute is missing, the old logic is used. Currently, numpy's dot() and the ufuncs mostly do nothing sensible with sparse matrix inputs, even though in some cases they do return values. That makes writing generic sparse/dense code more painful than the issue of __mul__ being matrix multiplication alone would. IIRC, the quantities package also had some issues with operations involving ndarrays, for which being able to override this could be a solution. -- Pauli Virtanen
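To make the dispatch-with-fallback idea concrete, here is a minimal pure-Python sketch. It is not NumPy's ufunc machinery and not the prototype from the numpyNEP repository: `apply_ufunc`, `ToySparse`, and the `(func, args)` signature given to `_ufunc_override_` are all assumptions made up for illustration.

```python
import math


def apply_ufunc(func, *args):
    """Toy stand-in for ufunc entry (hypothetical, not a NumPy function).

    If the class of any argument defines _ufunc_override_, hand the call
    over to it; otherwise fall through to the old logic unchanged.
    """
    for arg in args:
        override = getattr(type(arg), "_ufunc_override_", None)
        if override is not None:
            return override(arg, func, args)
    # No argument opted in, so behaviour is exactly what it was before.
    return func(*args)


class ToySparse:
    """Hypothetical sparse vector: only nonzero entries are stored."""

    def __init__(self, length, entries):
        self.length = length
        self.entries = dict(entries)  # index -> nonzero value

    def _ufunc_override_(self, func, args):
        # Assumed signature; this sketch only handles the unary case.
        if len(args) != 1:
            return NotImplemented
        if func(0.0) == 0.0:
            # Zeros stay zero, so the result can remain sparse.
            new_entries = {i: func(v) for i, v in self.entries.items()}
            return ToySparse(self.length, new_entries)
        # Otherwise every implicit zero changes and the result is dense.
        return [func(self.entries.get(i, 0.0)) for i in range(self.length)]


sparse_result = apply_ufunc(math.sin, ToySparse(4, {1: math.pi / 2}))
dense_result = apply_ufunc(math.cos, ToySparse(3, {0: math.pi}))
plain_result = apply_ufunc(math.sqrt, 4.0)  # no override: old code path
```

Objects whose class lacks the attribute never notice the change, which is the backward-compatibility property described above.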
![](https://secure.gravatar.com/avatar/97c543aca1ac7bbcfb5279d0300c8330.jpg?s=120&d=mm&r=g)
On Tue, Apr 30, 2013 at 4:02 PM, Pauli Virtanen <pav@iki.fi> wrote:
Just that we might want to think a bit about the design space before implementing something. E.g., apparently doing Python attribute lookup is very expensive -- we recently had a patch to skip __array_interface__ checks whenever possible -- is adding another such per-operation overhead ok? I guess we could use similar checks (skip checking for known types like int/float/ndarray), or only check for _ufunc_override_ on the class (not the instance) and cache the result per-class?
I agree, but, if the main target is 'dot' then the current _ufunc_override_ design alone won't do it, since 'dot' is not a ufunc... -n
![](https://secure.gravatar.com/avatar/97760e811af951a6b383a18865affefa.jpg?s=120&d=mm&r=g)
Oh wow, I just assumed that `dot` was a ufunc... However, it would still be useful to have ufuncs working well with the sparse package. I don't understand everything that is going on in https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object... but I assumed that I would be able to add the ability to check for something like _ufunc_override_. I'm not sure where this piece of logic should be inserted, or what the performance implications for NumPy would be. I'm trying to figure this out, but major optimizations to ufuncs are outside the scope of this GSoC. I will look into what can be done about the `dot` function. On Tue, Apr 30, 2013 at 6:53 PM, Nathaniel Smith <njs@pobox.com> wrote:
![](https://secure.gravatar.com/avatar/97760e811af951a6b383a18865affefa.jpg?s=120&d=mm&r=g)
There are several situations where that comes up (like comparing two sparse matrices, A == B). There is a SparseEfficiencyWarning that can be raised, but the way this should be implemented still needs to be discussed. I will be writing a specification of how ufuncs and ndarrays are handled by the sparse package; the spec can be found here: https://github.com/cowlicks/scipy-sparse-ndarray-and-ufunc-spec/blob/master/.... In general, a unary ufunc operating on a sparse matrix should return a sparse matrix. If you really want to do cos(sparse), you will be able to. But if you are only interested in the initially nonzero elements, you should probably do something like: sparse.data = np.cos(sparse.data) On Wed, May 1, 2013 at 1:32 PM, Daπid <davidmenhur@gmail.com> wrote:
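The difference between the two options is easiest to see with numbers. The sketch below uses a plain dict as a stand-in for a sparse vector rather than scipy.sparse itself, so `stored` and `length` are made-up illustrations and not the sparse package's data structures.

```python
import math

length = 5
# index -> nonzero value; every other position is an implicit zero.
stored = {1: math.pi, 3: 0.5}

# Analogue of sparse.data = np.cos(sparse.data):
# only the explicitly stored entries are transformed.
stored_only = {i: math.cos(v) for i, v in stored.items()}

# Analogue of cos(sparse): every implicit zero becomes cos(0) == 1,
# so the result has no zeros left and is effectively dense.
full = [math.cos(stored.get(i, 0.0)) for i in range(length)]
nonzero_count = sum(1 for x in full if x != 0.0)
```

Here `stored_only` still has two entries, while `full` has a nonzero value in all five positions, which is the situation an efficiency warning would be meant to flag.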
participants (5)
- Blake Griffith
- Charles R Harris
- Daπid
- Nathaniel Smith
- Pauli Virtanen