NumPy-Discussion mailing list

GSoC proposal -- Numpy SciPy

Blake Griffith

April 30, 2013

12:19 p.m.

Hello, I'm writing a GSoC proposal, mostly concerning SciPy, but it involves a few changes to NumPy. The proposal is titled "Improvements to the sparse package of Scipy: support for bool dtype and better interaction with NumPy" and can be found on my GitHub: https://github.com/cowlicks/GSoC-proposal/blob/master/proposal.markdown#nump...

Basically, I want to change the ufunc class to be aware of SciPy's sparse matrices, so that when a ufunc is passed a sparse matrix as an argument, it will dispatch to a function in the sparse matrix package, which will then decide what to do. I just wanted to ping NumPy to make sure this is reasonable and that I'm not totally off track. Suggestions, feedback, and criticism are welcome. Thanks!


Nathaniel Smith

April 2013

12:37 p.m.


How do you plan to go about this? The obvious option of just calling scipy.sparse.issparse() on ufunc entry raises some problems, since numpy can't depend on or even import scipy, and we might be reluctant to add such a special case for what's a rather more general problem.

OTOH it might be possible to solve the problem in general, e.g., see the prototyped _ufunc_override_ special method in: https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py but I don't know if you want to get into such a debate within the scope of your GSoC. What were you thinking?

-n
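To make the "general solution" concrete, here is a minimal pure-Python sketch of class-level override dispatch. It is not the numpyNEP prototype and not a NumPy API: `ToyUfunc`, `SparseLike`, and the `(ufunc, method, inputs, kwargs)` calling convention are hypothetical, and only the hook name `_ufunc_override_` comes from the thread. The ufunc asks each argument's class whether it wants to handle the call and otherwise falls back to its normal behaviour, so numpy never has to import scipy.

```python
import math

HOOK_NAME = "_ufunc_override_"  # hook name from the thread; semantics assumed


class ToyUfunc:
    """Stand-in for a unary ufunc that applies a scalar function to a list."""

    def __init__(self, name, scalar_func):
        self.name = name
        self.scalar_func = scalar_func

    def __call__(self, *inputs, **kwargs):
        # Ask each argument's class (not instance) whether it wants the call.
        for obj in inputs:
            override = getattr(type(obj), HOOK_NAME, None)
            if override is not None:
                result = override(obj, self, "__call__", inputs, kwargs)
                if result is not NotImplemented:
                    return result
        # Nobody claimed the call: run the ordinary code path unchanged,
        # so types without the hook behave exactly as before.
        (x,) = inputs  # unary only, for brevity
        return [self.scalar_func(v) for v in x]


class SparseLike:
    """Hypothetical sparse type that decides for itself how to handle ufuncs."""

    def __init__(self, values):
        self.values = values

    def _ufunc_override_(self, ufunc, method, inputs, kwargs):
        # The "sparse package" side gets control here; this toy just reports it.
        return ("handled by SparseLike", ufunc.name)


toy_cos = ToyUfunc("cos", math.cos)
print(toy_cos([0.0]))              # ordinary path: [1.0]
print(toy_cos(SparseLike([1.0])))  # dispatched: ('handled by SparseLike', 'cos')
```

Because the fallback path is untouched when no argument defines the hook, existing callers see no change in behaviour.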

Charles R Harris

1 p.m.

On Tue, Apr 30, 2013 at 1:37 PM, Nathaniel Smith <njs@pobox.com> wrote:

[clip] OTOH it might be possible to solve the problem in general, e.g., see the prototyped _ufunc_override_ special method in: https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py [clip]

ISTR that Mark Wiebe also had thoughts on that functionality. There was a thread on the topic, but I don't recall when.

Chuck

Pauli Virtanen

1:02 p.m.

On 30.04.2013 22:37, Nathaniel Smith wrote: [clip]

How do you plan to go about this? The obvious option of just calling scipy.sparse.issparse() on ufunc entry raises some problems, since numpy can't depend on or even import scipy, and we might be reluctant to add such a special case for what's a rather more general problem. OTOH it might be possible to solve the problem in general, e.g., see the prototyped _ufunc_override_ special method in:

https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py

but I don't know if you want to get into such a debate within the scope of your GSoC. What were you thinking?

To me it seems that the right thing to do here is the general solution. Do you see immediate problems in e.g. just enabling something like your _ufunc_override_? The easy thing is that there are no backward compatibility problems here, since if the magic is missing, the old logic is used.

Currently, the numpy dot() and ufuncs also most of the time do nothing sensible with sparse matrix inputs even though they in some cases return values. Which then makes writing generic sparse/dense code more painful than just __mul__ being matrix multiplication.

IIRC, the quantities package also had some issues with operations involving ndarrays, which being able to override this could solve.

-- Pauli Virtanen

Nathaniel Smith

4:53 p.m.

On Tue, Apr 30, 2013 at 4:02 PM, Pauli Virtanen <pav@iki.fi> wrote:


To me it seems that the right thing to do here is the general solution.

Do you see immediate problems in e.g. just enabling something like your _ufunc_override_?

Just that we might want to think a bit about the design space before implementing something. E.g., apparently doing Python attribute lookup is very expensive -- we recently had a patch to skip __array_interface__ checks whenever possible -- is adding another such per-operation overhead ok? I guess we could use similar checks (skip checking for known types like int/float/ndarray), or only check for _ufunc_override_ on the class (not the instance) and cache the result per-class?


The easy thing is that there are no backward compatibility problems here, since if the magic is missing, the old logic is used. Currently, the numpy dot() and ufuncs also most of the time do nothing sensible with sparse matrix inputs even though they in some cases return values. Which then makes writing generic sparse/dense code more painful than just __mul__ being matrix multiplication.

I agree, but, if the main target is 'dot' then the current _ufunc_override_ design alone won't do it, since 'dot' is not a ufunc...

-n
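The distinction is easy to check interactively, assuming NumPy is installed: elementwise operations are instances of `np.ufunc`, while `np.dot` is an ordinary function, so a hook consulted only by the ufunc machinery would never see calls to `np.dot`.

```python
import numpy as np

# Elementwise operations such as cos and multiply are ufunc objects.
print(isinstance(np.cos, np.ufunc))       # True
print(isinstance(np.multiply, np.ufunc))  # True
# Every ufunc records how many inputs it takes.
print(np.cos.nin, np.multiply.nin)        # 1 2
# np.dot is not a ufunc, so ufunc-level dispatch would not reach it.
print(isinstance(np.dot, np.ufunc))       # False
```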

Blake Griffith

May 2013

11:12 a.m.

Oh wow, I just assumed that `dot` was a ufunc... However, it would still be useful to have ufuncs working well with the sparse package.

I don't understand everything that is going on in https://github.com/numpy/numpy/blob/master/numpy/core/src/umath/ufunc_object... but I assumed that I would be able to add a check for something like _ufunc_override_. I'm not sure where this piece of logic should be inserted, or what the performance implications for NumPy would be; I'm trying to figure this out. But major optimizations to ufuncs are out of the scope of this GSoC. I will look into what can be done about the `dot` function.

...

On Tue, Apr 30, 2013 at 4:02 PM, Pauli Virtanen <pav@iki.fi> wrote:

...
30.04.2013 22:37, Nathaniel Smith kirjoitti: [clip]

...
How do you plan to go about this? The obvious option of just calling scipy.sparse.issparse() on ufunc entry raises some problems, since numpy can't depend on or even import scipy, and we might be reluctant to add such a special case for what's a rather more general problem. OTOH it might be possible to solve the problem in general, e.g., see the prototyped _ufunc_override_ special method in:

https://github.com/njsmith/numpyNEP/blob/master/numpyNEP.py

but I don't know if you want to get into such a debate within the scope of your GSoC. What were you thinking?

To me it seems that the right thing to do here is the general solution.

Do you see immediate problems in e.g. just enabling something like your _ufunc_override_?

Just that we might want to think a bit about the design space before implementing something. E.g., apparently doing Python attribute lookup is very expensive -- we recently had a patch to skip __array_interface__ checks whenever possible -- is adding another such per-operation overhead ok? I guess we could use similar checks (skip checking for known types like int/float/ndarray), or only check for _ufunc_override_ on the class (not the instance) and cache the result per-class?

...
The easy thing is that there are no backward compatibility problems here, since if the magic is missing, the old logic is used. Currently, the numpy dot() and ufuncs also most of the time do nothing sensible with sparse matrix inputs even though they in some cases return values. Which then makes writing generic sparse/dense code more painful than just __mul__ being matrix multiplication.

I agree, but, if the main target is 'dot' then the current _ufunc_override_ design alone won't do it, since 'dot' is not a ufunc...

-n _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Daπid

11:32 a.m.

On 1 May 2013 20:12, Blake Griffith <blake.a.griffith@gmail.com> wrote:


However, it would still be useful to have ufuncs working well with the sparse package.

How are you planning to deal with ufunc(0) != 0? cos(sparse) is actually dense.
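The point can be made precise with a toy model: under full elementwise semantics every implicit zero becomes f(0), so the result can stay sparse only when f(0) == 0. The sketch below uses a hypothetical sparse vector stored as a length plus an `{index: value}` dict (not scipy.sparse's actual storage) and reports whether the result stays sparse.

```python
import math


def ufunc_on_sparse(func, size, stored):
    """Apply func elementwise to a hypothetical sparse vector.

    The vector has `size` entries; `stored` maps index -> nonzero value and
    every other entry is an implicit 0.0. Returns ("sparse", new_stored) if
    func(0.0) == 0, else ("dense", list_of_all_values).
    """
    fill = func(0.0)
    if fill == 0:
        # Implicit zeros stay zero, so only stored entries need computing.
        new_stored = {i: func(v) for i, v in stored.items()}
        # Drop entries that func happened to map to zero.
        return "sparse", {i: v for i, v in new_stored.items() if v != 0}
    # func(0) != 0: every implicit zero becomes `fill`, so materialize densely.
    values = [fill] * size
    for i, v in stored.items():
        values[i] = func(v)
    return "dense", values


size, stored = 5, {1: 2.0, 3: -1.5}
print(ufunc_on_sparse(math.sin, size, stored)[0])  # "sparse", since sin(0) == 0
print(ufunc_on_sparse(math.cos, size, stored)[0])  # "dense", since cos(0) == 1
```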

Blake Griffith

12:02 p.m.

That comes up in several situations (like comparing two sparse matrices, A == B). There is a SparseEfficiencyWarning that can be raised, but how this should be implemented still needs to be discussed. I will be writing a specification on how ufuncs and ndarrays are handled by the sparse package; the spec can be found here: https://github.com/cowlicks/scipy-sparse-ndarray-and-ufunc-spec/blob/master/....

In general, a unary ufunc operating on a sparse matrix should return a sparse matrix. If you really want to do cos(sparse), you will be able to. But if you are only interested in the initially nonzero elements, you should probably do something like: `sparse.data = np.cos(sparse.data)`
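A minimal sketch of the two behaviours described above, again on a hypothetical `{index: value}` sparse vector rather than scipy.sparse itself. `apply_to_stored` mirrors `sparse.data = np.cos(sparse.data)` by touching only the stored entries, while `apply_dense_semantics` computes the true elementwise result and issues a warning when it cannot stay sparse. The warning class is a local stand-in named after scipy.sparse's `SparseEfficiencyWarning`; whether the real package should warn here is exactly what the thread leaves open.

```python
import math
import warnings


class SparseEfficiencyWarning(UserWarning):
    """Local stand-in for scipy.sparse.SparseEfficiencyWarning."""


def apply_to_stored(func, stored):
    """Analogue of `sparse.data = np.cos(sparse.data)`.

    Only stored entries are transformed; implicit zeros stay zero, so for
    cos this is NOT cos of the whole vector, just of its stored values.
    """
    return {i: func(v) for i, v in stored.items()}


def apply_dense_semantics(func, size, stored):
    """True elementwise result; warn when func(0) != 0 forces a dense result."""
    fill = func(0.0)
    if fill != 0:
        warnings.warn(
            f"{getattr(func, '__name__', 'func')}(0) != 0; result is dense",
            SparseEfficiencyWarning,
            stacklevel=2,
        )
    values = [fill] * size
    for i, v in stored.items():
        values[i] = func(v)
    return values
```

The first function keeps the result sparse by construction, at the cost of computing something other than cos of the matrix; the second is correct but dense, and the warning makes that cost visible to the caller.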

...

On 1 May 2013 20:12, Blake Griffith <blake.a.griffith@gmail.com> wrote:

...
However, it would still be useful to have ufuncs working well with the sparse package.

How are you planning to deal with ufunc(0) != 0? cos(sparse) is actually dense. _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion


participants (5)

Blake Griffith
Charles R Harris
Daπid
Nathaniel Smith
Pauli Virtanen
