GSoC project: draft of proposal

Hi, Attached is my draft proposal. The project is "vector math library integration". I think I need some feedback to make it more solid. Any comments will be appreciated. Thanks in advance. Regards, Leo Mao

Hi Leo, Out of curiosity, which vector math libraries did you have in mind as likely candidates for inclusion? How are you planning on selecting the library to integrate? Cheers, Aron
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion

Hi Aron, As Julian mentioned previously, Yeppp may be a good candidate. As for selecting a good library, I will consider the performance and the API of the library. Integrating the library should improve the performance of numpy without making the source too complicated to maintain. I also think the library should be mature, so that its API will not change significantly. Please point out anything I have missed. I would also be grateful for any suggestions on my proposal. Regards, Leo Mao

The proposal as it stands is too open-ended and lacking in specifics. You should probably select a library before the start of GSoC, or at least have a list of candidates, and also narrow the part of numpy you want to improve to something definite: linalg, special functions, etc. That doesn't mean you can't do more if time allows ;) An estimate of expected gains over current code would also help.
Chuck
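An estimate of expected gains usually starts from a small timing harness. The following is only a sketch using the Python standard library: `baseline_exp`, `candidate_exp`, and `speedup` are hypothetical names, and the "candidate" is a stand-in rather than a binding to any real vector library. Real measurements would compare numpy's existing loops against the library under consideration.

```python
import math
import timeit


def baseline_exp(values):
    """Reference path: one scalar math.exp call per element."""
    return [math.exp(v) for v in values]


def candidate_exp(values):
    """Hypothetical stand-in for a vector-library call (not a real binding)."""
    return list(map(math.exp, values))


def speedup(baseline, candidate, values, repeat=5, number=20):
    """Return best-of-`repeat` baseline time divided by candidate time."""
    t_base = min(timeit.repeat(lambda: baseline(values), repeat=repeat, number=number))
    t_cand = min(timeit.repeat(lambda: candidate(values), repeat=repeat, number=number))
    return t_base / t_cand


if __name__ == "__main__":
    data = [i / 1000.0 for i in range(10_000)]
    print(f"speedup: {speedup(baseline_exp, candidate_exp, data):.2f}x")
```

Taking the minimum over several repeats reduces the influence of unrelated system noise; the ratio will still vary with hardware and array size.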

Hi, Thanks a lot for your advice, Chuck. Following it, I have modified my draft proposal (attached). I think it still needs more comments so that I can improve it. I also found that I may be able to make some linalg-related functions (such as dot or svd) faster by integrating a suitable library into numpy. Regards, Leo Mao

On Thu, Mar 13, 2014 at 1:35 PM, Leo Mao <lmao20001@gmail.com> wrote:
And I found that maybe I can also make some functions related to linalg (like dot, svd or something else) faster by integrating a proper library into numpy.
I think everyone who wants fast numpy linalg already connects to something like OpenBLAS or MKL. When these are not available, numpy uses its own "lapack-lite" which is way slower. I don't think you are going to beat OpenBLAS, so are you suggesting to speed up the slow default "lapack-lite", or are you proposing something else?

On Fri, Mar 14, 2014 at 1:43 AM, alex <argriffi@ncsu.edu> wrote:
I think everyone who wants fast numpy linalg already connects to something like OpenBLAS or MKL. When these are not available, numpy uses its own "lapack-lite" which is way slower. I don't think you are going to beat OpenBLAS, so are you suggesting to speed up the slow default "lapack-lite", or are you proposing something else?
I think most CPUs nowadays support instructions like SSE2, AVX, etc., so maybe numpy could use OpenBLAS (or something else) by default?

Dear Leo, large parts of your proposal are covered by the uvml package https://github.com/geggo/uvml In my opinion you should also consider Intel's VML (part of MKL) as a candidate. (Yes, I know, it is not free.) To the best of my knowledge it provides many more vectorized functions than the open source alternatives. Concerning your timetable: once you have implemented support for one function, adding more functions is very easy. Gregor

I'm not sure that your week-old project is enough to discourage this GSoC project. In particular, it would be nice to be able to ship this directly as part of numpy, and that won't really be possible with MKL. Eric

Hi, it's not at all my intention to discourage this project. I hope Leo Mao can use the uvml package as a starting point for further improvements. Since most vectorized math libraries share a very similar interface, I think the actual choice of the library could be made a configurable option. Adapting uvml to use e.g. Yeppp instead of MKL should be straightforward. Just as numpy and scipy builds with MKL LAPACK are distributed by Enthought and Christoph Gohlke, using MKL should not be ruled out completely. Gregor

Just a comment: supporting a library that is BSD 3-clause could greatly reduce compilation problems like the ones we have with BLAS. We could include it in numpy, download it automatically, or do whatever else makes the install trivial, and then we could assume all users have it. Dealing with BLAS is already not fun; if a new dependency could be trivial to link to, it would be great. Fred

Hi everyone, Thanks for your replies! I think Gregor's uvml package is really a good starting point for me.
Gregor wrote: "I think the actual choice of the library could be made a configurable option."
That sounds like a good idea. If the implementations are very similar, maybe I can implement bindings for multiple libraries? A potential issue is that some libraries may lack some functions. For example, Yeppp is a good candidate, as it provides pre-built libraries on many platforms and its API is pretty clear. But Yeppp lacks some functions, such as the inverse trigonometric functions. Intel's VML provides many more functions, but sadly it is not free. I found another library called Vc, which looks like a potential candidate for this project: http://code.compeng.uni-frankfurt.de/projects/vc I haven't dug into it yet, so I'm not sure whether it provides what we want.
Fred wrote: "supporting a library that is bsd 3 clauses could help to highly reduce the compilation problem like what we have with blas."
Yeppp is BSD 3-clause, so I think Yeppp is really a good choice. Is there a list of licenses which can be added into numpy without pain? (How about LGPL3?) Regards, Leo Mao
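The combination of a configurable backend and libraries that lack some functions suggests a lookup with a fallback. The sketch below illustrates the idea in plain Python. Everything in it is hypothetical: the names `REFERENCE`, `BACKENDS`, `resolve`, and the "partial" backend are invented for illustration, and the backend entries are placeholders rather than calls into Yeppp, VML, or Vc.

```python
import math


def _elementwise(scalar_func):
    """Lift a scalar function to operate on a list of floats."""
    def apply(values):
        return [scalar_func(v) for v in values]
    return apply


# Complete reference table; every supported function must appear here.
REFERENCE = {
    "sin": _elementwise(math.sin),
    "exp": _elementwise(math.exp),
    "asin": _elementwise(math.asin),
}

# Hypothetical optional backends. "partial" deliberately omits "asin" to
# mimic a library without inverse trigonometric functions.
BACKENDS = {
    "partial": {
        "sin": _elementwise(math.sin),  # placeholder for a real vector call
        "exp": _elementwise(math.exp),  # placeholder for a real vector call
    },
}


def resolve(func_name, backend=None):
    """Return (implementation, provider_name) for func_name.

    Falls back to the reference table when no backend is selected, the
    backend name is unknown, or the backend lacks the function.
    """
    if func_name not in REFERENCE:
        raise KeyError(f"unsupported function: {func_name}")
    table = BACKENDS.get(backend, {})
    if func_name in table:
        return table[func_name], backend
    return REFERENCE[func_name], "reference"


if __name__ == "__main__":
    for name in ("exp", "asin"):
        impl, provider = resolve(name, backend="partial")
        print(name, "->", provider, impl([0.0, 0.5]))
```

Returning the provider name alongside the implementation makes it easy to test which functions actually use the optional backend and which silently fall back.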

On Fri, Mar 14, 2014 at 4:33 PM, Leo Mao <lmao20001@gmail.com> wrote:
Yeppp is bsd 3 clauses so I think Yeppp is really a good choice. Is there a list of licenses which can be added into numpy without pain? (how about LGPL3 ?)
No, just BSD and its rough equivalents like the Expat license. -- Robert Kern

On 14.03.2014 17:40, Robert Kern wrote:
No, just BSD and its rough equivalents like the Expat license.
They can't be added into numpy, but support for linking or building against a non-BSD-licensed library can still be added. Only our binary distributions are limited in what we can use.

On Fri, Mar 14, 2014 at 7:42 PM, Julian Taylor <jtaylor.debian@googlemail.com> wrote:
They can't be added into numpy, but support linking or building against a non-bsd like library can still be added. Only our binary distributions are limited in what we can use.
Optionally, yes. -- Robert Kern

Because of the license problem, I think I will choose Yeppp as the default backend. If time allows, maybe I can implement other bindings (e.g. for the Vc library). I also found that the SLEEF library is in the public domain. But it seems that it only provides fast math functions, not "vectorized math functions", so I am not sure whether it can be used in this project. Finally, if there are any suggestions for my proposal, please point them out. I will appreciate them. Thanks. Regards, Leo Mao

On 12.03.2014 17:52, Leo Mao wrote:
Hi, The attachment is my draft of proposal. The project is "vector math library integration". I think I need some feedback to make it solider. Any comment will be appreciated. Thanks in advance.
Hi, I finally had some time to properly look at your proposal; here are my comments.
First of all, I hope you are aware this is a very challenging project, as you will have to deal with issues in several different areas: build systems, portability, low-level performance, numerical issues, testing, and the numpy codebase, which is quite daunting in places. I do fear that it might be too much for a first-year student.
Your proposal is lacking some information on your experience. Are you already familiar with vectorization via SIMD? While the goal of this project is partly to avoid writing more vector code in NumPy, it is still very useful if you are familiar with how it works. If you have no experience, maybe add some time to the schedule for learning the basics.
The numerical accuracy of the vector library needs to be evaluated; I suspect that this might be the biggest roadblock to adding support by default. The performance of the library over different value ranges also needs to be investigated.
What kind of hardware do you have at your disposal? SIMD vectorization performance is very hardware dependent. You probably want at least an Intel Sandy Bridge or AMD Bulldozer type CPU to get the most out of the library; those CPUs have the newish AVX SIMD instructions.
While I think your schedule is already packed, another point you could add if you have extra time is extending the existing SSE vectorized code in numpy to AVX where the vector library does not provide an equivalent (e.g. probably the boolean stuff). The runtime feature detection that vector libraries provide can be very useful for this.
Regards, Julian Taylor
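Accuracy over different value ranges is commonly reported in units in the last place (ULP). The sketch below shows one way such a check could look, using only the Python standard library (`math.ulp` needs Python 3.9 or later). `fast_exp` is a deliberately simple, hypothetical approximation standing in for a vector-library routine, and `math.exp` is assumed to be an adequate reference; a stricter evaluation would compare against a higher-precision result.

```python
import math
import random


def fast_exp(x, terms=12):
    """Hypothetical approximation under test: range-reduced Taylor series."""
    k = round(x / math.log(2.0))      # split x = k*ln(2) + r with small |r|
    r = x - k * math.log(2.0)
    total, term = 1.0, 1.0
    for n in range(1, terms):         # sum r**n / n! for n = 0 .. terms-1
        term *= r / n
        total += term
    return math.ldexp(total, k)       # total * 2**k


def ulp_error(approx, exact):
    """Absolute error measured in units in the last place of `exact`."""
    return abs(approx - exact) / math.ulp(exact)


def max_ulp_error(func, reference, low, high, samples=10_000, seed=0):
    """Largest ULP error of `func` on random points drawn from [low, high]."""
    rng = random.Random(seed)
    worst = 0.0
    for _ in range(samples):
        x = rng.uniform(low, high)
        worst = max(worst, ulp_error(func(x), reference(x)))
    return worst


if __name__ == "__main__":
    for low, high in [(-1.0, 1.0), (-50.0, 50.0), (500.0, 700.0)]:
        worst = max_ulp_error(fast_exp, math.exp, low, high)
        print(f"[{low}, {high}]: max error {worst:.1f} ulp")
```

Sampling each interval separately matters because errors from range reduction tend to grow with the magnitude of the argument, so a routine that looks accurate near zero may be much worse for large inputs.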
participants (9)
- alex
- Aron Ahmadia
- Charles R Harris
- Eric Moore
- Frédéric Bastien
- Gregor Thalhammer
- Julian Taylor
- Leo Mao
- Robert Kern