NumPy/SciPy participation in GSoC 2013

Hi all,
It is that time of the year for Google Summer of Code applications. If we want to participate with NumPy and/or SciPy, we need two things: enough mentors and ideas for projects. If we get those, we'll apply under the PSF umbrella. They've outlined the timeline they're working by and their guidelines at http://pyfound.blogspot.nl/2013/03/get-ready-for-google-summer-of-code.html.
We should be able to come up with some interesting project ideas, I'd think; let's put those at http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas, preferably with enough detail to be understandable for people new to the projects, and with a proposed mentor.
We need at least 3 people willing to mentor a student. Ideally we'd have enough mentors this week, so we can apply to the PSF on time. If you're willing to be a mentor, please send me the following: name, email address, phone number, and what you're interested in mentoring. If you have time constraints and have doubts about being able to be a primary mentor, being a backup mentor would also be helpful.
Cheers,
Ralf
P.S. As you can probably tell from the above, I'm happy to coordinate the GSoC applications for NumPy and SciPy.

On Thu, Mar 21, 2013 at 10:20 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
<snip>
So far we've only got one primary mentor (thanks Chuck!); most core devs do not seem to have the bandwidth this year. If there are other people interested in mentoring, please let me know. If not, then it looks like we're not participating this year.
Ralf

On Tue, Mar 26, 2013 at 12:27 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
<snip>
Hi all, an update on GSoC'13. We do have enough mentoring power after all; NumPy/SciPy is now registered as a participating project on the PSF page: http://wiki.python.org/moin/SummerOfCode/2013
Prospective students: please have a look at http://wiki.python.org/moin/SummerOfCode/Expectations and at http://projects.scipy.org/scipy/wiki/SummerofCodeIdeas. In particular, note that we require you to make one pull request to NumPy/SciPy which has to be merged *before* the application deadline (May 3). So please start thinking about that, and start a discussion on your project idea on this list.
Cheers,
Ralf

On Mon, Apr 1, 2013 at 1:58 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
<snip>
There were a number of other ideas in this thread: http://mail.scipy.org/pipermail/numpy-discussion/2013-March/065699.html

On Mon, Apr 1, 2013 at 12:58 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
<snip>
It doesn't look like I have the appropriate mojo to edit that page, but:
- The NA thing at the bottom should just be deleted, dropping a student into that would be cruel...
- Some new entries, perhaps someone could add?
---------------
== Performance parity between numpy arrays and Python scalars ==
Small numpy arrays are very similar to Python scalars -- but numpy incurs a fair amount of extra overhead for simple operations. For large arrays this doesn't matter, but for code that manipulates a lot of small pieces of data, it can be a serious bottleneck. For example:
{{{
In [1]: x = 1.0
In [2]: numpy_x = np.asarray(x)
In [3]: timeit x + x
10000000 loops, best of 3: 61 ns per loop
In [4]: timeit numpy_x + numpy_x
1000000 loops, best of 3: 1.66 us per loop
}}}
This project would involve profiling simple operations like the above, determining where the bottlenecks are, and devising improved algorithms to solve them, with the goal of getting the numpy time as close as possible to the Python time. Not only would this make all numpy-using code faster, but it would pave the way for future simplifications in numpy's core, which currently has a lot of duplicate code that attempts to work around these slow paths instead of fixing them properly.
Some possible concrete changes:
1. numpy's "ufunc loop lookup code" (which is used to determine, e.g., whether to use the integer or floating-point versions of "+") is slow and inefficient.
2. Checking for floating point errors is very slow; we can and should do it less often.
3. When allocating the return value, the "+" for Python floats calls malloc() only once; numpy calls it twice (once for the array object itself, and a second time for the array data). Stashing both objects within a single allocation would be more efficient.
4. ...see what profiling says! We know 61 ns is possible.
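For anyone who wants to reproduce the comparison outside IPython, a minimal sketch using the standard-library timeit module might look like the following. The helper name time_add and the loop count are assumptions made for illustration, and absolute timings will differ by machine and NumPy version. The last line prints the type signatures of the loops that the "ufunc loop lookup code" has to choose between for "+".

```python
import timeit

import numpy as np


def time_add(value, number=200_000):
    """Return seconds per `value + value`, taking the best of 3 repeats.

    `time_add` is a hypothetical helper for this sketch, not a numpy API.
    """
    timer = timeit.Timer("v + v", globals={"v": value})
    return min(timer.repeat(repeat=3, number=number)) / number


x = 1.0
numpy_x = np.asarray(x)  # 0-d array wrapping the same value

py_time = time_add(x)
np_time = time_add(numpy_x)

print(f"Python float: {py_time * 1e9:.0f} ns per loop")
print(f"0-d ndarray:  {np_time * 1e9:.0f} ns per loop")
print(f"ratio:        {np_time / py_time:.1f}x")

# Each entry is one inner loop registered for np.add, e.g. 'dd->d' for
# float64 + float64 -> float64; the lookup picks one of these per call.
print(np.add.types)
```

The ratio printed here is only a rough indicator; a real profiling effort would look inside the C call path rather than at wall-clock time alone.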
== Pythonic dtypes ==
A numpy "dtype" is an object that knows how to work with different sorts of values, represented as fixed-length packed binary values. For example, the int32 dtype knows how to convert the Python object '-1' to the four-byte buffer 0xff 0xff 0xff 0xff.
Conceptually, dtype objects are arranged into a nice type hierarchy: http://docs.scipy.org/doc/numpy/_images/dtype-hierarchy.png
But implementation-wise, dtypes don't use the Python class system at all. There's just a single Python class (numpy.dtype), and all dtypes are instances of it. (This is because when numpy was first designed, they only expected there to be maybe 20 dtype objects total.) This turns out to cause a number of problems -- you can't define new dtypes from Python, only from C; you can't use isinstance to compare dtypes (you have to use a hacky numpy-specific API instead); different dtypes can't easily contain state (instead, the single dtype class has gradually sprouted new fields as new dtypes turned out to need them); etc. Basically we've been reinventing the Python class system, poorly.
The goal for this project is to turn dtype classes into regular Python classes with a proper type hierarchy and using the standard Python mechanisms.
Longer term goals (at least the first of which is probably achievable within the SoC timeline):
1. Allow for defining new dtypes using pure Python.
2. There are a bunch of special cases in the ufunc code for handling strings and record arrays; we should make the appropriate extensions to the dtype API so that they can become regular dtypes.
3. A proper categorical data dtype. (This is trivial once the above is done.)
4. NA dtypes
------------------------------
I'm sort of tempted to propose "sparse ndarrays" as a project, but I think that's too ambitious and would just end in tears... I guess the only way to allow for incremental deliverables would be to develop it as a separate package, with the incremental pieces being changes to numpy core that made it possible to hook in such a deep change (e.g., new ufunc looping structure) from out-of-package code.
-n
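As a small illustration of the dtype points in the "Pythonic dtypes" proposal above, the sketch below shows the int32 packing of -1 and the numpy-specific hierarchy check. The choice of np.issubdtype as the API being referred to is an assumption, and the class layout of dtypes may differ between NumPy versions, so the sketch only relies on behaviour that holds regardless.

```python
import numpy as np

# The int32 dtype packs the Python int -1 into four 0xff bytes
# (two's complement), as described in the proposal.
buf = np.array(-1, dtype=np.int32).tobytes()
print(buf.hex())  # ffffffff

int_dt = np.dtype(np.int32)
float_dt = np.dtype(np.float64)

# Every dtype object passes isinstance(..., np.dtype), so this check
# cannot tell an integer dtype from a floating-point one.
print(isinstance(int_dt, np.dtype), isinstance(float_dt, np.dtype))

# Position in the conceptual hierarchy is queried through a
# numpy-specific function instead of isinstance.
print(np.issubdtype(int_dt, np.integer))    # True
print(np.issubdtype(float_dt, np.integer))  # False
```

With regular Python classes per dtype, the last two checks could in principle be written with isinstance or issubclass, which is the simplification the proposal is after.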

On Tue, Apr 2, 2013 at 5:24 PM, Nathaniel Smith <njs@pobox.com> wrote:
<snip>
It doesn't look like I have the appropriate mojo to edit that page, but:
- The NA thing at the bottom should just be deleted, dropping a student into that would be cruel...
- Some new entries, perhaps someone could add?
<snip>
Added those. Maybe one of the Trac admins can fix your edit rights.
I'm sort of tempted to propose "sparse ndarrays" as a project, but I think that's too ambitious and would just end in tears...
That indeed sounds way too ambitious. Besides, there's already the idea of fixing up scipy.sparse to be much more consistent with ndarrays. That will yield some of the same benefits and should be a lot easier to do.
Ralf

On Tue, Apr 2, 2013 at 12:02 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
<snip>
Added those. Maybe one of the Trac admins can fix your edit rights.
I'm having problems logging in also. I have an account in my name, but the password may have been changed along the line and apparently my email account also.
<snip>
Chuck
participants (4)
- Charles R Harris
- Nathaniel Smith
- Ralf Gommers
- Todd