[Numpy-discussion] numpy gsoc ideas (was: numpy gsoc topic idea: configurable algorithm precision and vector math library integration)

David Cournapeau cournape at gmail.com
Thu Mar 6 04:11:41 EST 2014


On Wed, Mar 5, 2014 at 9:11 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Mon, Mar 3, 2014 at 7:20 PM, Julian Taylor
> <jtaylor.debian at googlemail.com> wrote:
> > hi,
> >
> > as the numpy gsoc topic page is a little short on options I was thinking
> > about adding two topics for interested students. But as I have no
> > experience with gsoc or mentoring and the ideas are not very fleshed out
> > yet I'd like to ask if it might make sense at all:
> >
> > 1. configurable algorithm precision
> [...]
> > with np.precmode(default="fast"):
> >   np.abs(complex_array)
> >
> > or fast everything except sum and hypot
> >
> > with np.precmode(default="fast", sum="kahan", hypot="standard"):
> >   np.sum(d)
> [...]
>
> Not a big fan of this one -- it seems like the biggest bulk of the
> effort would be in figuring out a non-horrible API for exposing these
> things and getting consensus around it, which is not a good fit to the
> SoC structure.
>
> I'm pretty nervous about the datetime proposal that's currently on the
> wiki, for similar reasons -- I'm not sure it's actually doable in the
> SoC context.
>
> > 2. vector math library integration
>
> This is a great suggestion -- clear scope, clear benefit.
>
> Two more ideas:
>
> 3. Using Cython in the numpy core
>
> The numpy core contains tons of complicated C code implementing
> elaborate operations like indexing, casting, ufunc dispatch, etc. It
> would be really nice if we could use Cython to write some of these
> things. However, there is a practical problem: Cython assumes that
> each .pyx file generates a single compiled module with its own
> Cython-defined API. Numpy, however, contains a large number of .c
> files which are all compiled together into a single module, with its
> own home-brewed system for defining the public API. And we can't
> rewrite the whole thing. So for this to be viable, we would need some
> way to compile a bunch of .c *and .pyx* files together into a single
> module, and allow the .c and .pyx files to call each other. This might
> involve changes to Cython, some sort of clever post-processing or glue
> code to get existing cython-generated source code to play nicely with
> the rest of numpy, or something else.
>
> So this project would have the following goals, depending on how
> practical this turns out to be: (1) produce a hacky proof-of-concept
> system for doing the above, (2) turn the hacky proof-of-concept into
> something actually viable for use in real life (possibly this would
> require getting changes upstream into Cython, etc.), (3) use this
> system to actually port some interesting numpy code into cython.
>

Having to synchronise two projects may be hard for a GSoC, no?

Otherwise, I am a bit worried about Cython being used on the current C code
as is, because the core and the Python C API are so intertwined (especially
in multiarray).

Maybe one could use Cython on the non-core numpy parts that are still in C?
It is not as sexy a project, though.



>
> 4. Pythonic dtypes
>
> The current dtype system is klugey. It basically defines its own class
> system, in parallel to Python's, and unsurprisingly, this new class
> system is not as good. In particular, it has limitations around the
> storage of instance-specific data which rule out a large variety of
> interesting user-defined dtypes, and cause us to need some truly
> nasty hacks to support the built-in dtypes we do have. And it makes
> defining a new dtype much more complicated than defining a new Python
> class.
>
> This project would be to implement a new dtype system for numpy, in
> which np.dtype becomes a near-empty base class, different dtypes
> (e.g., float64, float32) are simply different subclasses of np.dtype,
> and dtype objects are simply instances of these classes. Further
> enhancements would be to make it possible to define new dtypes in pure
> Python by subclassing np.dtype and implementing special methods for
> the various dtype operations, and to make it possible for ufunc loops
> to see the dtype objects.
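
As a rough illustration of the design described above, the following pure-Python sketch models dtypes as ordinary classes. It is not NumPy code: DType, Float64, Float32, Categorical, store and load are hypothetical names, and the pack/unpack methods stand in for whatever "special methods" a real design would define.

```python
import struct


class DType:
    """Near-empty base class; hypothetical stand-in for the proposed np.dtype."""

    itemsize = 0

    def pack(self, value):
        raise NotImplementedError

    def unpack(self, raw):
        raise NotImplementedError


class Float64(DType):
    itemsize = 8

    def pack(self, value):
        return struct.pack("<d", value)

    def unpack(self, raw):
        return struct.unpack("<d", raw)[0]


class Float32(DType):
    itemsize = 4

    def pack(self, value):
        return struct.pack("<f", value)

    def unpack(self, raw):
        return struct.unpack("<f", raw)[0]


class Categorical(DType):
    """User-defined dtype whose instances carry their own data (the categories)."""

    itemsize = 1

    def __init__(self, categories):
        self.categories = list(categories)

    def pack(self, value):
        # Store each value as a one-byte index into the category list.
        return struct.pack("<B", self.categories.index(value))

    def unpack(self, raw):
        return self.categories[struct.unpack("<B", raw)[0]]


def store(dtype, values):
    """Serialise values into a flat byte buffer using the dtype instance."""
    return b"".join(dtype.pack(v) for v in values)


def load(dtype, buf):
    """Read values back from a flat byte buffer using the dtype instance."""
    n = dtype.itemsize
    return [dtype.unpack(buf[i:i + n]) for i in range(0, len(buf), n)]
```

The point of the sketch is that Categorical(["a", "b", "c"]) is just an instance with per-instance state, which is the kind of thing the quoted text says the current dtype system makes hard.
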
>
> This project would provide the key enabling piece for a wide variety
> of interesting new features: missing value support, better handling of
> strings and categorical data, unit handling, automatic
> differentiation, and probably a bunch more I'm forgetting right now.
>
> If we get someone who's up to handling the dtype thing then I can
> mentor or co-mentor.
>
> What do y'all think?
>
> (I don't think I have access to update that wiki page -- or maybe I'm
> just not clever enough to figure out how -- so it would be helpful if
> someone who can, could?)
>
> --
> Nathaniel J. Smith
> Postdoctoral researcher - Informatics - University of Edinburgh
> http://vorpus.org