Sadly, I won't be able to make it to SciPy this year.
I concur with Nathaniel that we can do a lot of things without a full rewrite -- it is all too easy to see what is gained with a rewrite and lose sight of what is lost. I have yet to see a really strong argument for a full rewrite. It may be easier to do a rewrite for a core when you have a few full-time people, but that's a different story for a community effort like numpy.
The main issue preventing new features in numpy is the lack of internal architecture at the C level, but nothing that could not be done by refactoring. Using cython to move away from the python C api would be great, though we need to talk with the cython people so that we can share common code between multiple extensions using cython, to avoid binary size explosion.
There are things that may require some backward incompatible changes in the C API, but that's much more acceptable than a significant break at the python level.
David
On Wed, Jun 4, 2014 at 9:58 AM, Sebastian Berg sebastian@sipsolutions.net wrote:
On Mi, 2014-06-04 at 02:26 +0100, Nathaniel Smith wrote:
On Wed, Jun 4, 2014 at 12:33 AM, Charles R Harris charlesr.harris@gmail.com wrote:
On Tue, Jun 3, 2014 at 5:08 PM, Kyle Mandli kyle.mandli@gmail.com wrote:
Hello everyone,
As one of the co-chairs in charge of organizing the birds-of-a-feather sessions at the SciPy conference this year, I wanted to solicit through the NumPy list to see if we could get enough interest to hold a NumPy-centered BoF this year. The BoF format would be up to those who would lead the discussion; a couple of ideas used in the past include picking out a few of the lead devs to be on a panel and have a Q&A type of session, or an open Q&A with perhaps an audience-guided list of topics. I can help facilitate organization of something, but we would really like to get something organized this year (last year NumPy was the only major project that was not really represented in the BoF sessions).
I'll be at the conference, but I don't know who else will be there. I feel that NumPy has matured to the point where most of the current work is cleaning stuff up, making it run faster, and fixing bugs. A topic that I'd like to see discussed is where do we go from here. One option to look at is Blaze, which looks to have matured a lot in the last year. The problem with making it a NumPy replacement is that NumPy has become quite widespread, with downloads from PyPI running at about 3 million per year. With that much penetration it may be difficult for a new core like Blaze to gain traction. So I'd like to also discuss ways to bring the two branches of development together at some point and explore what NumPy can do to pave the way. Mind, there are definitely things that would be nice to add to NumPy, a better type system, missing values, etc., but doing that is difficult given the current design.
I won't be at the conference unfortunately (I'm on the wrong continent and have family commitments then anyway), but I think there's lots of exciting stuff that can be done in numpy-land.
I would like to come, but to be honest I have not planned to yet, and it doesn't fit too well with the stuff I mostly work on right now. So I will have to see.
- Sebastian
We absolutely could rewrite the dtype system, and this would straightforwardly give us excellent support for missing values, units, categorical data, automatic differentiation, better datetimes, etc. etc. -- and make numpy much more friendly in general to third-party extensions.
I'd like to see the ufunc system revisited in the light of all the things we know now, to make gufuncs more first-class, provide better support for user-defined types, more flexible loop selection (e.g. make it possible to implement np.add.reduce(a, type="kahan")), etc.; one goal would be to convert a lot of ufunc-like functions (np.mean etc.) into being real ufuncs, and then they'd automatically benefit from __numpy_ufunc__, which would also massively improve interoperability with alternative array types like blaze.
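As a rough illustration of what a "kahan" reduction loop would compute, here is a minimal pure-Python sketch of compensated (Kahan) summation next to a plain running sum. This is not numpy code: `kahan_sum` and `naive_sum` are hypothetical names made up for the example, and the `type="kahan"` argument mentioned above is a wishlist item rather than an existing option.

```python
def naive_sum(values):
    """Plain left-to-right running sum, for comparison."""
    total = 0.0
    for x in values:
        total += float(x)
    return total


def kahan_sum(values):
    """Compensated (Kahan) summation of an iterable of floats."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x in values:
        y = float(x) - compensation    # re-inject the error lost previously
        t = total + y                  # low-order bits of y may be lost here
        compensation = (t - total) - y # recover what was just lost
        total = t
    return total


if __name__ == "__main__":
    # One large term followed by many terms too small to register individually.
    values = [1.0] + [1e-16] * 100000
    print("naive:", naive_sum(values))
    print("kahan:", kahan_sum(values))
```

With inputs like these, the plain loop drops every small term, while the compensated loop carries the lost low-order bits forward. That is the kind of alternative inner loop that more flexible loop selection would let a reduction choose.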
I'd like to see support for extensible label-based indexing, like pandas.
I'd like to see the internals migrating out of C and into Cython -- we have hundreds of lines of code that could be replaced with a few lines of Cython and no-one would notice. (Combining this with a cffi cython backend and pypy would be pretty interesting too...)
I'd like to see sparse ndarrays, with integration into the ufunc looping machinery so all ufuncs just work. Or even better, I'd like to see the right hooks added so that anyone can write a sparse ndarray package using only public APIs, and have all ufuncs just work. (I was going to put down deferred/loop-fused/out-of-core computation as a wishlist item too, but if we do it right then this too could be implemented by anyone without needing to be baked into numpy proper.)
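To make the "right hooks" idea concrete, here is a toy sketch of a sparse container that intercepts ufunc calls using only a public protocol. It relies on `__array_ufunc__`, the name under which the `__numpy_ufunc__` proposal eventually shipped (NumPy 1.13 and later), so it postdates this thread. `SparseVector` and `todense` are hypothetical names invented for the example, and it handles only elementwise calls that mix one sparse operand with scalars.

```python
import numpy as np


class SparseVector:
    """Hypothetical toy 1-D sparse vector: explicit entries plus a fill value."""

    def __init__(self, size, indices, values, fill=0.0):
        self.size = size
        self.indices = np.asarray(indices, dtype=np.intp)
        self.values = np.asarray(values, dtype=float)
        self.fill = fill

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Only plain elementwise calls with a single output are handled here.
        if method != "__call__" or kwargs or ufunc.nout != 1:
            return NotImplemented
        # Assumption for this sketch: exactly one sparse operand, rest scalars.
        n_sparse = sum(isinstance(x, SparseVector) for x in inputs)
        if n_sparse != 1:
            return NotImplemented
        if not all(isinstance(x, SparseVector) or np.isscalar(x) for x in inputs):
            return NotImplemented
        # Apply the ufunc to the stored entries and, separately, to the fill
        # value, so the result is right even when the ufunc does not map 0 to 0.
        stored = [x.values if x is self else x for x in inputs]
        fills = [x.fill if x is self else x for x in inputs]
        return SparseVector(self.size, self.indices, ufunc(*stored), ufunc(*fills))

    def todense(self):
        """Materialize as a regular ndarray."""
        out = np.full(self.size, self.fill, dtype=float)
        out[self.indices] = self.values
        return out


if __name__ == "__main__":
    v = SparseVector(5, [1, 3], [2.0, 4.0])
    print(np.multiply(v, 3.0).todense())
    print(np.exp(v).todense())
```

Nothing in the class is specific to `np.multiply` or `np.exp`; any elementwise ufunc called in this restricted form goes through the same path. Reductions, multiple sparse operands, and operator overloads such as `v * 3.0` are left out of the sketch.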
All of these things would take some work and care, but I think they could all be done incrementally and without breaking backwards compatibility. Compare to ipython, which -- as Fernando likes to point out :-) -- went from a little console program to its current distributed-notebook-skynet-whatever-it-is by merging one working PR at a time. Certainly these changes would be much easier and less disruptive than any plan that involves throwing out numpy and starting over. But they also do help smooth the way for an incremental transition to a world where numpy is regularly used alongside other libraries.
-n
NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion