On Wed, 2014-06-04 at 02:26 +0100, Nathaniel Smith wrote:
On Wed, Jun 4, 2014 at 12:33 AM, Charles R Harris email@example.com wrote:
On Tue, Jun 3, 2014 at 5:08 PM, Kyle Mandli firstname.lastname@example.org wrote:
As one of the co-chairs in charge of organizing the birds-of-a-feather sessions at the SciPy conference this year, I wanted to ask on the NumPy list whether there is enough interest to hold a NumPy-centered BoF this year. The format would be up to those leading the discussion; ideas used in the past include putting a few of the lead devs on a panel for a Q&A-style session, or an open Q&A with an audience-guided list of topics. I can help facilitate organizing something, but we would really like to get something organized this year (last year NumPy was the only major project that was not really represented in the BoF sessions).
I'll be at the conference, but I don't know who else will be there. I feel that NumPy has matured to the point where most of the current work is cleaning stuff up, making it run faster, and fixing bugs. A topic that I'd like to see discussed is where we go from here. One option to look at is Blaze, which looks to have matured a lot in the last year. The problem with making it a NumPy replacement is that NumPy has become quite widespread, with downloads from PyPI running at about 3 million per year. With that much penetration it may be difficult for a new core like Blaze to gain traction. So I'd also like to discuss ways to bring the two branches of development together at some point and explore what NumPy can do to pave the way. Mind you, there are definitely things that would be nice to add to NumPy (a better type system, missing values, etc.), but doing that is difficult given the current design.
I won't be at the conference unfortunately (I'm on the wrong continent and have family commitments then anyway), but I think there's lots of exciting stuff that can be done in numpy-land.
I would like to come, but to be honest I have not planned to yet, and it doesn't fit too well with the stuff I mostly work on right now. So I will have to see.
We absolutely could rewrite the dtype system, and this would straightforwardly give us excellent support for missing values, units, categorical data, automatic differentiation, better datetimes, etc. etc. -- and make numpy much more friendly in general to third-party extensions.
I'd like to see the ufunc system revisited in the light of all the things we know now, to make gufuncs more first-class, provide better support for user-defined types, more flexible loop selection (e.g. make it possible to implement np.add.reduce(a, type="kahan")), etc.; one goal would be to convert a lot of ufunc-like functions (np.mean etc.) into being real ufuncs, and then they'd automatically benefit from __numpy_ufunc__, which would also massively improve interoperability with alternative array types like blaze.
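The `type="kahan"` reduction mentioned above refers to compensated (Kahan) summation. That keyword is only a spelling proposed in this thread, not an existing `np.add.reduce` parameter, so the sketch below is plain Python rather than NumPy code; the function name `kahan_sum` is hypothetical. It shows roughly what such a reduction loop would compute differently from naive accumulation.

```python
import functools
import math
import operator
from typing import Iterable


def kahan_sum(values: Iterable[float]) -> float:
    """Compensated (Kahan) summation: a sketch of what a hypothetical
    np.add.reduce(a, type="kahan") loop might compute."""
    total = 0.0
    compensation = 0.0  # rounding error carried over from the previous addition
    for x in values:
        y = x - compensation            # correct the next term by the carried error
        t = total + y                   # low-order bits of y may be lost here
        compensation = (t - total) - y  # what was actually added minus y (zero in exact arithmetic)
        total = t
    return total


if __name__ == "__main__":
    data = [0.1] * 10
    naive = functools.reduce(operator.add, data, 0.0)  # plain left-to-right loop
    print(naive)            # 0.9999999999999999
    print(kahan_sum(data))  # 1.0
    print(math.fsum(data))  # 1.0 (correctly rounded reference)
```

The compensation term costs a few extra floating-point operations per element, which is presumably why a loop like this would be an opt-in choice rather than the default for `np.add.reduce`.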
I'd like to see support for extensible label-based indexing, like pandas.
Internally, I'd like to see code migrating out of C and into Cython -- we have hundreds of lines of code that could be replaced with a few lines of Cython and no one would notice. (Combining this with a cffi Cython backend and PyPy would be pretty interesting too...)
I'd like to see sparse ndarrays, with integration into the ufunc looping machinery so all ufuncs just work. Or even better, I'd like to see the right hooks added so that anyone can write a sparse ndarray package using only public APIs, and have all ufuncs just work. (I was going to put down deferred/loop-fused/out-of-core computation as a wishlist item too, but if we do it right then this too could be implemented by anyone without needing to be baked into numpy proper.)
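Making "all ufuncs just work" on a sparse array hinges on one check any implementation has to make: an elementwise function can be applied to the stored entries alone only if it maps 0 to 0; otherwise every implicit zero changes and the result is dense. The toy, pure-Python sketch below illustrates that check; the `SparseVector` class and its `apply` method are hypothetical and not part of NumPy or SciPy. A real package would hook into NumPy's ufunc dispatch (e.g. the `__numpy_ufunc__` override discussed in this thread) rather than a custom method.

```python
import math
from typing import Callable, Dict, List, Union


class SparseVector:
    """Hypothetical toy 1-D sparse array: only non-zero entries are stored."""

    def __init__(self, size: int, data: Dict[int, float]):
        self.size = size
        # Drop explicit zeros so the dict holds only genuine non-zeros.
        self.data = {i: v for i, v in data.items() if v != 0}

    def to_dense(self) -> List[float]:
        return [self.data.get(i, 0.0) for i in range(self.size)]

    def apply(self, func: Callable[[float], float]) -> Union["SparseVector", List[float]]:
        """Apply an elementwise function (assumed to be defined at 0).

        If func(0) == 0, the implicit zeros stay zero, so only stored entries
        are visited and the result stays sparse. Otherwise every implicit zero
        changes, so this sketch falls back to a dense list.
        """
        if func(0.0) == 0.0:
            return SparseVector(self.size, {i: func(v) for i, v in self.data.items()})
        return [func(v) for v in self.to_dense()]


if __name__ == "__main__":
    v = SparseVector(5, {1: 4.0, 3: 9.0})
    print(v.apply(math.sqrt).to_dense())  # [0.0, 2.0, 0.0, 3.0, 0.0]
    print(v.apply(math.exp)[0])           # 1.0, since exp(0) == 1 forces a dense result
```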
All of these things would take some work and care, but I think they could all be done incrementally and without breaking backwards compatibility. Compare IPython, which -- as Fernando likes to point out :-) -- went from a little console program to its current distributed-notebook-skynet-whatever-it-is by merging one working PR at a time. Certainly these changes would be much easier and less disruptive than any plan that involves throwing out numpy and starting over. But they would also help smooth the way for an incremental transition to a world where numpy is regularly used alongside other libraries.