I will be at the conference, as will Mark Wiebe, for at least part of the time.  Others from the Blaze team, like Andy Terrel and Matthew Rocklin, will also be available for at least part of the time (so it depends on when the BoF is).  I'm sure they will all have opinions about this.  I would be happy to be involved in a discussion about the future of NumPy, as it is one of the things I've been thinking about for quite a while.

Obviously, what happens will be more a function of what people have resources to do than just what is discussed, but it is helpful to get people from multiple projects discussing what they are working on and how it relates or could relate to a possible NumPy 2.0 effort.    I'm happy to participate. 

My bias is that I do not believe it is going to be practically possible to simply modify NumPy itself directly.  This was the original direction we considered when we started Continuum --- and we spent some time and money in that direction --- but it's a difficult problem that would require a lot of time, patience, and testing from multiple people.  I'm not sure IPython is the right project to compare against here, as its user story is quite different.  NumPy is already a hybrid that evolved from Numeric.

Of course, it is likely *technically* feasible.  We could replace every implementation detail with something different --- but not without a likely impact on users, and at a higher cost than simply re-writing sections.  However, the challenge is more about the user base (especially the silent but large user base), the semantic expectations of that user base, and the difficulty of creating a test suite that really covers the entire surface area of actual NumPy use.

Even relatively simple changes can have significant impact at this point.  Nathaniel has laid out a fantastic list of great features.  These are the kind of features I have been eager to see as well.  This is why I have been working to fund and help explore these ideas in the Numba array object as well as in Blaze.    Gnumpy, Theano, Pandas, and other projects also have useful tales to tell regarding a potential NumPy 2.0.   

Ultimately, I do think it is time to talk seriously about NumPy 2.0 and what it might look like.  I personally think it looks a lot more like a re-write than a continuation of the modifications of Numeric that became NumPy 1.0.  Right out of the gate, for example, I would make sure that NumPy 2.0 objects somehow used PyObject_VAR_HEAD so that they were variable-sized objects where the strides and dimension information were stored directly in the object structure itself instead of being allocated separately (thus requiring additional loads and stores from memory).  This would be a relatively simple change.  But it can't be done while preserving ABI compatibility.  It may also, at this point, have an impact on Cython code, or other code that is deeply aware of the NumPy code structure.  Some of the changes that should be made will ultimately require a porting exercise for new code --- at which point, why not just use a new project?
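
To make the "variable-sized object" idea concrete, here is a minimal sketch that uses CPython's built-in types rather than NumPy itself.  It relies on a CPython implementation detail: tuple is declared with PyObject_VAR_HEAD and keeps its item slots inside the object, whereas list keeps a pointer to a separately allocated buffer.  The helper name inline_growth is made up for illustration, and this is only an analogy for how shape and strides could be stored inline, not a description of any actual NumPy 2.0 layout.

```python
import struct
import sys

# Size of a C pointer on this interpreter build (8 on typical 64-bit builds).
POINTER_SIZE = struct.calcsize("P")

# CPython detail: tuple is a variable-sized object, so every item slot is part
# of the object's own allocation and __itemsize__ is one pointer.
tuple_item_bytes = tuple.__itemsize__

# CPython detail: list is a fixed-size object that points to a separately
# allocated item buffer, so its __itemsize__ is 0.
list_item_bytes = list.__itemsize__


def inline_growth(n):
    """Hypothetical helper: bytes by which an n-tuple exceeds the empty tuple."""
    return sys.getsizeof(tuple(range(n))) - sys.getsizeof(())


print("tuple item slot bytes:", tuple_item_bytes)
print("list item slot bytes:", list_item_bytes)
print("growth of a 3-tuple over the empty tuple:", inline_growth(3))
```

An array object built the same way as tuple would reach its shape and strides with no extra pointer dereference, at the cost of a different struct layout, which is why the change cannot keep the existing ABI.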

Dynd (which is a separate but related project from Blaze) is actually a pretty good start to a NumPy 2.0 already:  https://github.com/ContinuumIO/dynd-python and https://github.com/ContinuumIO/libdynd (C++ library).

It can be provided with a backwards-compatible API without too much difficulty so that extension modules built for NumPy 1.X would still work.   Numba can support Dynd and Numba's array object provides a useful, deferred-expression evaluation mechanism along with JIT compilation when desired that can support the GPU.    

I would make the case that by the end of the year this combination of Dynd plus Numba (and its array object) could easily provide much of the functionality needed for a solid NumPy++.  Separate from that, Blaze provides a pluggable mechanism so that array-oriented computations can be done on a large variety of backends (including distributed systems).

I agree that users of NumPy should not have to see a big API change in 2.0 --- but any modification of indexing or calculations would present slightly different semantics in certain corner cases --- which I think will be unavoidable in NumPy 2.0 regardless of how it is created.   I also think NumPy 2.0 should take the opportunity to look hard at the API and what can be simplified (do we have the right collection of methods?).  I'm also a big fan of introducing a common "array of structure" object that has a smaller API footprint than Pandas but has indexing and group-by functionality. 

Fortunately, with the buffer protocol in Python, multiple array objects can easily co-exist in the Python ecosystem with no memory copies.   I think that is where we are headed and I don't see it as a bad thing.   I think agreeing on how to describe types would be very beneficial (it's an under-developed part of the buffer protocol).  This is exactly why we have made datashape an independent project that other projects can use as a data-type-description mini-language:  https://github.com/ContinuumIO/datashape
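
As a small illustration of the zero-copy sharing described above, the sketch below uses only the standard library: array.array plays the role of one array object and memoryview plays the role of a second consumer.  The variable names are arbitrary.  Note how little type information travels with the buffer (a struct-style format string, an item size, a shape, and strides), which is the under-developed part mentioned above.

```python
from array import array

# A producer object that exposes its memory through the buffer protocol.
data = array("d", [1.0, 2.0, 3.0, 4.0])

# A consumer obtains a view of the same memory; nothing is copied.
view = memoryview(data)

# The type description carried by the protocol: format string, item size,
# shape, and strides.
description = (view.format, view.itemsize, view.shape, view.strides)

# Writing through the view changes the original object, which shows that
# both names refer to one block of memory.
view[0] = 10.0

# Reinterpret the same bytes as a 2x2 array, still without copying.
# memoryview.cast requires going through a byte format first.
grid = view.cast("B").cast("d", shape=[2, 2])

print(description)
print(data.tolist())
print(grid.tolist())
```

Any object that implements the protocol can stand in for data here, which is what lets several array libraries coexist on the same memory.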

I think that a really good project for an enterprising young graduate student, post-doc, or professor (who is willing to delay their PhD or risk their tenure) would be to re-write the ufunc system using more modern techniques and put generalized ufuncs front and center as Nathaniel described.    

It sounds like many agree that we can improve the ufunc object implementation.    A new ufunc system is an entirely achievable goal and could even be shipped as an "add-on" project external from NumPy for several years before being adopted fully.    I know at least 4 people with demo-ware versions of a new ufunc-object that could easily replace current NumPy ufuncs eventually.    If you are interested in that, I would love to share what I know with you. 
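
For readers less familiar with the term, here is a toy, pure-Python sketch of what a generalized ufunc does: a "core" function consumes a fixed number of trailing dimensions of each argument, and a wrapper loops over (and broadcasts across) any leading dimensions.  The names depth and generalized are hypothetical, the inputs are assumed to be non-empty rectangular nested lists, and this is not how NumPy or any of the demo-ware implementations mentioned above actually work.

```python
def depth(x):
    """Nesting depth of a non-empty rectangular nested list (0 for a scalar)."""
    d = 0
    while isinstance(x, list):
        x = x[0]
        d += 1
    return d


def generalized(core, core_ndims):
    """Hypothetical helper: lift `core` over leading loop dimensions.

    core_ndims[i] is the number of trailing dimensions of argument i that
    `core` consumes, e.g. (1, 1) for the signature "(n),(n)->()".
    """
    def wrapper(*args):
        # Number of loop dimensions each argument has beyond its core rank.
        extra = [depth(a) - n for a, n in zip(args, core_ndims)]
        if min(extra) < 0:
            raise ValueError("argument has fewer dimensions than its core rank")
        top = max(extra)
        if top == 0:
            # Every argument is exactly at core rank: apply the core function.
            return core(*args)
        # Loop over the leading axis of the arguments with the most loop
        # dimensions; arguments with fewer are reused unchanged (broadcast).
        length = len(next(a for a, e in zip(args, extra) if e == top))
        return [
            wrapper(*[a[i] if e == top else a for a, e in zip(args, extra)])
            for i in range(length)
        ]
    return wrapper


# Core function with signature "(n),(n)->()": an inner product of two vectors.
inner = generalized(lambda a, b: sum(x * y for x, y in zip(a, b)), (1, 1))

print(inner([1, 2, 3], [4, 5, 6]))        # two vectors
print(inner([[1, 2], [3, 4]], [1, 1]))    # a stack of vectors against one vector
```

The point of the sketch is the separation of concerns: the core function knows nothing about looping or broadcasting, which is what makes it plausible to ship such a system as an add-on.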

After spending quite a bit of time thinking about this over the past 2 years, interacting with many in the user community outside of this list, and working with people as they explore a few options --- I do have a fair set of opinions.   But, there are also a lot of possibilities and many opportunities.  I'm looking forward to seeing what emerges in the coming months and years and cooperating where possible with others having overlapping interests.  

Best,

-Travis




On Tue, Jun 3, 2014 at 6:08 PM, Kyle Mandli <kyle.mandli@gmail.com> wrote:
Hello everyone,

As one of the co-chairs in charge of organizing the birds-of-a-feather sessions at the SciPy conference this year, I wanted to solicit through the NumPy list to see if we could get enough interest to hold a NumPy-centered BoF this year.  The BoF format would be up to those who would lead the discussion.  A couple of ideas used in the past include picking out a few of the lead devs to be on a panel for a Q&A type of session, or an open Q&A with perhaps an audience-guided list of topics.  I can help facilitate organization of something, but we would really like to get something organized this year (last year NumPy was the only major project that was not really represented in the BoF sessions).

Thanks!

Kyle Mandli (and via proxy Matt McCormick)



_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion




--

Travis Oliphant
CEO
Continuum Analytics, Inc.
http://www.continuum.io