On Thu, Jun 5, 2014 at 6:40 AM, David Cournapeau <cournape@gmail.com> wrote:

On Thu, Jun 5, 2014 at 3:36 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:

On Wed, Jun 4, 2014 at 7:29 PM, Travis Oliphant <travis@continuum.io> wrote:
Believe me, I'm all for incremental changes if that is actually possible and doesn't cost more. It's also why I've been silent until now about anything we are doing being a candidate for a NumPy 2.0. I understand the challenges of getting people to change. But features and solid improvements *will* get people to change --- especially if the new library can be used alongside the old one and the transition can be done gradually. Python 3's struggle for adoption comes down to its lack of compelling new features.

At some point there *will* be a NumPy 2.0. What features go into NumPy 2.0, how much backward compatibility is provided, and how much porting is needed to move your code from NumPy 1.X to NumPy 2.X is the real user question --- not whether it is characterized as "incremental" change or "re-write". What I call a re-write and what you call an incremental change are two points on a spectrum, and they would likely overlap significantly if we really compared what we are thinking about.

One huge benefit that came out of the Numeric / numarray / NumPy transition, and that we mustn't forget, was the extended buffer protocol and memoryview objects. These really do allow multiple array objects to co-exist, and libraries to use whichever object they prefer, in a way that was not possible when Numeric / numarray / NumPy came out. So we shouldn't be afraid of that world. The existence of easy package managers --- which let users update environments to try out new features, and run applications on a single system against multiple versions of the same library --- is also something that didn't exist before and will make any transition easier for users.
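To make the interoperability point concrete, here is a minimal sketch (assuming only NumPy and the standard library) of two different array objects sharing the same memory through the buffer protocol, with no copying in either direction:

import array
import numpy as np

# A stdlib array.array exports the buffer protocol...
buf = array.array('d', [1.0, 2.0, 3.0])

# ...so NumPy can wrap its memory without copying.
a = np.frombuffer(buf, dtype=np.float64)
a[0] = 99.0
print(buf[0])  # 99.0 -- both objects see the same bytes

# NumPy arrays export the protocol in turn, so the sharing goes both ways.
m = memoryview(a)
print(m.shape, m.format)  # (3,) d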

One thing I regret about my original work on NumPy is that I didn't have the foresight, skill, and understanding to design a more extensive and better-thought-out multiple-dispatch system, so that multiple array objects could participate together in an expression flow. The __numpy_ufunc__ mechanism provides enough capability in that direction that things may be better now.
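As a hedged illustration of the kind of dispatch being discussed, the sketch below shows a non-ndarray object intercepting ufunc calls so it can take part in mixed expressions with plain ndarrays. It uses __array_ufunc__, the name under which the __numpy_ufunc__ protocol eventually shipped (NumPy 1.13), so that the example actually runs; LoggedArray is a made-up class for illustration:

import numpy as np

class LoggedArray:
    """Hypothetical wrapper that joins ufunc expressions alongside ndarrays."""

    def __init__(self, data):
        self.data = np.asarray(data)

    # Shipped as __array_ufunc__ (NumPy >= 1.13); the thread discusses the
    # earlier draft name __numpy_ufunc__, but the idea is the same.
    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Unwrap any LoggedArray inputs, apply the ufunc, re-wrap the result.
        args = [x.data if isinstance(x, LoggedArray) else x for x in inputs]
        print("dispatching %s.%s" % (ufunc.__name__, method))
        return LoggedArray(getattr(ufunc, method)(*args, **kwargs))

out = np.add(LoggedArray([1, 2]), np.array([10, 20]))
print(out.data)  # [11 22] -- NumPy routed the call to our class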

Ultimately, I don't disagree that NumPy can continue to evolve in "incremental" change mode (though if you are swapping out whole swaths of C code for Cython code, that sounds a lot like a "re-write") as long as there are people willing to put the effort into changing it. This effort actually benefits from the existence of other array objects that push the feature envelope without the same constraints --- in much the same way that the Python standard library benefits from many different capabilities being tried out elsewhere before moving into the standard library.

I remain optimistic that things will continue to improve in multiple ways --- if a little "messier" than any of us would conceive individually. It *is* great to see all the PRs coming from multiple people on NumPy and all the new energy around improving things, whether great or small.

@nathaniel IIRC, one of the objections to the missing-values work was that it changed the underlying array object by adding a couple of variables to the structure. I'm willing to do that sort of thing, but it would be good to have general agreement that such changes are acceptable.


I think changing the ABI for some versions of numpy (2.0, whatever) is acceptable. There is little doubt that the ABI will need to change to accommodate a better and more flexible architecture.

Changing the C API is trickier: I am not up to date on the changes from the last 2-3 years, but at that time most things could have been changed internally without breaking much, though I did not go far enough to estimate what the performance impact might be (if any).


As to blaze/dynd, I'd like to steal bits here and there, and maybe in the long term base numpy on top of it with a compatibility layer. There is a lot of thought and effort that has gone into those projects and we should use what we can. As is, I think numpy is good for another five to ten years and will probably hang on for fifteen, but it will be outdated by the end of that period. Like great whites, we need to keep swimming just to have oxygen. Software projects tend to be obligate ram ventilators.

The Python 3 experience is definitely something we want to avoid. And while blaze does big data and offers some nice features, I don't know that it offers the more ordinary user compelling reasons to upgrade at this time, so I'd like to sort of slip it into numpy if possible.

If we do start moving numpy forward in more radical steps, we should try to have some agreement beforehand as to what sort of changes are acceptable. For instance, to maintain backward compatibility, is it sufficient that a recompile will do the job, or do we require binary compatibility, so that extensions compiled against earlier releases keep working? Do we stay with C, or should we support C++ code with its advantages of smart pointers, exception handling, and templates? We will need a certain amount of flexibility going forward, and we should decide, or at least discuss, such issues up front.

Last time the C++ discussion was brought up, no consensus could be reached. I think quite a few radical changes can already be made without that consensus, though others may disagree there.

IMO, what is needed most is refactoring the internals to separate the low-level Python C API from the rest of the code, as I think that's the main bottleneck to getting more contributors (or getting new core features in more quickly).


What do you mean by "extract the Python C API"?

Chuck