On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant <travis@continuum.io> wrote:

Thanks for the write-up Nathaniel. There is a lot of great detail and interesting ideas here.

I am very eager to understand how to help NumPy and the wider community move forward however I can (my passions on this have not changed since 1999, though what I myself spend time on has changed).

There are a lot of ways to think about approaching this, though. It's hard to get all the ideas on the table, and it was unfortunate that we couldn't get all the core NumPy devs together in person to have this discussion, as there are still a lot of unanswered questions and a lot of thought that has gone into other approaches that was not brought up or represented in the meeting (how does Numba fit into this, what about data-shape, dynd, memory-views and the Python type system, etc.). If NumPy becomes just an interface-specification, then why don't we just do that *outside* NumPy itself, in a way that doesn't jeopardize the stability of NumPy today? These are some of the real questions I have. I will try to write up my thoughts in more depth soon, but I won't be able to respond in depth right now. I just wanted to comment because Nathaniel said I disagree, which is only partly true.

The three most important things for me are 1) let's make sure we have representation from as wide a part of the community as possible (this is really hard), 2) let's look around at the broader community and the prior art that is happening in this space right now, and 3) let's not pretend we are going to be able to make all this happen without breaking ABI compatibility. Let's just break ABI compatibility with NumPy 2.0 *and* have as much fidelity with the API and semantics of current NumPy as possible (though there will be some changes necessary long-term).

I don't think we should intentionally break ABI if we can avoid it, but I also don't think we should spend inordinate amounts of time trying to pretend that we won't break ABI (for at least some people), and most importantly we should not pretend *not* to break the ABI when we actually do. We did this once before with the roll-out of date-time, and it was really unnecessary. When I released NumPy 1.0, there were several things that I knew should be fixed very soon (NumPy was never designed to not break ABI). Those problems are still there. Now that we have quite a bit better understanding of what NumPy *should* be (there have been tremendous strides in understanding and community size over the past 10 years), let's actually make the infrastructure we think will last for the next 20 years (instead of trying to shoe-horn new ideas into a 20-year-old code-base that wasn't designed for them).

NumPy is a hard code-base. It has been since the Numeric days in 1995. I could be wrong, but my guess is that we will be passed by as a community if we don't seize the opportunity to build something better than we can build if we are forced to use a 20-year-old code-base.

It is more important to not break people's code and to be clear when a re-compile is necessary for dependencies. Those to me are the most important constraints. There are a lot of great ideas that we all have about what we want NumPy to be able to do. Some of these are pretty transformational (and the more exciting they are, the harder I think they are going to be to implement without breaking at least the ABI). There is probably some CAP-like theorem around Stability-Features-Speed-of-Development (pick 2) when it comes to Open Source Software development, and making feature-progress with NumPy *is going* to create instability, which concerns me.

I would like to see a little bit of pain one time with a NumPy 2.0, rather than the approach Nathaniel seems to advocate, which means constant pain from constant churn over many years. To me NumPy 2.0 is an ABI-breaking release that is as API-compatible as possible and whose semantics are not dramatically different.

There are at least 3 areas of compatibility (ABI, API, and semantic). ABI-compatibility is a non-feature in today's world. There are so many distributions of the NumPy stack (and conda makes it trivial for anyone to build their own). Making less-optimal software-engineering choices because of fear of breaking the ABI is not something I'm supportive of at all. We should not break ABI every release, but a release every 3 years that breaks ABI is not a problem.

API compatibility should be much more sacrosanct, but it is also something that can be managed. Any NumPy 2.0 should definitely support the full NumPy API (though there could be deprecated swaths). I think the community has done well in using deprecation and limiting the public API to make this more manageable, and I would love to see a NumPy 2.0 that solidifies a future-oriented API along with a backward-compatible API that is also available.

Semantic compatibility is the hardest. We have already broken this on multiple occasions throughout the 1.x NumPy releases. Every time you change the code, this can change. This is what I fear will cause deep instability over the course of many years. These are things like the casting rule details, the effect of indexing changes, and any change to the calculation approaches. It is and has been the most at risk during any code changes. My view is that a NumPy 2.0 (with a new low-level architecture) confines these changes to a single release rather than unavoidably spreading them out over many, many releases.

I think that summarizes my main concerns. I will write up more forward-thinking ideas for what else is possible in the coming weeks. In the meantime, thanks for keeping the discussion going. It is extremely exciting to see the help people have continued to provide to maintain and improve NumPy. It will be exciting to see what the next few years bring as well.

I think the only thing that looks even a little bit like a numpy 2.0 at this time is dynd. Rewriting numpy, let alone producing numpy 2.0, is a major project. Dynd is 2.5+ years old, 3500+ commits in, and still in progress. If there is a decision to pursue Dynd I could support that, but I think we would want to think deeply about how to make the transition as painless as possible. It would be good at this point to get some feedback from people currently using dynd. IIRC, part of the reason for starting dynd was the perception that it was not possible to evolve numpy without running into compatibility road blocks. Travis, could you perhaps summarize the thinking that went into the decision to make dynd a separate project?

<snip>

Chuck