Thanks for the good summary, Nathaniel.
Regarding dtype machinery, I agree casting is the hardest part. Unless the
code has changed dramatically, this was the main reason why you could not
make most of the dtypes separate from the numpy codebase (I tried to move the
datetime dtype out of multiarray into a separate C extension some years
ago). Being able to separate the dtypes from the multiarray module would be
an obvious way to drive the internal API change.
Regarding the use of cython in numpy, was there any discussion about the
compilation/size cost of using cython, and about talking to the cython team to
improve this? Or was that considered acceptable with current cython for
numpy? I am convinced cleanly separating the low level parts from the
python C API plumbing would be the single most important thing one could do
to make the codebase more amenable.
David
On Tue, Aug 25, 2015 at 9:58 PM, Charles R Harris wrote:

On Tue, Aug 25, 2015 at 1:00 PM, Travis Oliphant wrote:

Thanks for the write-up Nathaniel. There is a lot of great detail and
interesting ideas here.

I am very eager to understand how to help NumPy and the wider
community move forward however I can (my passions on this have not changed
since 1999, though what I myself spend time on has changed).

There are a lot of ways to think about approaching this, though. It's
hard to get all the ideas on the table, and it was unfortunate we couldn't
get all the core NumPy devs together in person to have this
discussion as there are still a lot of questions unanswered and a lot of
thought that has gone into other approaches that was not brought up or
represented in the meeting (how does Numba fit into this, what about
data-shape, dynd, memory-views and Python type system, etc.). If NumPy
becomes just an interface-specification, then why don't we just do that
*outside* NumPy itself in a way that doesn't jeopardize the stability of
NumPy today. These are some of the real questions I have. I will try
to write up my thoughts in more depth soon, but I won't be able to respond
in-depth right now. I just wanted to comment because Nathaniel said I
disagree, which is only partly true.

The three most important things for me are 1) let's make sure we have
representation from as wide a part of the community as possible (this is really
hard), 2) let's look around at the broader community and the prior art that
is happening in this space right now and 3) let's not pretend we are going
to be able to make all this happen without breaking ABI compatibility.
Let's just break ABI compatibility with NumPy 2.0 *and* have as much
fidelity with the API and semantics of current NumPy as possible (though
there will be some changes necessary long-term).

I don't think we should intentionally break ABI if we can avoid it, but I
also don't think we should spend inordinate amounts of time trying to
pretend that we won't break ABI (for at least some people), and most
importantly we should not pretend *not* to break the ABI when we actually
do. We did this once before with the roll-out of date-time, and it was
really unnecessary. When I released NumPy 1.0, there were several
things that I knew should be fixed very soon (NumPy was never designed to
not break ABI). Those problems are still there. Now that we have
quite a bit better understanding of what NumPy *should* be (there have been
tremendous strides in understanding and community size over the past 10
years), let's actually make the infrastructure we think will last for the
next 20 years (instead of trying to shoe-horn new ideas into a 20-year-old
code-base that wasn't designed for it).

NumPy is a hard code-base. It has been since Numeric days in 1995. I
could be wrong, but my guess is that we will be passed by as a community if
we don't seize the opportunity to build something better than we can build
if we are forced to use a 20-year-old code-base.

It is more important to not break people's code and to be clear when a
re-compile is necessary for dependencies. Those to me are the most
important constraints. There are a lot of great ideas that we all have
about what we want NumPy to be able to do. Some of these are pretty
transformational (and the more exciting they are, the harder I think they
are going to be to implement without breaking at least the ABI). There
is probably some CAP-like theorem around
Stability-Features-Speed-of-Development (pick 2) when it comes to Open
Source Software development, and making feature-progress with NumPy *is
going* to create instability, which concerns me.

I would like to see a little-bit-of-pain one time with a NumPy 2.0,
rather than the approach Nathaniel seems to advocate, with constant pain
because of constant churn over many years. To me NumPy 2.0 is an
ABI-breaking release that is as API-compatible as possible and whose
semantics are not dramatically different.

There are at least 3 areas of compatibility (ABI, API, and semantic).
ABI-compatibility is a non-feature in today's world. There are so many
distributions of the NumPy stack (and conda makes it trivial for anyone to
build their own or for you to build one yourself). Making less-optimal
software-engineering choices because of fear of breaking the ABI is not
something I'm supportive of at all. We should not break ABI every
release, but a release every 3 years that breaks ABI is not a problem.

API compatibility should be much more sacrosanct, but it is also
something that can be managed. Any NumPy 2.0 should definitely
support the full NumPy API (though there could be deprecated swaths). I
think the community has done well in using deprecation and limiting the
public API to make this more manageable and I would love to see a NumPy 2.0
that solidifies a future-oriented API along with a back-ward compatible API
that is also available.

Semantic compatibility is the hardest. We have already broken this on
multiple occasions throughout the 1.x NumPy releases. Every time you
change the code, this can change. This is what I fear will cause deep
instability over the course of many years. These are things like the
casting rule details, the effect of indexing changes, any change to the
calculation approaches. It is and has been the most at risk during any
code-changes. My view is that a NumPy 2.0 (with a new low-level
architecture) minimizes these changes to a single release rather than
unavoidably spreading them out over many, many releases.

I think that summarizes my main concerns. I will write up more
forward-thinking ideas for what else is possible in the coming weeks. In the
meantime, thanks for keeping the discussion going. It is extremely exciting to
see the help people have continued to provide to maintain and improve
NumPy. It will be exciting to see what the next few years bring as
well.

I think the only thing that looks even a little bit like a numpy 2.0 at
this time is dynd. Rewriting numpy, let alone producing numpy 2.0, is a
major project. Dynd is 2.5+ years old, 3500+ commits in, and still in
progress. If there is a decision to pursue Dynd I could support that, but
I think we would want to think deeply about how to make the transition as
painless as possible. It would be good at this point to get some feedback
from people currently using dynd. IIRC, part of the reason for starting
dynd was the perception that it was not possible to evolve numpy without
running into compatibility roadblocks. Travis, could you perhaps summarize
the thinking that went into the decision to make dynd a separate project?

<snip>

Chuck

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion