Recap
The past year has seen most of the "big picture" changes merged into NumPy, with a good chunk of them already part of NumPy 1.20:
With the exception of universal functions, the above list covers all major areas of NumPy that need to change. It also implements many of the things that new user DTypes will need and currently cannot do. Previously, these were either unavailable or limited in various ways, especially for parametric DTypes such as units or strings.
Currently in Progress
The main remaining point is universal functions. A majority of NumPy's features are organized as universal functions, and universal functions inherently did not support parametric user-defined DTypes, so they need a major change. This change is proposed in NEP 43 (although that will need some smaller updates).
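As a rough illustration of the problem, consider concatenating two fixed-width string arrays: the output's string length depends on the input lengths, so the output descriptor has to be computed from the actual input descriptors before any loop can run. NEP 43 handles this with a separate descriptor-resolution step. The minimal Python sketch below assumes a hypothetical StringDescr class and resolve_concat_descriptors() function; neither is NumPy's API.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class StringDescr:
    """Hypothetical fixed-width string descriptor.

    The DType is "fixed-width string"; each descriptor instance carries
    the parameter, here the number of characters per element.
    """
    length: int


def resolve_concat_descriptors(
    a: StringDescr, b: StringDescr
) -> Tuple[StringDescr, StringDescr, StringDescr]:
    """Resolve descriptors for a hypothetical string-concatenation ufunc.

    The output length depends on the input lengths, so the output
    descriptor is computed from the concrete input descriptors.
    """
    out = StringDescr(a.length + b.length)
    return a, b, out
```

A lookup keyed only on the DType class ("string") cannot produce a length-8 output descriptor for length-3 and length-5 inputs, which is why the ufunc machinery needs to see descriptor instances rather than just classes.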
The work on implementing it is mostly settled in the following PR and branch (I hope these will be merged very soon):
In parallel: the new DType API is only useful to users once it is exposed, so I have a branch to experiment with that:
The exact way of writing a new DType probably still needs some alternatives to be explored. Note, however, that this should largely affect only the boilerplate code.
Future
The main remaining step is figuring out how best to expose the DType API (ABI compatibility is the major concern) and, to close things up, finishing NEP 43 (or most of it).
After that, there are still some things that need to be done (although this list is unlikely to be exhaustive):
- The way users should define new DTypes has to be decided (this seems tricky, unfortunately).
- Some functionality is defined in the "old style" API and should be removed or discouraged. This includes things like sorting functions. (The old way could be allowed for a transition period.) To be specific, these are the functions in ((PyArray_Descr *)descr)->f->funcs.
- Some small parts of the new API are missing right now. For example, current NumPy code that uses ensure_nbo() has to switch to ensure_canonical() as defined by NEP 42. Similarly, some other parts will need tweaking.
- Some of the API should be made public, but it would be nice to clean it up first. An example of this is the get_loop() of ufuncs. For most use cases this is probably not too important, but the API is currently a bit awkward. (It would be possible to accept the awkward API for now and later replace it with a new get_loop(), deprecating the old one slowly.)
- There should be some new API for "reference counting" (more generally, for any item that needs memory management). This means cleaning up the split between the current transfer to NULL and PyArray_XDECREF: we should unify the two as much as possible (probably by using the transfer to NULL path) and then expose that to custom DTypes as well.
- Some utility functionality is missing at this time: for example, a way for a Unit DType to fall back to the normal math implemented by NumPy (after figuring out the unit part).
- A Python API is not on my explicit roadmap right now (although it is probably not hard to add).
But most importantly, whatever comes up when potential users start exploring the API, hopefully soon!
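The ensure_nbo() to ensure_canonical() item in the list above is easiest to see with a toy descriptor. The following is a minimal sketch, assuming a hypothetical IntDescr class for which "canonical" simply means native byte order; other DTypes could normalize other parameters, or have no byte order at all, which is why generic code should ask for "canonical" rather than "native byte order".

```python
import sys
from dataclasses import dataclass, replace

# Byte-order character of the running machine ("<" little, ">" big).
NATIVE = "<" if sys.byteorder == "little" else ">"


@dataclass(frozen=True)
class IntDescr:
    """Hypothetical descriptor for a fixed-size integer."""
    itemsize: int
    byteorder: str  # "<" or ">"

    def ensure_canonical(self) -> "IntDescr":
        # For this toy descriptor, canonical == native byte order.
        if self.byteorder == NATIVE:
            return self  # already canonical, no new descriptor needed
        return replace(self, byteorder=NATIVE)
```

Generic code then only calls descr.ensure_canonical() and never needs to know whether a DType has a byte order.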
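The get_loop() item above refers to the step that picks an inner loop once descriptors are resolved. The real C function takes more parameters than shown here; this is only a minimal Python sketch of the idea, and get_loop(), strided_add() and contiguous_add() are hypothetical names with strides counted in elements rather than bytes.

```python
from typing import Callable, List, Sequence, Tuple

# Inner-loop signature used in this sketch: (args, n, strides).
Args = Tuple[Sequence[float], Sequence[float], List[float]]
Loop = Callable[[Args, int, Tuple[int, int, int]], None]


def strided_add(args: Args, n: int, strides: Tuple[int, int, int]) -> None:
    """Generic loop: works for any element strides."""
    a, b, out = args
    sa, sb, so = strides
    for i in range(n):
        out[i * so] = a[i * sa] + b[i * sb]


def contiguous_add(args: Args, n: int, strides: Tuple[int, int, int]) -> None:
    """Fast path: all operands are contiguous."""
    a, b, out = args
    out[:n] = [x + y for x, y in zip(a[:n], b[:n])]


def get_loop(descrs: Tuple[str, ...], strides: Tuple[int, int, int]) -> Loop:
    """Hypothetical get_loop: pick an inner loop for resolved descriptors."""
    if descrs != ("float64", "float64", "float64"):
        raise TypeError(f"no add loop for {descrs}")
    if strides == (1, 1, 1):
        return contiguous_add
    return strided_add
```

Separating loop selection from execution lets a DType specialize (for example, on contiguity or alignment) without every call re-checking those conditions.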
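One way to picture the "transfer to NULL" unification from the reference-counting item above: a single transfer function copies items into a destination, and with no destination it releases the source items instead, so clearing is just a special case of transferring. This is a minimal sketch with a hypothetical Ref class standing in for any item that needs memory management; it is not how NumPy's C transfer functions are written.

```python
from typing import List, Optional


class Ref:
    """Toy item that needs memory management, like a Python object reference."""

    def __init__(self) -> None:
        self.count = 1

    def incref(self) -> None:
        self.count += 1

    def decref(self) -> None:
        self.count -= 1


def transfer(src: List[Optional[Ref]], dst: Optional[List[Optional[Ref]]]) -> None:
    """Copy src items into dst; with dst=None, release the src items instead."""
    for i, item in enumerate(src):
        if dst is None:
            # "Transfer to NULL": drop the reference and clear the slot.
            if item is not None:
                item.decref()
                src[i] = None
            continue
        # Regular copy: the destination takes a new reference
        # and releases whatever it held before.
        if item is not None:
            item.incref()
        old = dst[i]
        if old is not None:
            old.decref()
        dst[i] = item


def clear(items: List[Optional[Ref]]) -> None:
    """Clearing is a transfer with no destination."""
    transfer(items, None)
```

With this shape, a custom DType would only need to provide its transfer logic, and clearing would come along for free.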
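The Unit DType fallback mentioned in the list above splits an operation into a unit part and a plain-math part. The sketch below assumes a hypothetical Quantity type, a made-up TO_UNIT conversion table, and a plain_add() function that stands in for NumPy's ordinary float64 addition; a real Unit DType would call into NumPy for the second step.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

# Hypothetical conversion table: multiply a value in the first unit
# by this factor to express it in the second unit.
TO_UNIT: Dict[Tuple[str, str], float] = {
    ("m", "m"): 1.0,
    ("km", "km"): 1.0,
    ("km", "m"): 1000.0,
    ("m", "km"): 0.001,
}


@dataclass(frozen=True)
class Quantity:
    """Hypothetical unit-aware array: plain float values plus one unit."""
    values: Tuple[float, ...]
    unit: str


def plain_add(a: Sequence[float], b: Sequence[float]) -> Tuple[float, ...]:
    """Stand-in for NumPy's ordinary float64 addition."""
    return tuple(x + y for x, y in zip(a, b))


def unit_add(a: Quantity, b: Quantity) -> Quantity:
    # 1. Unit part: keep a's unit and convert b's values into it.
    factor = TO_UNIT[(b.unit, a.unit)]
    b_values = tuple(y * factor for y in b.values)
    # 2. Fallback: the arithmetic itself is unit-free float math.
    return Quantity(plain_add(a.values, b_values), a.unit)
```

For example, adding (1, 2) m and (1, 0.5) km gives (1001, 502) m.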
Otherwise, there are a couple of related improvements that I think would make sense, such as storing the actual power-of-two alignment in the array flags (though the flags are getting a bit cramped if we assume int can be 16 bits). Also, pushing forward the discussion about removing value-based casting/promotion would help with DTypes; that probably makes sense as soon as the "currently in progress" items are largely settled and the next NumPy version is released.
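Because an alignment is a power of two, only its base-2 logarithm needs storing, so a few flag bits suffice. This minimal sketch assumes a hypothetical 3-bit field at bit 12 (covering alignments 1 to 128 within 16 bits); the shift and width are made up, and NumPy's existing flag bits are not modeled, so whether such room actually exists is exactly the open question.

```python
# Hypothetical layout: a 3-bit field holding log2(alignment) at bit 12.
ALIGN_SHIFT = 12
ALIGN_MASK = 0b111 << ALIGN_SHIFT


def set_alignment(flags: int, alignment: int) -> int:
    """Return flags with the alignment field set to log2(alignment)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a power of two")
    log2 = alignment.bit_length() - 1
    if log2 > 0b111:
        raise ValueError("alignment too large for a 3-bit field")
    return (flags & ~ALIGN_MASK) | (log2 << ALIGN_SHIFT)


def get_alignment(flags: int) -> int:
    """Decode the alignment stored in the flags."""
    return 1 << ((flags & ALIGN_MASK) >> ALIGN_SHIFT)
```

Storing the exponent rather than a single "aligned" bit would let code ask "aligned to what?" without re-inspecting the data pointer.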