Recap

The past year has seen most of the "big picture" changes merged into NumPy, a good chunk of them already part of 1.20:

dtype instances are now instances of np.dtype subclasses. I usually write DType for those classes, but DTypeType is also a good name :).
Array coercion using np.array(...) was completely rewritten, which was necessary to allow new user DTypes.
Introduced the ArrayMethod concept to unify casting and ufuncs as much as possible (NEP 42/43):
- Casting was first fixed up to support error returns.
- "can-cast" logic was rewritten in terms of ArrayMethod (i.e. casting safety checks are integrated into ArrayMethod).
- Casting was largely reorganized around the ArrayMethod concept, including the casting safety. (Also this)
Promotion was implemented and later integrated everywhere, e.g. for np.result_type(...).
A larger refactor of UFuncs and a few smaller PRs set the stage for the ufunc refactor (see "Currently in Progress" below).
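A few of the changes listed above are visible from Python. The following is a minimal sketch, assuming NumPy 1.20 or later is installed; it only uses long-standing public functions (np.dtype, np.can_cast, np.result_type, np.promote_types), and the variable names are made up for illustration.

```python
import numpy as np

# Each dtype instance now has its own class, which subclasses np.dtype.
float_dtype = np.dtype("float64")
int_dtype = np.dtype("int64")
Float64DType = type(float_dtype)
Int64DType = type(int_dtype)
is_subclass = issubclass(Float64DType, np.dtype)
distinct_classes = Float64DType is not Int64DType

# Casting safety is what np.can_cast reports.
safe_up = np.can_cast(np.int32, np.int64, casting="safe")
safe_down = np.can_cast(np.int64, np.int32, casting="safe")

# Promotion is what np.result_type and np.promote_types report.
promoted = np.result_type(np.int8, np.float32)

# Bytes strings are parametric: the length is part of the dtype
# instance, and promotion picks the longer of the two.
promoted_strings = np.promote_types("S3", "S8")
```

The point of the DType classes is that casting and promotion can be attached to the class, while parameters such as a string length stay on the instance.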

With the exception of universal functions, the above list covers all major areas of NumPy that need to change. It also implements many of the things that new user DTypes will need. Previously, these were either unavailable or limited in various ways, especially when it comes to parametric DTypes such as units or strings.

Currently in Progress

The main remaining piece is the universal functions. A majority of NumPy features are organized as universal functions, and universal functions inherently did not support parametric user-defined DTypes, so they need a major change. This change is proposed in NEP 43 (although that will need some smaller updates).

The work on implementing it is mostly settling in the following PR and the following branch (I hope these will move in very soon):

PR 18905: Implements new promotion, dispatching and use for most ufuncs.
My development branch extends this to the reductions.

In parallel, the new DType API is only useful for users once it is exposed. I have a branch here to experiment with that:

The experimental DType API exposure branch.
And a repository with (currently Cython) examples using it. This currently includes a very simplistic Units DType and ufuncs for strings (previously difficult or not really possible).

The exact way to write a new DType will probably need to change. But note that this should largely be limited to the boilerplate code.

Future

The main step still remaining is figuring out how exactly to best expose the DType API (ABI compatibility is the major concern) and, to close things out, finishing NEP 43 (or most of it).

After that, there are still some things that need to be done (although this list is unlikely to be exhaustive):

The way users should define new DTypes has to be decided (this seems tricky, unfortunately).
Some functionality is defined in the "old style" API that should be removed/discouraged. This includes things like sorting functions. (The old way could be allowed for a transition period.) To be specific, these are the ((PyArray_Descr *)descr)->f->funcs.
Some small parts of the new API are missing right now. E.g. ensure_nbo() in current NumPy code has to use ensure_canonical() as defined by NEP 42. Similarly, some parts will need tweaking.
Part of the API should be public, but it would also be nice to clean it up before doing so; an example for this is the get_loop() of ufuncs. For most use-cases, this is probably not too important, but the API is a bit awkward currently. (It would be possible to accept the awkward API and replace it in the future with a new get_loop(), slowly deprecating the old one.)
There should be some new API for "reference counting" (more generally, any item with memory management). This means cleaning up the split between the current transfer to NULL and PyArray_XDECREF; that is, we should unify the two as much as possible (probably by using the transfer to NULL path) and then expose that to custom DTypes as well.
Some utility functionality is missing at this time, for example a way for a Unit DType to fall back to the normal math implemented by NumPy (after figuring out the unit part).
A Python API is not on my explicit roadmap right now (although probably not hard).

But most importantly, whatever comes up when potential users start exploring the API, hopefully soon!
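To make the "fall back to the normal math" item for a Unit DType more concrete, here is a rough pure-Python sketch of the idea. It is not NumPy API: the functions multiply_units, plain_multiply and unit_multiply are hypothetical names, units are modelled as plain dicts of exponents, and a list comprehension stands in for the existing unit-less NumPy loop.

```python
def multiply_units(left, right):
    """Unit part: add the exponents of matching base units."""
    combined = dict(left)
    for name, exponent in right.items():
        combined[name] = combined.get(name, 0) + exponent
    # Drop base units that cancelled out (exponent 0).
    return {name: exp for name, exp in combined.items() if exp != 0}


def plain_multiply(a, b):
    """Stand-in for the existing unit-less loop (e.g. a float multiply)."""
    return [x * y for x, y in zip(a, b)]


def unit_multiply(values_a, unit_a, values_b, unit_b):
    """Resolve the result unit first, then defer to the plain math."""
    result_unit = multiply_units(unit_a, unit_b)
    result_values = plain_multiply(values_a, values_b)
    return result_values, result_unit


# Hypothetical data: speeds in m/s multiplied by durations in s.
speed = ([2.0, 3.0], {"m": 1, "s": -1})
duration = ([10.0, 4.0], {"s": 1})
distance_values, distance_unit = unit_multiply(*speed, *duration)
```

The missing utility would let a DType author write only the equivalent of multiply_units and reuse NumPy's own loop for the rest, instead of reimplementing the arithmetic.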

Otherwise, there are a couple of related improvements that I think would make sense, such as storing the actual power-of-two alignment in the array flags (they are getting a bit cramped if we assume int can be 16 bits, though). Also, the discussion about removing value-based casting/promotion is one that would help with DTypes, and pushing it forward probably makes sense as soon as the items that are "currently in progress" are largely settled and the next NumPy version is released.
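As a rough illustration of the alignment idea, the exponent of a power-of-two alignment fits into a handful of flag bits. The sketch below is only an assumption about how this could look: ALIGN_SHIFT, ALIGN_MASK and both helper functions are hypothetical, and the chosen bit positions do not correspond to NumPy's actual flag layout.

```python
# Hypothetical layout: 4 bits starting at bit 8 hold log2(alignment).
ALIGN_SHIFT = 8
ALIGN_BITS = 4
ALIGN_MASK = ((1 << ALIGN_BITS) - 1) << ALIGN_SHIFT


def pack_alignment(flags, alignment):
    """Store a power-of-two alignment in the flags as its exponent."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    exponent = alignment.bit_length() - 1
    if exponent >= (1 << ALIGN_BITS):
        raise ValueError("alignment too large for the reserved bits")
    # Clear the old alignment bits, keep all other flags untouched.
    return (flags & ~ALIGN_MASK) | (exponent << ALIGN_SHIFT)


def unpack_alignment(flags):
    """Recover the alignment in bytes from the flags."""
    return 1 << ((flags & ALIGN_MASK) >> ALIGN_SHIFT)


other_flags = 0b101  # made-up unrelated flag bits
flags = pack_alignment(other_flags, 64)
alignment = unpack_alignment(flags)
```

Four bits cover alignments up to 32768 bytes, which shows why the idea is cheap in principle but still competes for space if the flags have to fit into 16 bits.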