[Numpy-discussion] DType Roadmap/NEP Discussion

Warren Weckesser warren.weckesser at gmail.com
Thu Sep 19 19:09:25 EDT 2019

On 9/18/19, Sebastian Berg <sebastian at sipsolutions.net> wrote:
> Hi all,
> To try and make some progress towards a decision, now that the broad
> design is pretty much settling on my side, I am thinking about setting
> up a meeting, and suggest Monday at 11am Pacific Time (I am open to
> other times though).

That works for me.


> My hope is to get everyone interested on board, so that we can make an
> informed decision about the general direction very soon. So just reach
> out, or discuss on the mailing list as well.
> The current draft for an NEP is here:
> https://hackmd.io/kxuh15QGSjueEKft5SaMug?both
> There are some design goals that I would like to clear up. I would
> prefer to avoid deep discussions of some specific issues, since I think
> the important decision right now is whether my general start is in the
> right direction.
> It is not an easy topic, so my plan would be to try and briefly summarize
> it, then hopefully clarify any questions, and then we can discuss
> why alternatives are rejected. The most important thing is maybe
> gathering concerns which need to be clarified before we can go towards
> accepting the general design ideas.
> The main point of the NEP draft is actually captured by the picture in
> the linked document: DTypes are classes (such as Float64) and what is
> attached to the array is an instance of that class "<float64" or
> ">float64". Additionally, we would have AbstractDType classes which
> cannot be instantiated but define a type hierarchy.
> To list the main points:
> * DTypes are classes (corresponding to the current type number)
> * `arr.dtype` is an instance of its class, which allows it to store
>   additional information such as a physical unit or the string length.
> * Most things are defined in special dtype slots similar to Python's
>   type and number slots. They will be hidden and can be set through
>   an init function similar to `PyType_FromSpec` [1].
> * Promotion is defined primarily on the DType classes
> * Casting from one DType to another DType is defined by a new
>   CastingImpl object (should become a special ufunc)
>     - e.g. for strings, the CastingImpl is in charge of finding the
>       correct string length
> * The AbstractDType hierarchy will be used to decide the signature when
>   calling UFuncs.
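
A minimal Python sketch of the class/instance split in the list above, for illustration only. Every name here (`DTypeMeta`, `DType`, `Floating`, `common_dtype`, the `abstract` flag) is hypothetical and is not taken from the NEP draft or from NumPy; the toy promotion table covers only two classes.

```python
# Illustrative sketch only: none of these names are NumPy API.

class DTypeMeta(type):
    """Metaclass for DType classes, so the classes themselves carry behaviour."""

    def __call__(cls, *args, **kwargs):
        # A class is abstract only if it sets `abstract = True` in its own body.
        if cls.__dict__.get("abstract", False):
            raise TypeError(f"{cls.__name__} is abstract and cannot be instantiated")
        return super().__call__(*args, **kwargs)

    def common_dtype(cls, other):
        # Promotion is decided between classes, not between instances.
        rank = {Int64: 0, Float64: 1}  # toy promotion order (assumption)
        return cls if rank[cls] >= rank[other] else other


class DType(metaclass=DTypeMeta):
    abstract = True
    name = None

    def __init__(self, byteorder="<"):
        # Per-instance information, e.g. byte order, a unit, a string length.
        self.byteorder = byteorder

    def __repr__(self):
        return f"{self.byteorder}{self.name}"


class Number(DType):
    abstract = True


class Integer(Number):
    abstract = True


class Floating(Number):
    abstract = True


class Int64(Integer):
    name = "int64"


class Float64(Floating):
    name = "float64"


little = Float64("<")
big = Float64(">")
print(little, big, type(little) is type(big))
print(Int64.common_dtype(Float64).__name__)
```

The two instances differ only in the information they store, while promotion and the abstract hierarchy (`issubclass(Float64, Floating)`) live on the classes.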
> The main iffier points I can think of are:
> * NumPy currently uses value based promotion in some cases, which
>   requires special AbstractDTypes to describe (and some legacy
>   paths). (They are used more like instances than typical classes.)
> * Casting between flexible dtypes (such as strings) is a multi-step
>   process to figure out the actual output dtype.
>     - An example is: `np.can_cast("float64", "S3")` first finding
>       that `Float64->String` is possible in principle and then
>       asking the CastingImpl to find that `float64->S3` is not.
> * We have to break ABI compatibility in a very minor, back-portable
>   way. Further small incompatibilities are likely [2].
> * Since it is a major redesign, a lot of code has to be added/touched,
>   although it is possible to channel much of it back into the old
>   machinery.
> * A largish amount of new API around new DType type objects and also
>   DTypeMeta type objects, which users can (although usually do not have
>   to) subclass.
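
The two-step `can_cast` lookup mentioned in the list above could be sketched roughly as follows. This is a toy model, not the proposed implementation: `Float64ToString`, `resolve`, `CASTING_IMPLS` and the required length of 32 characters are all assumptions made for the example.

```python
# Illustrative sketch only: none of these names are NumPy API.
from dataclasses import dataclass
from typing import Optional

FLOAT64_STR_LEN = 32  # assumed number of characters needed for a float64


class Float64:
    """Stand-in for a float64 dtype instance."""


@dataclass(frozen=True)
class String:
    """Stand-in for a string dtype instance; length None means 'unsized'."""
    length: Optional[int] = None


class Float64ToString:
    """Stand-in for the CastingImpl registered for the (Float64, String) pair."""

    def resolve(self, from_dtype, to_dtype):
        # Step 2: look at the instances and work out the concrete result.
        if to_dtype.length is None:
            return True, String(FLOAT64_STR_LEN)
        return to_dtype.length >= FLOAT64_STR_LEN, to_dtype


# Step 1 works on classes only: is there any cast between these two DTypes?
CASTING_IMPLS = {(Float64, String): Float64ToString()}


def can_cast(from_dtype, to_dtype):
    impl = CASTING_IMPLS.get((type(from_dtype), type(to_dtype)))
    if impl is None:
        return False  # not possible even in principle
    safe, _ = impl.resolve(from_dtype, to_dtype)
    return safe


print(can_cast(Float64(), String(3)))
print(can_cast(Float64(), String(32)))
```

The class-level lookup answers "is `Float64->String` possible at all", and only the CastingImpl knows that a length of 3 is too short.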
> However, most other designs will have similar issues. Basically, I
> currently really think this is "right", even if some details may end up
> a bit tricky.
> Best,
> Sebastian
> PS: The one thing outside the more general list above that I may want
> to discuss is how acceptable a global dict/mapping for dtype discovery
> during `np.array` coercion is (mapping python type -> dtype)...
> [1] https://docs.python.org/3/c-api/type.html#c.PyType_FromSpec
> [2] One possible issue may be "S0" which is normally used to denote
> what in the new API would be the `String` DType class.
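
For the PS about a global mapping from Python type to dtype, a rough sketch of what such a lookup might look like during coercion. The registry, the function names and the use of plain strings as dtype stand-ins are all hypothetical, chosen only to keep the example self-contained.

```python
# Illustrative sketch only: a hypothetical registry, not NumPy API.
from fractions import Fraction

# Global mapping: Python type -> dtype (strings stand in for DType classes).
_PYTYPE_TO_DTYPE = {
    bool: "bool",
    int: "int64",
    float: "float64",
    complex: "complex128",
}


def register_pytype(pytype, dtype):
    """Let a (hypothetical) user-defined dtype claim a Python type."""
    _PYTYPE_TO_DTYPE[pytype] = dtype


def discover_dtype(value):
    """Find the dtype for a scalar by walking the MRO of its Python type."""
    for klass in type(value).__mro__:
        if klass in _PYTYPE_TO_DTYPE:
            return _PYTYPE_TO_DTYPE[klass]
    return "object"  # fallback when nothing is registered


print(discover_dtype(True), discover_dtype(3), discover_dtype(2.5))
print(discover_dtype(Fraction(1, 2)))
register_pytype(Fraction, "rational")
print(discover_dtype(Fraction(1, 2)))
```

The last two calls show the trade-off behind the question: the lookup is simple and fast, but registering a type changes coercion results globally for every caller in the process.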
