Accepting NEP 42 — New and extensible DTypes

Hi all,

after another thorough revision of NEP 42 (many thanks to Ben!), I propose accepting the NEP, with the note that details are expected to change.

I am always happy to clarify and review the document based on feedback, but I feel the important technical points should be very clear and settled. Exposing all of the proposed API may need additional detailed API discussion. My focus is still a bit on the big-picture design choices the NEP makes, which we need in order to move forward and settle the implementation internal to NumPy, although I am happy to discuss the details!

The title of the NEP is:

    NEP 42 — New and extensible DTypes

It is available at:

    https://numpy.org/neps/nep-0042-new-dtypes.html

While enabling new user-defined DTypes is the main goal, the main work is the internal restructuring of NumPy's own DTypes that is necessary to allow that.

I have pasted the "Abstract" and "Motivation and scope" sections below, which give a good overview of the issues we are trying to address. They are followed by the "Usage and impact" section, which gives a big-picture overview of the design. I will refer to the full NEP for more detailed technical decisions and explanations.

Cheers,

Sebastian

PS: In some places NEP 42 references NEP 43, for which I hope to merge the draft soon. The current status is here:

    https://github.com/numpy/numpy/pull/16723

However, this should mainly be of interest to those wishing to go into more technical details.

******************************************************************************
Abstract
******************************************************************************

NumPy's dtype architecture is monolithic -- each dtype is an instance of a single class. There's no principled way to expand it for new dtypes, and the code is difficult to read and maintain.

As :ref:`NEP 41 <NEP41>` explains, we are proposing a new architecture that is modular and open to user additions.
dtypes will derive from a new ``DType`` class serving as the extension point for new types. ``np.dtype("float64")`` will return an instance of a ``Float64`` class, a subclass of root class ``np.dtype``.

This NEP is one of two that lay out the design and API of this new architecture. This NEP addresses dtype implementation; NEP 43 addresses universal functions.

.. note::

    Details of the private and external APIs may change to reflect user comments and implementation constraints. The underlying principles and choices should not change significantly.

******************************************************************************
Motivation and scope
******************************************************************************

Our goal is to allow user code to create fully featured dtypes for a broad variety of uses, from physical units (such as meters) to domain-specific representations of geometric objects. :ref:`NEP 41 <NEP41>` describes a number of these new dtypes and their benefits.

Any design supporting dtypes must consider:

- How shape and dtype are determined when an array is created
- How array elements are stored and accessed
- The rules for casting dtypes to other dtypes

In addition:

- We want dtypes to comprise a class hierarchy open to new types and to subhierarchies, as motivated in :ref:`NEP 41 <NEP41>`.

And to provide this,

- We need to define a user API.

All these are the subjects of this NEP.

- The class hierarchy, its relation to the Python scalar types, and its important attributes are described in `nep42_DType class`_.
- The functionality that will support dtype casting is described in `Casting`_.
- The implementation of item access and storage, and the way shape and dtype are determined when creating an array, are described in :ref:`nep42_array_coercion`.
- The functionality for users to define their own DTypes is described in `Public C-API`_.

The API here and in NEP 43 is entirely on the C side.
A Python-side version will be proposed in a future NEP. A future Python API is expected to be similar, but provide a more convenient API to reuse the functionality of existing DTypes. It could also provide shorthands to create structured DTypes similar to Python's `dataclasses <https://docs.python.org/3.8/library/dataclasses.html>`_.

******************************************************************************
Usage and impact
******************************************************************************

We believe the few structures in this section are sufficient to consolidate NumPy's present functionality and also to support complex user-defined DTypes.

The rest of the NEP fills in details and provides support for the claim.

Again, though Python is used for illustration, the implementation is a C API only; a future NEP will tackle the Python API.

After implementing this NEP, creating a DType will be possible by implementing the following outlined DType base class, which is further described in `nep42_DType class`_:

    class DType(np.dtype):
        type : type        # Python scalar type
        parametric : bool  # (may be indicated by superclass)

        @property
        def canonical(self) -> bool:
            raise NotImplementedError

        def ensure_canonical(self : DType) -> DType:
            raise NotImplementedError

For casting, a large part of the functionality is provided by the "methods" stored in ``_castingimpl``:

        @classmethod
        def common_dtype(cls : DTypeMeta, other : DTypeMeta) -> DTypeMeta:
            raise NotImplementedError

        def common_instance(self : DType, other : DType) -> DType:
            raise NotImplementedError

        # A mapping of "methods" each detailing how to cast to another DType
        # (further specified at the end of the section)
        _castingimpl = {}

For array-coercion, also part of casting:

        def __dtype_setitem__(self, item_pointer, value):
            raise NotImplementedError

        def __dtype_getitem__(self, item_pointer, base_obj) -> object:
            raise NotImplementedError

        @classmethod
        def __discover_descr_from_pyobject__(cls, obj : object) -> DType:
            raise NotImplementedError

        # initially private:
        @classmethod
        def _known_scalar_type(cls, obj : object) -> bool:
            raise NotImplementedError

Another element of the casting implementation is the ``CastingImpl``:

    casting = Union["safe", "same_kind", "unsafe"]

    class CastingImpl:
        # Object describing and performing the cast
        casting : casting

        def resolve_descriptors(self, input : Tuple[DType]) -> (casting, Tuple[DType]):
            raise NotImplementedError

        # initially private:
        def _get_loop(...) -> lowlevel_C_loop:
            raise NotImplementedError

which describes the casting from one DType to another. In NEP 43 this ``CastingImpl`` object is used unchanged to support universal functions.
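To make the interplay of ``_castingimpl`` and ``resolve_descriptors`` a little more concrete, here is a minimal pure-Python sketch. It is only a toy model under stated assumptions, not the NEP's C implementation: the names ``ToyDType``, ``ToyInt32``, ``ToyFloat64``, ``ToyCastingImpl``, ``CASTING_ORDER`` and ``can_cast`` are hypothetical, and "canonical" is assumed here to mean nothing more than native byte order.

```python
# Toy model only: every name below is hypothetical and not part of NumPy.

# Casting levels ordered from most to least restrictive.
CASTING_ORDER = ("safe", "same_kind", "unsafe")


class ToyDType:
    """Stand-in for a DType instance."""

    # Maps a target DType class to the object describing the cast to it.
    _castingimpl = {}

    def __init__(self, byteorder="="):
        self.byteorder = byteorder

    @property
    def canonical(self):
        # Assumption: only native byte order ("=") counts as canonical.
        return self.byteorder == "="

    def ensure_canonical(self):
        return self if self.canonical else type(self)("=")


class ToyInt32(ToyDType):
    _castingimpl = {}  # each class keeps its own mapping


class ToyFloat64(ToyDType):
    _castingimpl = {}


class ToyCastingImpl:
    """Describes one cast: its safety level and the resolved descriptors."""

    def __init__(self, casting):
        self.casting = casting

    def resolve_descriptors(self, given):
        from_dt, to_dt = given
        resolved = (from_dt.ensure_canonical(), to_dt.ensure_canonical())
        return self.casting, resolved


# Register the casts on the "from" class, keyed by the "to" class.
ToyInt32._castingimpl[ToyFloat64] = ToyCastingImpl("safe")
ToyFloat64._castingimpl[ToyInt32] = ToyCastingImpl("unsafe")


def can_cast(from_dt, to_dt, casting="safe"):
    """Return True if a registered cast is at least as safe as requested."""
    impl = type(from_dt)._castingimpl.get(type(to_dt))
    if impl is None:
        return False  # no cast registered in this toy model
    level, _ = impl.resolve_descriptors((from_dt, to_dt))
    return CASTING_ORDER.index(level) <= CASTING_ORDER.index(casting)


print(can_cast(ToyInt32(), ToyFloat64()))            # True
print(can_cast(ToyFloat64(), ToyInt32()))            # False
print(can_cast(ToyFloat64(), ToyInt32(), "unsafe"))  # True
```

The point of the sketch is only the shape of the lookup: the cast is found through a per-class mapping, and the object found there reports both how safe the cast is and which concrete descriptors it will operate on.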

Hi all,

On Thu, 2020-10-08 at 07:51 -0500, Sebastian Berg wrote:

That was a while ago, and a draft for NEP 43 (UFunc redesign) is now available at:

    https://numpy.org/neps/nep-0043-extensible-ufuncs.html

I would appreciate any feedback and am happy to go into more detail where necessary. Do we have a consensus about the general big-picture API design, or are there any concerns?

These documents outline (most importantly):

1. How DTypes should be created (NEP 42)
2. How Casting will be implemented (NEP 42)
3. How UFuncs will be redesigned (NEP 43):
   * This changes the calling convention
   * It also unifies casting largely with ufuncs
4. How ufunc promotion will be handled in the future (NEP 43):
   * This is what happens when you add mixed types, for example float64 + int32 casts int32 to float64 and uses the float64 + float64 implementation.

Point 1 is finished to the extent currently necessary. Right now I am basically finishing with Casting (point 2), and I expect it to move forward very soon, at least in part. This does have a big overlap with UFuncs (point 3), though. So if you are interested in that, it is a good time to dive in, even if many details can still be changed easily for a while!

Cheers,

Sebastian
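As a rough illustration of the promotion step described in point 4, the following pure-Python sketch looks for an exact implementation first and otherwise casts both operands to a common type. It is a toy under stated assumptions, not the mechanism specified in NEP 43: ``LOOPS``, ``COMMON``, ``CASTS``, ``common_dtype`` and ``toy_add`` are hypothetical names, and type names are plain strings.

```python
# Toy model only: every name below is hypothetical and not part of NumPy.

# Registered "add" implementations, keyed by the input type names.
LOOPS = {
    ("int32", "int32"): lambda a, b: a + b,
    ("float64", "float64"): lambda a, b: a + b,
}

# Promotion table standing in for a common-DType lookup.
COMMON = {frozenset(["int32", "float64"]): "float64"}

# How to cast a Python value to each toy type.
CASTS = {"int32": int, "float64": float}


def common_dtype(a, b):
    """Return the type both inputs should be cast to."""
    return a if a == b else COMMON[frozenset([a, b])]


def toy_add(x, x_type, y, y_type):
    """Add two values, promoting when no exact implementation exists."""
    key = (x_type, y_type)
    if key not in LOOPS:
        # No exact match: promote both operands to the common type.
        target = common_dtype(x_type, y_type)
        key = (target, target)
    # Cast each operand to the type the chosen implementation expects.
    x, y = CASTS[key[0]](x), CASTS[key[1]](y)
    return key, LOOPS[key](x, y)


print(toy_add(1.5, "float64", 2, "int32"))  # (('float64', 'float64'), 3.5)
```

Here the mixed float64 + int32 call has no registered implementation, so the int32 operand is cast to float64 and the float64 + float64 implementation is used, matching the example in the list above.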
