[Numpy-discussion] Accepting NEP 42 — New and extensible DTypes
Sebastian Berg
sebastian at sipsolutions.net
Thu Oct 8 08:51:16 EDT 2020
Hi all,
after another thorough revision of NEP 42 (many thanks to Ben!), I
propose accepting the NEP, with the note that details are expected to
change.
I am always happy to clarify and review the document based on feedback,
but I feel the important technical points should be very clear and
settled.
Exposing all of the proposed API may need additional detailed API
discussion. My focus is still on the big-picture design choices the NEP
makes, which we need to settle in order to move forward with the
implementation internal to NumPy, although I am happy to discuss the
details!
The title of the NEP is:
NEP 42 — New and extensible DTypes
And available at:
https://numpy.org/neps/nep-0042-new-dtypes.html
While enabling new user-defined DTypes is the main goal, the main work
is the internal restructure of NumPy's own DTypes necessary to allow
that.
I have pasted the "Abstract" and "Motivation and scope" sections below,
which give a good overview of the issues we are trying to address.
They are followed by the "Usage and impact" section, which gives a
big-picture overview of the design.
I will refer to the full NEP for more detailed technical decisions and
explanations.
Cheers,
Sebastian
PS: In some places NEP 42 references NEP 43, for which I hope to merge
the draft soon. The current status is here:
https://github.com/numpy/numpy/pull/16723
However, this should mainly be of interest to those wishing to go into
more technical details.
******************************************************************************
Abstract
******************************************************************************
NumPy's dtype architecture is monolithic -- each dtype is an instance of a
single class. There's no principled way to expand it for new dtypes, and the
code is difficult to read and maintain.
As :ref:`NEP 41 <NEP41>` explains, we are proposing a new architecture that is
modular and open to user additions. dtypes will derive from a new ``DType``
class serving as the extension point for new types. ``np.dtype("float64")``
will return an instance of a ``Float64`` class, a subclass of the root class
``np.dtype``.
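The class layout described in this paragraph can be sketched with a small pure-Python toy model. This is only an illustration, not NumPy code: ``ToyDType``, ``ToyFloat64``, ``ToyInt64`` and ``toy_dtype`` are hypothetical names, with ``ToyDType`` standing in for the root class ``np.dtype``.

```python
# Toy model only: each concrete dtype is its own subclass of a root class,
# instead of every dtype being an instance of one monolithic class.

class ToyDType:
    """Hypothetical root class; stands in for np.dtype in the NEP."""

    _registry = {}  # maps a dtype name to the class implementing it

    def __init_subclass__(cls, name=None, **kwargs):
        super().__init_subclass__(**kwargs)
        if name is not None:
            ToyDType._registry[name] = cls

    def __repr__(self):
        return f"{type(self).__name__}()"


class ToyFloat64(ToyDType, name="float64"):
    itemsize = 8


class ToyInt64(ToyDType, name="int64"):
    itemsize = 8


def toy_dtype(name):
    """Return an instance of the class registered under `name`."""
    return ToyDType._registry[name]()


descr = toy_dtype("float64")
print(type(descr).__name__)         # ToyFloat64
print(isinstance(descr, ToyDType))  # True
```

In this sketch a new dtype is added by writing a new subclass; nothing in the root class has to change.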
This NEP is one of two that lay out the design and API of this new
architecture. This NEP addresses dtype implementation; NEP 43 addresses
universal functions.
.. note::

    Details of the private and external APIs may change to reflect user
    comments and implementation constraints. The underlying principles and
    choices should not change significantly.
******************************************************************************
Motivation and scope
******************************************************************************
Our goal is to allow user code to create fully featured dtypes for a broad
variety of uses, from physical units (such as meters) to domain-specific
representations of geometric objects. :ref:`NEP 41 <NEP41>` describes a number
of these new dtypes and their benefits.
Any design supporting dtypes must consider:
- How shape and dtype are determined when an array is created
- How array elements are stored and accessed
- The rules for casting dtypes to other dtypes
In addition:
- We want dtypes to comprise a class hierarchy open to new types and to
subhierarchies, as motivated in :ref:`NEP 41 <NEP41>`.
And to provide this,
- We need to define a user API.
All these are the subjects of this NEP.
- The class hierarchy, its relation to the Python scalar types, and its
  important attributes are described in `nep42_DType class`_.
- The functionality that will support dtype casting is described in
  `Casting`_.
- The implementation of item access and storage, and the way shape and dtype
  are determined when creating an array, are described in
  :ref:`nep42_array_coercion`.
- The functionality for users to define their own DTypes is described in
  `Public C-API`_.
The API here and in NEP 43 is entirely on the C side. A Python-side version
will be proposed in a future NEP. A future Python API is expected to be
similar, but provide a more convenient API to reuse the functionality of
existing DTypes. It could also provide shorthands to create structured DTypes
similar to Python's
`dataclasses <https://docs.python.org/3.8/library/dataclasses.html>`_.
******************************************************************************
Usage and impact
******************************************************************************
We believe the few structures in this section are sufficient to consolidate
NumPy's present functionality and also to support complex user-defined DTypes.
The rest of the NEP fills in details and provides support for the claim.
Again, though Python is used for illustration, the implementation is a C API
only; a future NEP will tackle the Python API.
After implementing this NEP, creating a DType will be possible by implementing
the following outlined DType base class,
that is further described in `nep42_DType class`_:

    class DType(np.dtype):
        type : type        # Python scalar type
        parametric : bool  # (may be indicated by superclass)

        @property
        def canonical(self) -> bool:
            raise NotImplementedError

        def ensure_canonical(self : DType) -> DType:
            raise NotImplementedError
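
To make the ``canonical`` / ``ensure_canonical`` pair more concrete, here is a minimal pure-Python sketch. It assumes that byte order is what makes an instance non-canonical, which is one plausible reading and not something this outline states. ``ToyInt32`` and ``NATIVE`` are hypothetical names, and the class does not derive from ``np.dtype``.

```python
import sys

# "<" for little-endian machines, ">" for big-endian ones.
NATIVE = "<" if sys.byteorder == "little" else ">"


class ToyInt32:
    """Hypothetical dtype whose only instance state is its byte order."""

    type = int          # Python scalar type
    parametric = False  # instances carry no size-like parameter

    def __init__(self, byteorder=NATIVE):
        if byteorder not in ("<", ">"):
            raise ValueError("byteorder must be '<' or '>'")
        self.byteorder = byteorder

    @property
    def canonical(self):
        # Assumption of this sketch: canonical means native byte order.
        return self.byteorder == NATIVE

    def ensure_canonical(self):
        # Return self when already canonical, otherwise a canonical copy.
        return self if self.canonical else ToyInt32(NATIVE)


swapped = ToyInt32(">" if NATIVE == "<" else "<")
native = ToyInt32()
print(swapped.canonical)                     # False
print(swapped.ensure_canonical().canonical)  # True
```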

For casting, a large part of the functionality is provided by the "methods"
stored in ``_castingimpl``

        @classmethod
        def common_dtype(cls : DTypeMeta, other : DTypeMeta) -> DTypeMeta:
            raise NotImplementedError

        def common_instance(self : DType, other : DType) -> DType:
            raise NotImplementedError

        # A mapping of "methods" each detailing how to cast to another DType
        # (further specified at the end of the section)
        _castingimpl = {}
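
The split between ``common_dtype`` (which works on classes) and ``common_instance`` (which works on instances of one class) can be sketched as follows. This is a toy model under stated assumptions: returning ``NotImplemented`` to defer to the other class, and the ``promote_classes`` helper, are choices made for this sketch and are not taken from the outline; all ``Toy*`` names are hypothetical.

```python
class ToyInt64:
    @classmethod
    def common_dtype(cls, other):
        # Only knows about itself; defers for everything else.
        return cls if other is cls else NotImplemented


class ToyFloat64:
    @classmethod
    def common_dtype(cls, other):
        # Knows that it can also represent ToyInt64 values.
        return cls if other in (cls, ToyInt64) else NotImplemented


class ToyString:
    parametric = True  # instances differ by their length

    def __init__(self, length):
        self.length = length

    @classmethod
    def common_dtype(cls, other):
        return cls if other is cls else NotImplemented

    def common_instance(self, other):
        # Both instances share a class, so only the parameter is resolved.
        return ToyString(max(self.length, other.length))


def promote_classes(a, b):
    """Hypothetical helper: ask `a` first, then give `b` a chance."""
    result = a.common_dtype(b)
    if result is NotImplemented:
        result = b.common_dtype(a)
    if result is NotImplemented:
        raise TypeError(f"no common DType for {a.__name__} and {b.__name__}")
    return result


print(promote_classes(ToyInt64, ToyFloat64).__name__)          # ToyFloat64
print(ToyString(3).common_instance(ToyString(7)).length)       # 7
```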

For array-coercion, also part of casting:

        def __dtype_setitem__(self, item_pointer, value):
            raise NotImplementedError

        def __dtype_getitem__(self, item_pointer, base_obj) -> object:
            raise NotImplementedError

        @classmethod
        def __discover_descr_from_pyobject__(cls, obj : object) -> DType:
            raise NotImplementedError

        # initially private:
        @classmethod
        def _known_scalar_type(cls, obj : object) -> bool:
            raise NotImplementedError
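
A rough pure-Python sketch of these item-access slots is shown here. It models ``item_pointer`` as a ``(buffer, offset)`` pair over a ``bytearray``, which is an assumption made only for illustration (the real API is in C and deals with raw memory). ``ToyFloat64`` is a hypothetical class that always stores little-endian doubles.

```python
import struct


class ToyFloat64:
    """Hypothetical dtype storing each item as a little-endian double."""

    type = float
    itemsize = 8

    def __dtype_setitem__(self, item_pointer, value):
        # Write one Python value into the storage location.
        buffer, offset = item_pointer
        struct.pack_into("<d", buffer, offset, float(value))

    def __dtype_getitem__(self, item_pointer, base_obj=None):
        # Read one stored item back as a Python scalar.
        buffer, offset = item_pointer
        return struct.unpack_from("<d", buffer, offset)[0]

    @classmethod
    def __discover_descr_from_pyobject__(cls, obj):
        # Pick the dtype instance that should be used to store `obj`.
        if not cls._known_scalar_type(obj):
            raise TypeError(f"cannot store {type(obj).__name__} as float64")
        return cls()

    @classmethod
    def _known_scalar_type(cls, obj):
        return isinstance(obj, float)


storage = bytearray(2 * ToyFloat64.itemsize)  # room for two items, zeroed
descr = ToyFloat64.__discover_descr_from_pyobject__(1.5)
descr.__dtype_setitem__((storage, 8), 1.5)    # fill the second slot
value = descr.__dtype_getitem__((storage, 8))
print(value)                                  # 1.5
```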

Another element of the casting implementation is the ``CastingImpl``:

    casting = Union["safe", "same_kind", "unsafe"]

    class CastingImpl:
        # Object describing and performing the cast
        casting : casting

        def resolve_descriptors(self, input : Tuple[DType]) -> (casting, Tuple[DType]):
            raise NotImplementedError

        # initially private:
        def _get_loop(...) -> lowlevel_C_loop:
            raise NotImplementedError

which describes the casting from one DType to another. In
NEP 43 this ``CastingImpl`` object is used unchanged to
support universal functions.
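
As a hedged illustration of what ``resolve_descriptors`` might do, the toy below casts a 64-bit integer dtype to a fixed-length string dtype. The convention that the output descriptor may be ``None`` and gets filled in, and the choice of which safety level to report, are assumptions of this sketch and not statements of the final API. All class names are hypothetical.

```python
class ToyInt64:
    pass


class ToyString:
    def __init__(self, length):
        self.length = length


class ToyInt64ToString:
    """Hypothetical CastingImpl-like object for int64 -> string."""

    casting = "safe"  # best-case level; a specific cast may be less safe

    # Longest decimal representation of a signed 64-bit integer.
    needed = len(str(-2**63))

    def resolve_descriptors(self, descrs):
        from_dtype, to_dtype = descrs
        if to_dtype is None:
            # Output not specified: pick a string long enough for any value.
            return "safe", (from_dtype, ToyString(self.needed))
        if to_dtype.length >= self.needed:
            return "safe", (from_dtype, to_dtype)
        # A shorter string may truncate, so report the cast as unsafe.
        return "unsafe", (from_dtype, to_dtype)


impl = ToyInt64ToString()
level, (src, dst) = impl.resolve_descriptors((ToyInt64(), None))
print(level, dst.length)                                        # safe 20
print(impl.resolve_descriptors((ToyInt64(), ToyString(4)))[0])  # unsafe
```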