[Numpy-discussion] we held an impromptu dtype brainstorming session at SciPy

Nathaniel Smith njs at pobox.com
Sun Jul 15 03:23:16 EDT 2018


A few quick things that jumped out at me:

- I think the discussion of __array_ufunc__ on dtypes is confused.
Dtypes already have a generic mechanism for overriding ufuncs (the
ufunc loop dispatch mechanism), it's separate from __array_ufunc__ (so
in theory you could e.g. have a user-defined array class that uses
__array_ufunc__ and still handles arbitrary user-defined dtypes), and
it's more powerful (e.g. can do automatic casting to dtypes that don't
even appear in the input arrays).
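
To make the loop dispatch point concrete, here is a small sketch using only built-in dtypes (it does not define a user dtype; the variable names are just for illustration). It shows the per-ufunc loop table and a case where the chosen loop's dtype appears in neither input:

```python
import numpy as np

# Every ufunc carries a table of typed inner loops; these are what the
# loop dispatch mechanism chooses between.
print(np.add.types[:6])

a = np.array([100, 100], dtype=np.int8)
b = np.array([200, 200], dtype=np.uint8)

# Neither input is int16, but neither the int8 loop nor the uint8 loop can
# hold both inputs safely, so dispatch picks the int16 loop and casts.
total = np.add(a, b)
print(total.dtype, total)

# Integer inputs to true_divide are likewise cast up to a float loop.
ratio = np.true_divide(np.array([1, 2], dtype=np.int32),
                       np.array([2, 2], dtype=np.int32))
print(ratio.dtype)
```

None of this goes through __array_ufunc__; the casting is decided entirely by which loops the ufunc has registered.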

- IMO we should just not support string parsing for new dtypes. The
right way to pass structured data in Python is with a structured
object, not a string. Designing a string mini-language creates tons of
problems, and we can avoid all of them if we just don't do it.
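
As a rough illustration of the difference (the Categorical class and the "categorical[...]" spelling below are hypothetical, not anything NumPy provides), compare passing parameters as a plain object with recovering them from a string:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Categorical:
    """Hypothetical parametrized dtype described by a structured object."""
    categories: tuple
    ordered: bool = False


# Structured object: parameters are ordinary Python values, so a category
# containing a comma needs no special treatment.
sizes = Categorical(categories=("small", "large, extra"), ordered=True)
print(len(sizes.categories))

# Hypothetical string spelling: a naive parser splits on the same comma,
# so now we need quoting and escaping rules, i.e. a little language.
spec = "categorical[small,large, extra]"
inner = spec[len("categorical["):-1]
naive_categories = tuple(part.strip() for part in inner.split(","))
print(naive_categories)
```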

- In Python, subclassing and nominal types (as opposed to duck types)
are both code smells in general, but this is the rare case where we
actually do want them. For dtypes defined in C (including all the
current built-in ones!), we want to be able to call their special
dtype methods directly from C, without jumping back out to Python.
Fortunately, we don't have to invent anything new here -- all of
Python's built-in special methods have the same issue, and they solve
it with what they call "slots".

For example, when you define a new type in C and want to give it an
__add__ method, you don't create a Python callable and stick it in the
type's __dict__ and expect PyNumber_Add to find it by doing a dict
lookup and boxing up the arguments in a tuple and all that. Instead,
you fill in the nb_add slot in the C-level type object, and
PyNumber_Add calls that directly. (And there's also some fancy
footwork where if you fill in the nb_add slot, then PyType_Ready
will automatically create a Python-level wrapper and stick it in the
type's __dict__ as __add__; or if you're defining a new type in
Python, then type.__new__ will notice if you define a Python-level
__add__ method and automatically create a C-level wrapper and stick it
in the nb_add slot. End result: Python and C callers can both blindly
invoke the method using either the Python or C level mechanism, and in
all cases it automatically does the most efficient thing.)
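
The two directions of that wrapping can be observed from Python. This is a CPython-specific sketch (the Meters and Plain classes are made up for the demonstration):

```python
import operator

# Direction 1: int is defined in C with nb_add filled in, and the type
# exposes a Python-level wrapper for it in its __dict__.
print(type(int.__dict__["__add__"]).__name__)


# Direction 2: a Python-level __add__ gets hooked into the C-level slot,
# so C-level callers such as operator.add (PyNumber_Add) find it.
class Meters:
    def __init__(self, value):
        self.value = value

    def __add__(self, other):
        return Meters(self.value + other.value)


total = operator.add(Meters(1), Meters(2))
print(total.value)


# Slots live on the type, not the instance: an __add__ stored on an
# instance is never consulted by the + operator.
class Plain:
    pass


p = Plain()
p.__add__ = lambda other: "instance attribute"
try:
    p + 1
    instance_add_ignored = False
except TypeError:
    instance_add_ignored = True
print(instance_add_ignored)
```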

So for dtypes we want our own slots. This is conceptually
straightforward but has a few moving parts you need to line up: to add
new slots, you have to add new entries to the PyTypeObject struct. You
do this the same way you extend any Python object: you subclass type,
i.e., define a metaclass. Then you make np.dtype an instance of this
new metaclass, so that np.dtype and all np.dtype subclasses
automatically have the extra slots available in their type object. And
then you hook up some plumbing to make sure that the slots are set up
correctly (in your metaclass's __new__ method), etc.
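
A pure-Python analogue of that plumbing might look like the sketch below. Everything here is hypothetical (DTypeMeta, the slot names, the _slots table); the real thing would be C struct fields on the type object rather than a dict, but the shape is the same: the metaclass's __new__ resolves each slot once, when the class is created.

```python
class DTypeMeta(type):
    """Hypothetical metaclass giving every dtype class a fixed slot table."""

    SLOT_NAMES = ("get_item", "cast_to")  # hypothetical slot names

    def __new__(mcls, name, bases, namespace):
        cls = super().__new__(mcls, name, bases, namespace)
        # Resolve each slot once at class-creation time; getattr follows
        # the MRO, so subclasses inherit slots they do not override.
        cls._slots = {s: getattr(cls, s, None) for s in mcls.SLOT_NAMES}
        return cls


class DType(metaclass=DTypeMeta):
    """Stand-in for np.dtype: an interface with no behavior of its own."""


class Int8(DType):
    def get_item(self, raw):
        # Decode one element from its raw bytes.
        return int.from_bytes(raw, "little", signed=True)


# A "C-level" caller skips attribute lookup and uses the slot table directly.
decode = Int8._slots["get_item"]
print(decode(Int8(), b"\xff"))
print(Int8._slots["cast_to"])
```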

That said, we should ideally try to make np.dtype a kind of abstract
base class with as little logic as possible on the actual class,
because being an instance of np.dtype should mean "this object
implements these Python and C-level interfaces", not actually trigger
behavioral inheritance. Maybe we can move most of the stuff that's
currently there into an internal 'legacy_dtype' class? Or maybe we'll
just have to grit our teeth and live with a not-quite-ideal design.

- OTOH, re: "mixins for units" -- just don't go there! Make units a
wrapper dtype that has-a underlying dtype, where the units class's
methods can invoke the wrapped dtype's methods at appropriate times.
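
A minimal sketch of the has-a approach, with made-up classes and a made-up add interface standing in for whatever the real dtype methods turn out to be:

```python
class Float64:
    """Stand-in for an ordinary dtype; the add method is hypothetical."""

    def add(self, a, b):
        return a + b


class UnitDType:
    """Wrapper dtype: holds a base dtype and a unit instead of mixing in."""

    def __init__(self, base, unit):
        self.base = base
        self.unit = unit

    def add(self, a, b, other):
        # Unit bookkeeping lives in the wrapper...
        if other.unit != self.unit:
            raise TypeError(f"cannot add {self.unit} and {other.unit}")
        # ...and the arithmetic is delegated to the wrapped dtype.
        return self.base.add(a, b)


meters = UnitDType(Float64(), "m")
seconds = UnitDType(Float64(), "s")

total = meters.add(1.5, 2.0, meters)
print(total)

try:
    meters.add(1.5, 2.0, seconds)
    mismatch_rejected = False
except TypeError:
    mismatch_rejected = True
print(mismatch_rejected)
```

Because the wrapper only talks to the base dtype through its methods, the same UnitDType works over any base dtype without a class per combination.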

On Sat, Jul 14, 2018 at 1:39 PM, Matti Picus <matti.picus at gmail.com> wrote:
> The stars all aligned properly and some of the steering committee suggested
> we put together a quick brainstorming session over what to do with dtypes.
> About 20 people joined in the discussion which was very productive. We began
> with user stories and design requirements, and asked some present to spend 5
> minutes and create a straw-man implementation of what their dream dtype
> implementation would contain. The resulting document
> https://github.com/numpy/numpy/wiki/Dtype-Brainstorming will serve as the
> basis for a future NEP and more work toward a better, user-extensible dtype.
>
> More comments are welcome, the discussion is only at the beginning stages.
>
> Matti



-- 
Nathaniel J. Smith -- https://vorpus.org

