[Numpy-discussion] Subclassing vs. dispatch

Sebastian Berg sebastian at sipsolutions.net
Fri Jan 15 18:10:06 EST 2021


On Fri, 2021-01-15 at 18:38 +0000, Israel, Daniel M wrote:
> I hope this is the right place to post this.
> 
> The numpy documentation talks about two methods for making ndarray-
> like objects, subclassing and dispatching, but it is not clear to me
> which one is most appropriate for which purpose.  Can someone
> provide, or point me to, some guidance, about this?  I’m particularly
> interested in what happens if there are multiple layers of
> subclassing.  Can you subclass from a subclass?  Dispatch from a
> dispatch?  Subclass from a dispatch and vice versa?

All of those things can be made to work with appropriate use of
`super()`. Subclassing and dispatching are not mutually exclusive (one
example is `astropy.units.Quantity`, which does both).

If you want to go well beyond typical NumPy behaviour, I would suggest
focusing on dispatching. If all you want is to add a method or two,
subclassing should be a pretty good fit. (Assuming you don't mind that
some operations may end up giving you a normal array, or return your
array where a normal array would be a better fit.)

For example, MaskedArray in NumPy is a subclass, but it adds so many
additional things that dispatching without subclassing would likely be
a better fit. (Opinions will probably differ; I expect that with
subclassing some things will "just work". However, sometimes the
things that "just work" may also do the wrong thing.) Ignoring the
mask of a MaskedArray is always a serious issue.

> My specific application is a pair of classes, SpectralArray and
> PhysicalArray, that use numpy.fft to provide to_physical() and
> to_spectral() methods, respectively, to simplify writing
> pseudo-spectral codes.  Initially this will be serial, but the
> implementation will eventually use a mechanism similar to mpi4py-fft
> to allow the arrays to be distributed.  Further, it would be nice to
> be able to make the code interoperable with the cupy CUDA numpy
> implementation, so that the sub array on each MPI process could use
> GPU accelerated FFTs.

It sounds like you mostly want to add a set of methods, so making a
mixin class and using subclassing may well be a good option.  You can
still define `__array_function__` or `__array_ufunc__` with a fallback
to `super()` to override specific functions.
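A sketch of that suggestion, assuming hypothetical class names and
following the `__array_ufunc__` override pattern from the NumPy
subclassing documentation (handling of the `out=` keyword is omitted
for brevity):

```python
import numpy as np

class SpectralMixin:
    """Illustrative mixin adding transform methods (names are made up)."""

    def to_physical(self):
        # Inverse FFT back to physical space; returns a plain ndarray.
        return np.fft.ifftn(np.asarray(self))


class SpectralArray(SpectralMixin, np.ndarray):
    def __new__(cls, input_array):
        return np.asarray(input_array).view(cls)

    def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
        # Cast our own instances down to plain ndarray to avoid
        # recursion, then fall back to ndarray's implementation.
        cast = [i.view(np.ndarray) if isinstance(i, SpectralArray) else i
                for i in inputs]
        result = super().__array_ufunc__(ufunc, method, *cast, **kwargs)
        if result is NotImplemented:
            return NotImplemented
        # Re-wrap ndarray results in our class.
        if isinstance(result, np.ndarray):
            return result.view(type(self))
        return result

s = SpectralArray([1.0, 2.0, 3.0, 4.0])
```

The override is the hook where a specific ufunc could be intercepted
(e.g. to transform operands to the same space first) while everything
else falls through to ndarray via `super()`.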

If there is more to it (e.g. metadata for frequency scales or similar),
it may be better to skip subclassing altogether. (Just to mention: in
such a case `xarray` may be interesting.)

Since you are also looking for distributed arrays, you should probably
look into Dask (I do not know `mpi4py-fft` well, though). Dask arrays
consist of distributed NumPy or CuPy arrays and make use of the
dispatching protocols in NumPy.
Note that NumPy arrays themselves cannot be distributed or GPU-backed,
and you cannot add that capability via a subclass. So if that is the
aim, do not subclass ndarray unless you are prepared to create
multiple (sub)classes (one each for ndarray, dask arrays, cupy
arrays).
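For completeness, a minimal sketch of dispatching *without*
subclassing, via the `__array_function__` protocol (NEP 18). The
`WrappedArray` class and the `implements` registry helper are
hypothetical names for illustration:

```python
import numpy as np

HANDLED = {}

def implements(numpy_func):
    """Register an implementation of a NumPy function for WrappedArray."""
    def decorator(func):
        HANDLED[numpy_func] = func
        return func
    return decorator

class WrappedArray:
    """Illustrative duck array: wraps an ndarray without subclassing it."""

    def __init__(self, data):
        self.data = np.asarray(data)

    def __array_function__(self, func, types, args, kwargs):
        if func not in HANDLED:
            # Let NumPy raise a TypeError for unsupported functions.
            return NotImplemented
        return HANDLED[func](*args, **kwargs)

@implements(np.sum)
def _sum(arr, *args, **kwargs):
    # A different backend (Dask, CuPy, MPI-distributed storage, ...)
    # could be substituted here without touching callers of np.sum.
    return np.sum(arr.data, *args, **kwargs)

w = WrappedArray([1.0, 2.0, 3.0])
```

This is, roughly, the mechanism Dask and CuPy use to make `np.sum(w)`
work on their array types.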

Cheers,

Sebastian


> 
> Advice?  Thanks.
> 
> Daniel M. Israel, Ph. D.
> XCP-4: Methods & Algorithms
> Mailstop F644
> Los Alamos National Laboratory
> 505 665 5664
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at python.org
> https://mail.python.org/mailman/listinfo/numpy-discussion


