[Numpy-discussion] subclassing ndarray subtleties??

Nathaniel Smith njs at pobox.com
Tue May 22 12:20:48 EDT 2012

On Mon, May 21, 2012 at 6:47 PM, Tom Aldcroft
<aldcroft at head.cfa.harvard.edu> wrote:
> Over on the scipy-user mailing list there was a question about
> subclassing ndarray and I was interested to see two responses that
> seemed to imply that subclassing should be avoided.
> From Dag and Nathaniel, respectively:
> "Subclassing ndarray is a very tricky business -- I did it once and
> regretted having done it for years, because there's so much you can't
> do etc. You're almost certainly better off with embedding an array as
> an attribute, and then forward properties etc. to it."
> "Yes, it's almost always the wrong thing..."
> So my question is whether there are subtleties or issues that are not
> covered in the standard NumPy documents on subclassing ndarray.  What
> are the "things you can't do etc"?  I'm working on a project that
> relies heavily on an ndarray subclass which just adds a few attributes
> and slightly tweaks __getitem__.  It seems fine and I really like that
> the class is an ndarray with all the built-in methods already there.
> Am I missing anything?
> From the scipy thread I did already learn that one should also
> override __getslice__ in addition to __getitem__ to be safe.
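Dag's alternative, embedding an array as an attribute and forwarding to it, might look roughly like the minimal sketch below. It assumes Python 3 and a NumPy that calls an `__array__` method on foreign objects; the class name `TaggedData` and the `tag` attribute are hypothetical, chosen only to mirror "a few extra attributes". The `copy` keyword on `__array__` is accepted so the same code works with NumPy 2.

```python
import numpy as np


class TaggedData:
    """Hypothetical wrapper that holds an ndarray instead of subclassing it."""

    def __init__(self, data, tag=None):
        self.data = np.asarray(data)
        self.tag = tag

    def __getattr__(self, name):
        # Only called when normal attribute lookup fails. Forward public
        # ndarray attributes (.shape, .dtype, .sum, ...) to the wrapped array.
        # Private/dunder names are refused so NumPy's protocol lookups
        # (e.g. __array_interface__) fall through to __array__ below; "data"
        # is refused to avoid infinite recursion if it is ever missing.
        if name.startswith("_") or name == "data":
            raise AttributeError(name)
        return getattr(self.data, name)

    def __getitem__(self, key):
        # The one place indexing behavior is defined: array results are
        # re-wrapped with the same tag, scalar results are returned as-is.
        result = self.data[key]
        if isinstance(result, np.ndarray):
            return TaggedData(result, tag=self.tag)
        return result

    def __len__(self):
        return len(self.data)

    def __array__(self, dtype=None, copy=None):
        # Lets np.asarray(), ufuncs, etc. consume the wrapper. The result is a
        # plain ndarray: the tag is deliberately not carried along.
        if copy:
            return np.array(self.data, dtype=dtype, copy=True)
        return np.asarray(self.data, dtype=dtype)


w = TaggedData(np.arange(6).reshape(2, 3), tag="raw")
print(w.shape, w.sum())  # (2, 3) 15
print(w[0].tag)          # raw
```

The trade-off is that nothing is inherited. Python looks up operator methods on the type, not through `__getattr__`, so `w + 1` raises `TypeError` until you add arithmetic explicitly (for example with `numpy.lib.mixins.NDArrayOperatorsMixin` plus an `__array_ufunc__` method). In exchange, every behavior the wrapper has is one you chose to write.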

I don't know of anything that the docs are lacking in particular. It's
just that subclassing in general is basically a special form of
monkey-patching: you have this ecosystem of cooperating methods, and
then you're inserting some arbitrary changes in the middle of it.
Making it all work in general requires that you carefully think
through how all the different pieces of the ndarray API interact, and
the ndarray API is very large and complicated. The __getslice__ thing
is one example of this. For another: does your __getitem__ properly
handle *all* the cases that regular ndarray.__getitem__ handles? (I'm
not sure anyone actually knows what this complete list is; there are a
lot of potential corner cases.) What happens if one of your objects is
passed to third-party code that uses __getitem__? What happens if your
array is accidentally stripped of its magic properties by passing
through np.asarray() at the top of some function? Have you thought
about how your special attributes are affected by, say, swapaxes? Have
you applied your tweaks to item() and itemset()?
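A few of these questions can be checked directly. The sketch below assumes a hypothetical subclass, `TaggedArray`, that (like the one described in the question) adds one attribute and tweaks `__getitem__`, using the `__array_finalize__` pattern from the NumPy subclassing guide. It is written for Python 3, where `__getslice__` no longer exists and all slicing goes through `__getitem__`. `ndarray.itemset()` was removed in NumPy 2.0, so only `item()` is exercised.

```python
import numpy as np


class TaggedArray(np.ndarray):
    """Hypothetical subclass: one extra attribute plus a __getitem__ tweak."""

    def __new__(cls, input_array, tag=None):
        obj = np.asarray(input_array).view(cls)
        obj.tag = tag
        return obj

    def __array_finalize__(self, obj):
        # Runs whenever a TaggedArray is created from another array (explicit
        # views, slices, swapaxes, ...); copy the tag from the source if any.
        if obj is None:
            return
        self.tag = getattr(obj, "tag", None)

    def __getitem__(self, key):
        # The "tweak": array results produced by indexing get a marked tag.
        result = super().__getitem__(key)
        if isinstance(result, TaggedArray):
            result.tag = f"{self.tag}[sliced]"
        return result


a = TaggedArray(np.arange(6).reshape(2, 3), tag="raw")

# np.asarray() at the top of a function silently drops the subclass.
plain = np.asarray(a)
print(type(plain).__name__, hasattr(plain, "tag"))  # ndarray False

# swapaxes() never calls __getitem__, so the tweak is bypassed; only
# __array_finalize__ runs, copying the tag unchanged.
s = a.swapaxes(0, 1)
print(type(s).__name__, s.tag)  # TaggedArray raw

# Ordinary indexing does go through the override.
print(a[0].tag)  # raw[sliced]

# item() is a separate code path that returns a plain Python scalar.
print(type(a.item(0)).__name__)  # int
```

Each result is defensible on its own; the point is that the subclass author has to know about, and decide on, every one of them.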

I'm just guessing randomly here of course, since I have no idea what
you've done. And I've subclassed ndarray myself at least three times,
for reasons that seemed good enough at the time, so I'm not saying
it's never doable. It's just that there are tons of these tiny little
details, any one of which can trip you up, and that means that people
tend to dive in and then discover the pitfalls later.

- N
