subclassing ndarray and keeping same ufunc behavior

I'm trying to subclass ndarray so that I can add some additional fields. When I do this, however, I get odd new behavior when my object is passed to a variety of numpy functions. For example, nanmin now returns an object of my new array class, whereas previously I'd get a float64. Why? Is this a bug in nanmin or in my class?

    import numpy as np

    class NDArrayWithColumns(np.ndarray):
        def __new__(cls, obj, columns=None):
            obj = obj.view(cls)
            obj.columns = tuple(columns)
            return obj

        def __array_finalize__(self, obj):
            if obj is None:
                return
            self.columns = getattr(obj, 'columns', None)

    NAN = float("nan")
    r = np.array([1.,0.,1.,0.,1.,0.,1.,0.,NAN, 1., 1.])
    print "MIN", np.nanmin(r), type(np.nanmin(r))

gives:

    MIN 0.0 <type 'numpy.float64'>

but

    >>> r = NDArrayWithColumns(r, ["a"])
    >>> print "MIN", np.nanmin(r), type(np.nanmin(r))
    MIN 0.0 <class '__main__.NDArrayWithColumns'>
    >>> print r.shape # ?!
    (11,)

Note the change in type, and also that str(np.nanmin(r)) shows 1 field, not 11 as indicated by its shape. This seems wrong. Is there a way to get my subclass to behave more like an ndarray? I realize from the docs that I can override __array_wrap__, but it's not clear to me how to use it to solve this issue, or whether it's the right tool.

In case you're interested, I'm subclassing because I'd like to track column names in matrices of a single type. This is a pretty common wish in scikit pipelines. Structured arrays and record arrays allow for varying types. Pandas provides this functionality, but dealing with numpy arrays is easier (and more efficient) when writing cython extensions. Also, I think structured arrays and record types are unlikely to play nicely with cython because they're more freely typed -- I want to deal exclusively with arrays of doubles.

Any thoughts on how to subclass ndarray and keep the original behavior in ufuncs?
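One possible workaround, offered only as a sketch: if the goal is simply to get the plain-ndarray result back, the subclass can be dropped before calling the numpy function. The snippet below uses Python 3 syntax and a slightly modified copy of the class from the question (the `columns=None` guard and the `np.asarray` call in `__new__` are my additions, not part of the original post).

```python
import numpy as np


class NDArrayWithColumns(np.ndarray):
    """The subclass from the question, with a guard for columns=None."""

    def __new__(cls, obj, columns=None):
        obj = np.asarray(obj).view(cls)
        obj.columns = None if columns is None else tuple(columns)
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.columns = getattr(obj, "columns", None)


r = NDArrayWithColumns(np.array([1.0, 0.0, np.nan, 1.0]), ["a"])

# np.asarray returns a base-class ndarray for subclass input
# (np.asanyarray would keep the subclass); the data is not copied.
plain = np.asarray(r)
plain_min = np.nanmin(plain)

print(type(plain).__name__, type(plain_min).__name__, plain_min)
```

`r.view(np.ndarray)` should give the same base-class view. Either way the `columns` attribute is lost on the result, so this only helps where the metadata is not needed downstream.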

Hi Stuart,

It certainly seems correct behaviour to return the subclass you created: after all, you might want to keep the information in `columns` (e.g., consider doing nanmin along a given axis). Indeed, we certainly want to keep the unit in astropy's Quantity (which is also a subclass of ndarray).

On the shape: shouldn't that be print(np.nanmin(r).shape)?

Overall, I think it is worth considering very carefully what exactly you are trying to accomplish; if different elements along a given axis have different meanings, I'm not sure it makes all that much sense to treat them as a single array (e.g., np.sin might be useful for one column but not another). Even if pandas is slower, the advantage in clarity of what is happening might well be more important in the long run.

All the best,

Marten

p.s. nanmin is not a ufunc; you can find it in numpy/lib/nanfunctions.py
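To illustrate the point about the shape, here is a small sketch (Python 3 syntax, with the class from the question copied in and a `columns=None` guard added by me). It checks the shape of the value returned by nanmin rather than the shape of the input array.

```python
import numpy as np


class NDArrayWithColumns(np.ndarray):
    """Copy of the subclass from the question (columns=None guard added)."""

    def __new__(cls, obj, columns=None):
        obj = np.asarray(obj).view(cls)
        obj.columns = None if columns is None else tuple(columns)
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.columns = getattr(obj, "columns", None)


nan = float("nan")
r = NDArrayWithColumns(
    np.array([1., 0., 1., 0., 1., 0., 1., 0., nan, 1., 1.]), ["a"]
)

result = np.nanmin(r)

# r is the 11-element input; result is the reduced value, which has no axes.
print("input shape: ", r.shape)
print("result shape:", result.shape)
print("result value:", float(result))
```

So the `(11,)` in the original session describes `r` itself; the reduced result has shape `()`, which matches the single value shown by `str(np.nanmin(r))`.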

You might also want to consider writing a wrapper object that contains an ndarray as a (possibly private) attribute and then presents different views or interpretations of that array. Subclassing ndarray is a pit of snakes; it's best to avoid it if you can (I say as the author and maintainer of an ndarray subclass).

(In reply to Marten van Kerkwijk's message of Tue, Nov 15, 2016 at 1:48 PM.)
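A minimal sketch of what such a wrapper could look like, assuming a 2-D array of doubles with one name per column. The class name `ColumnTable` and its `values` and `column` members are hypothetical, chosen only for illustration; they are not from any library.

```python
import numpy as np


class ColumnTable:
    """Hypothetical wrapper: a plain float64 2-D array plus column names."""

    def __init__(self, data, columns):
        self._data = np.ascontiguousarray(data, dtype=np.float64)
        if self._data.ndim != 2:
            raise ValueError("data must be 2-D")
        self.columns = tuple(columns)
        if len(self.columns) != self._data.shape[1]:
            raise ValueError("need one name per column")

    @property
    def values(self):
        """The underlying plain ndarray, e.g. for numpy or cython code."""
        return self._data

    def column(self, name):
        """Return a 1-D view of the named column."""
        return self._data[:, self.columns.index(name)]


table = ColumnTable([[1.0, np.nan], [0.0, 2.0]], ["a", "b"])

col_min = np.nanmin(table.column("a"))     # scalar for one column
all_min = np.nanmin(table.values, axis=0)  # one minimum per column

print(type(col_min).__name__, all_min)
```

Because numpy functions only ever see a plain ndarray here, their return types stay the same as for any other array, and the column names live entirely in the wrapper.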
participants (3)
-
Marten van Kerkwijk
-
Nathan Goldbaum
-
Stuart Reynolds