[Numpy-discussion] subclassing from np.ndarray and np.rec.recarray

Mon Jul 6 13:12:28 EDT 2009

Hi -- We are subclassing np.rec.recarray and are confused about how some of
its methods relate to (and differ from) the analogous methods of its parent,
np.ndarray.  Below are specific questions about the __eq__, __getitem__ and
view methods.  We'd appreciate answers to these questions, as well as any
more general points we may be missing about subclassing np.ndarray (and
np.rec.recarray).

---

1) Suppose I have a recarray object, x. Why does np.ndarray.__getitem__(x,
'column_name') return a recarray object rather than an ndarray? e.g.,

In [230]: x = np.rec.fromrecords([(1,'dd'), (2,'cc')], names=['a','b'])

In [231]: np.ndarray.__getitem__(x, 'a')
Out[231]: rec.array([1, 2])

In [232]: np.ndarray.__getitem__(x, 'a').dtype
Out[232]: dtype('int32')

The returned object is a recarray but it does not have a structured dtype.
More generally, passing an instance of any np.ndarray subclass (such as an
np.rec.recarray object) to np.ndarray.__getitem__ seems to return an
instance of that subclass.

---

2)a) When I use the __getitem__ method of recarray to get an individual
column, the returned object is an ndarray when the column is a numeric type
but it is a recarray when the column is a string type. Why doesn't
__getitem__ always return an ndarray for an individual column? e.g.,

In [175]: x = np.rec.fromrecords([(1,'dd'), (2,'cc')], names=['a','b'])

In [176]: x['a']
Out[176]: array([1, 2])

In [177]: x['b']
Out[177]: rec.array(['dd', 'cc'], dtype='|S2')

2)b)  Suppose I have a subclass of recarray, NewRecarray, that attaches some
new attribute, e.g. 'info'.

x = NewRecarray(data, names = ['a','b'], formats = '<i4, |S2')

Now say I want to use recarray's __getitem__ method to get an individual
column.  Then

x['a'] is an ndarray
x['b'] is a NewRecarray and x['b'].info == x.info

Is this the expected / proper behavior?  Is there something wrong with the
way I've subclassed recarray?
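
For concreteness, here is a minimal sketch of the kind of subclass we mean.
The class body is hypothetical: it assumes the usual __new__ plus
__array_finalize__ pattern, and the constructor signature and the 'info'
value are illustrative only.  The script just prints what each column access
returns, since the result may depend on the NumPy version.

```python
import numpy as np

class NewRecarray(np.recarray):
    """Hypothetical recarray subclass carrying an extra 'info' attribute."""

    def __new__(cls, data, names=None, formats=None, info=None):
        # Build an ordinary recarray, then reinterpret it as this subclass.
        obj = np.rec.fromrecords(data, names=names, formats=formats).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Let recarray do its own bookkeeping first, then copy 'info' from
        # the source array (or default to None when there is none).
        super().__array_finalize__(obj)
        self.info = getattr(obj, 'info', None)

x = NewRecarray([(1, 'dd'), (2, 'cc')], names=['a', 'b'], info='some metadata')

# Report the class of each column and whether 'info' came along.
print(type(x['a']).__name__, hasattr(x['a'], 'info'))
print(type(x['b']).__name__, hasattr(x['b'], 'info'))
```

With this definition, slices such as x[:1] stay NewRecarray objects and keep
'info'; the open question is only what single-column access should return.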

---

3)a)  If I have two recarrays with the same length and column names, the
__eq__ method returns the element-wise comparison.  Why is the result a
recarray rather than an ndarray?

In [162]: x = np.rec.fromrecords([(1,'dd'), (2,'cc')], names=['a','b'])
In [163]: y = np.rec.fromrecords([(1,'dd'), (2,'cc')], names=['a','b'])
In [164]: x == y
Out[164]: rec.array([ True,  True], dtype=bool)

3)b)  Suppose I have a subclass of recarray, NewRecarray, that attaches some
new attribute, e.g. 'info'.

x = NewRecarray(data)
y = NewRecarray(data)
z = x == y

Then z is a NewRecarray object and z.info == x.info.

Is this the expected / proper behavior?  Is there something wrong with the
way I've subclassed recarray?  [Dan Yamins asked this a couple days ago]
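
A small script for inspecting the comparison result from 3)a), along with one
possible way to get a base-class array out of it.  Using np.asarray here is
our assumption about a reasonable workaround, not something we know to be the
recommended approach; the printed class of z may differ between NumPy
versions.

```python
import numpy as np

x = np.rec.fromrecords([(1, 'dd'), (2, 'cc')], names=['a', 'b'])
y = np.rec.fromrecords([(1, 'dd'), (2, 'cc')], names=['a', 'b'])

z = x == y
print(type(z).__name__, z.dtype)   # class and dtype of the comparison result

# np.asarray drops any subclass, giving a base-class ndarray of booleans.
plain = np.asarray(z)
print(type(plain).__name__, plain.dtype)
```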

---

4)  Suppose I have a subclass of np.ndarray, NewArray, that attaches some
new attribute, e.g. 'info'. When I view a NewArray object as an ndarray, the
result has no 'info' attribute. Is the memory corresponding to the 'info'
attribute garbage collected? What happens to it?

x = NewArray(data)
x.view(np.ndarray)   # has no 'info' attribute
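
A minimal sketch of the setup in question 4.  The NewArray definition is
hypothetical (again assuming the __new__ plus __array_finalize__ pattern),
and the 'info' value is illustrative.  It checks where 'info' is still
reachable after taking the view and whether the two objects share data.

```python
import numpy as np

class NewArray(np.ndarray):
    """Hypothetical ndarray subclass carrying an extra 'info' attribute."""

    def __new__(cls, data, info=None):
        # Reinterpret the input data as this subclass and attach 'info'.
        obj = np.asarray(data).view(cls)
        obj.info = info
        return obj

    def __array_finalize__(self, obj):
        # Called for views and slices; copy 'info' from the source if present.
        self.info = getattr(obj, 'info', None)

x = NewArray([1, 2, 3], info='some metadata')
plain = x.view(np.ndarray)

print(hasattr(plain, 'info'))        # the base-class view carries no 'info'
print(x.info)                        # 'info' is still attached to x itself
print(np.shares_memory(x, plain))    # whether both objects use the same data
```

In this sketch 'info' is an ordinary Python attribute on the object x, so it
remains reachable through x even though the new view object does not carry it.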

---

Thanks for any help!  (And thanks for reading if you read any or all of
this!)

Elaine