Re: [Numpy-discussion] Record arrays

June 26, 2008


      ...
Let's be clear, there are two very closely related things: recarrays
and record arrays. Record arrays are just ndarrays with a complicated
dtype. E.g.
In [1]: from numpy import *
In [2]: ones(3, dtype=dtype([('foo', int), ('bar', float)]))
Out[2]:
array([(1, 1.0), (1, 1.0), (1, 1.0)],
      dtype=[('foo', '<i4'), ('bar', '<f8')])
In [3]: r = _
In [4]: r['foo']
Out[4]: array([1, 1, 1])
recarray is a subclass of ndarray that just adds attribute access to
record arrays.
In [10]: r2 = r.view(recarray)
In [11]: r2
Out[11]:
recarray([(1, 1.0), (1, 1.0), (1, 1.0)],
      dtype=[('foo', '<i4'), ('bar', '<f8')])
In [12]: r2.foo
Out[12]: array([1, 1, 1])
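As a script rather than an IPython session, the same relationship can be sketched roughly as follows. This is only an illustration using the names from the session above (`r`, `r2`, `foo`, `bar`); it assumes a reasonably recent numpy imported as `np`, which is not how the session above imports it.

```python
import numpy as np

# Plain record array: an ordinary ndarray with a compound dtype.
r = np.ones(3, dtype=[('foo', int), ('bar', float)])

# Viewing it as a recarray copies no data; only the Python type changes.
r2 = r.view(np.recarray)

# Both access styles work on the recarray and give the same values.
assert (r2.foo == r2['foo']).all()

# A write through the recarray view is visible in the original array,
# because both objects share the same underlying buffer.
r2.foo[0] = 42
print(r['foo'])
print(type(r).__name__, type(r2).__name__)
```

The point is that the view is cheap to create, so you can keep plain record arrays around and only view them as recarray where the attribute syntax is convenient.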
One downside of this is that the attribute access feature slows down
all field accesses, even the r['foo'] form, because it sticks a bunch
of pure Python code in the middle. Much code won't notice this, but if
you end up having to iterate over an array of records (as I have),
this will be a hotspot for you.
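If you want to check whether this matters for your own code, a rough timing sketch along these lines may help. The array size, the repeat count, and the helper names `sum_plain` and `sum_rec` are arbitrary choices for illustration, and the actual numbers will depend on your numpy version and machine, so none are shown here.

```python
import timeit
import numpy as np

# Same data, once as a plain record array and once viewed as a recarray.
r = np.ones(1000, dtype=[('foo', int), ('bar', float)])
r2 = r.view(np.recarray)

def sum_plain():
    # Iterate over the records of the plain array, using r['foo']-style access.
    total = 0
    for rec in r:
        total += rec['foo']
    return total

def sum_rec():
    # Identical loop, but over the recarray view.
    total = 0
    for rec in r2:
        total += rec['foo']
    return total

# Time both loops; only the relative difference is of interest.
t_plain = timeit.timeit(sum_plain, number=20)
t_rec = timeit.timeit(sum_rec, number=20)
print("plain record array: %.4f s" % t_plain)
print("recarray view:      %.4f s" % t_rec)
```

Both functions compute the same sum, so any difference in the timings comes from how each array type hands out records and fields during iteration.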
Record arrays are fundamentally a part of numpy, and no one is even
suggesting that they would go away. No one is seriously suggesting
that we should remove recarray, but some of us hesitate to recommend
its use over plain record arrays.
Does that clarify the discussion for you?
Thanks! This has always been something that has confused me . . . This is
awesome, I guess I built my DataFrame object for nothing :-)

Gabriel

Gabriel Gellner