Let's be clear, there are two very closely related things: recarrays and record arrays. Record arrays are just ndarrays with a complicated dtype. E.g.
In [1]: from numpy import *

In [2]: ones(3, dtype=dtype([('foo', int), ('bar', float)]))
Out[2]:
array([(1, 1.0), (1, 1.0), (1, 1.0)],
      dtype=[('foo', '<i4'), ('bar', '<f8')])

In [3]: r = _

In [4]: r['foo']
Out[4]: array([1, 1, 1])
recarray is a subclass of ndarray that just adds attribute access to record arrays.
In [10]: r2 = r.view(recarray)

In [11]: r2
Out[11]:
recarray([(1, 1.0), (1, 1.0), (1, 1.0)],
      dtype=[('foo', '<i4'), ('bar', '<f8')])

In [12]: r2.foo
Out[12]: array([1, 1, 1])
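For anyone who wants to try this outside an interactive session, here is a minimal standalone sketch of the same idea. It assumes a reasonably recent NumPy (np.shares_memory is not available in very old releases) and uses the explicit `np.` prefix instead of the star import. The value 42 is just an arbitrary marker to show that the view and the original share one buffer.

```python
import numpy as np

# Plain record array: an ordinary ndarray with a compound dtype.
r = np.ones(3, dtype=[('foo', int), ('bar', float)])

# Viewing it as a recarray adds attribute access; no data is copied.
r2 = r.view(np.recarray)

# Write through the recarray's attribute access...
r2.foo[0] = 42

# ...and the change is visible through the plain record array.
print(r['foo'])
print(type(r).__name__, type(r2).__name__)
print(np.shares_memory(r, r2))
```

So switching between the two is cheap: the view only changes the Python-level type, not the underlying data.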
One downside of this is that the attribute access feature slows down all field accesses, even the r['foo'] form, because it sticks a bunch of pure Python code in the middle. Much code won't notice this, but if you end up having to iterate over an array of records (as I have), this will be a hotspot for you.
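If you want to check whether this matters for your own code, something like the rough sketch below should do. The helper sum_foo and the array size are made up for illustration, and the timings will vary with your NumPy version and machine, so treat the printed numbers as a measurement on your setup and nothing more.

```python
import timeit
import numpy as np

r = np.ones(1000, dtype=[('foo', int), ('bar', float)])  # plain record array
r2 = r.view(np.recarray)                                  # same data as a recarray

def sum_foo(records):
    # Hypothetical hot loop: visit the records one at a time
    # and read a single field from each.
    total = 0
    for rec in records:
        total += rec['foo']
    return total

# Both forms must give the same answer; only the access cost may differ.
assert sum_foo(r) == sum_foo(r2) == 1000

t_plain = timeit.timeit(lambda: sum_foo(r), number=20)
t_rec = timeit.timeit(lambda: sum_foo(r2), number=20)
print(f"plain record array: {t_plain:.4f}s   recarray: {t_rec:.4f}s")
```

If the recarray loop does turn out to be the hotspot, one option is to keep the plain record array around for the loop and use the recarray view only where attribute access is convenient.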
Record arrays are fundamentally a part of numpy, and no one is even suggesting that they would go away. No one is seriously suggesting that we should remove recarray, but some of us hesitate to recommend its use over plain record arrays.
Does that clarify the discussion for you?
Thanks! This has always been something that has confused me... This is awesome. I guess I built my DataFrame object for nothing :-)

Gabriel