
On Thu, Jun 26, 2008 at 15:13, Dan Yamins <dyamins@gmail.com> wrote:
On Thu, Jun 26, 2008 at 3:34 PM, Gael Varoquaux <gael.varoquaux@normalesup.org> wrote:
On Thu, Jun 26, 2008 at 11:48:06AM -0500, John Hunter wrote:
I personally think they are the best thing since sliced bread, and everyone here who uses them becomes immediately addicted to them. I would like to see better support for them, especially making the attrs exposed to dir so tab completion would work.
People in the financial/business world work with spreadsheet data a lot, and record arrays are the natural data structure to represent tabular, heterogeneous data. If you work with this data all day, you save a lot of ugly keystrokes doing r.date rather than r['date'], and the code is prettier in my opinion.
I am +1 on all that.
I also completely second this. I use them all the time -- for finance data as well as biological/genomics data. It is essential for these applications to have spreadsheet-like objects that can have mixed types and from which good numpy numerical arrays can be extracted when necessary. I hope to continue having access to them or something like them. I also hope that they will be better documented, since not only do I use them all the time, I'm hoping to teach their use to many more people whom I am training in spreadsheet-like data analysis.
(If they have some flaw I don't understand, it would be great if someone could explain it to me. And if there's something out there that fixes that flaw, I'd love to hear about it. But it seems to me at least that recarrays are very useful.)
Let's be clear, there are two very closely related things: recarrays and record arrays. Record arrays are just ndarrays with a complicated dtype. E.g.

    In [1]: from numpy import *

    In [2]: ones(3, dtype=dtype([('foo', int), ('bar', float)]))
    Out[2]:
    array([(1, 1.0), (1, 1.0), (1, 1.0)],
          dtype=[('foo', '<i4'), ('bar', '<f8')])

    In [3]: r = _

    In [4]: r['foo']
    Out[4]: array([1, 1, 1])

recarray is a subclass of ndarray that just adds attribute access to record arrays.

    In [10]: r2 = r.view(recarray)

    In [11]: r2
    Out[11]:
    recarray([(1, 1.0), (1, 1.0), (1, 1.0)],
          dtype=[('foo', '<i4'), ('bar', '<f8')])

    In [12]: r2.foo
    Out[12]: array([1, 1, 1])

One downside of this is that the attribute access feature slows down all field accesses, even the r['foo'] form, because it sticks a bunch of pure Python code in the middle. Much code won't notice this, but if you end up having to iterate over an array of records (as I have), this will be a hotspot for you.

Record arrays are fundamentally a part of numpy, and no one is even suggesting that they would go away. No one is seriously suggesting that we should remove recarray, but some of us hesitate to recommend its use over plain record arrays.

Does that clarify the discussion for you?

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
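
For anyone who wants to try the distinction described above outside an interactive session, here is a minimal sketch as a standalone script. It assumes a reasonably recent numpy imported as np (rather than the "from numpy import *" used in the session); the variable names foo_by_key, foo_by_attr and same_buffer are made up for illustration. It only shows that the two access styles return the same data and that the recarray view shares memory with the original array; it does not measure the slowdown mentioned above.

```python
import numpy as np

# A plain record array: an ordinary ndarray with a compound dtype.
dt = np.dtype([('foo', int), ('bar', float)])
r = np.ones(3, dtype=dt)

# Field access by key works on any record array.
foo_by_key = r['foo']

# Viewing the same data as a recarray adds attribute access.
r2 = r.view(np.recarray)
foo_by_attr = r2.foo

# The view does not copy: both arrays refer to the same buffer,
# so a write through one is visible through the other.
same_buffer = np.shares_memory(r, r2)
r2.foo[0] = 5

print(foo_by_key.tolist(), foo_by_attr.tolist(), same_buffer)
```

Since the view costs no copy, one possible pattern is to keep plain record arrays in inner loops and switch to the recarray view only where r.foo-style access is wanted for readability.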