[Numpy-discussion] Advice on converting iterator into array efficiently

Alan Jackson alan at ajackson.org
Thu Aug 28 20:57:09 EDT 2008


Looking for advice on a good way to handle this problem.

I'm dealing with large tables (gigabyte-scale). I would like to 
efficiently subset values from one column based on the values in
another column, and get arrays out of the operation. For example,
say I have 2 columns, "energy" and "collection". Collection is
basically an index that flags values that go together, so all the
energy values with a collection value of 18 belong together. I'd
like to be able to set up an iterator on collection that would
hand me an array of energy values on each iteration:

if table is all my data, then something like

for c in table['collection'] :
    e = c['energy']
    ... do array operations on e
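
In plain NumPy (a sketch in Python 3 syntax, assuming both columns fit
in memory as parallel arrays; the sample data below is illustrative),
that pattern can be written with a boolean mask per collection value:

```python
import numpy as np

# Illustrative data: two parallel columns
collection = np.array([18, 18, 7, 18, 7])
energy = np.array([0.5, 1.2, 3.4, 2.2, 0.9])

for c in np.unique(collection):
    # Boolean mask selects the energies belonging to this collection,
    # and fancy indexing hands back a plain ndarray
    e = energy[collection == c]
    print(c, e.mean())  # ... do array operations on e
```

Note this makes one full scan of `collection` per unique value, which
is fine for a modest number of collections but costly when there are
many.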

I've been playing with pytables, and they help, but I can't quite
seem to get there. I can get an iterator for energy within a collection,
but I can't figure out an efficient way to get an array out.

What I have so far is:

for c in np.unique(table.col('collection')) :
    rows = table.where('collection == c')
    for row in rows :
        print c, ' : ', row['energy']

but I really want to convert rows['energy'] to an array.
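
For the in-memory case, one way to avoid re-scanning per collection is
to sort once by collection and slice contiguous runs, so each group
comes out as a plain ndarray. A sketch in Python 3 syntax (the function
name and arrays are hypothetical, not part of any library):

```python
import numpy as np

def group_by_collection(collection, energy):
    """Yield (collection_value, energy_array) pairs.

    Sorts once by collection, then slices contiguous runs, so each
    iteration yields an ndarray view of the energies in that group.
    """
    order = np.argsort(collection, kind='stable')
    coll_sorted = collection[order]
    energy_sorted = energy[order]
    # On a sorted array, return_index gives the start of each run
    values, starts = np.unique(coll_sorted, return_index=True)
    ends = np.append(starts[1:], len(coll_sorted))
    for v, s, e in zip(values, starts, ends):
        yield v, energy_sorted[s:e]

# Illustrative usage
collection = np.array([18, 7, 18, 7, 18])
energy = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
for c, e in group_by_collection(collection, energy):
    print(c, e)
```

This costs one O(n log n) sort up front instead of one O(n) scan per
collection value.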

I've thought about building a nasty set of pointers and whatnot -
I did it once in perl - but I'm hoping to avoid that.

-- 
-----------------------------------------------------------------------
| Alan K. Jackson            | To see a World in a Grain of Sand      |
| alan at ajackson.org          | And a Heaven in a Wild Flower,         |
| www.ajackson.org           | Hold Infinity in the palm of your hand |
| Houston, Texas             | And Eternity in an hour. - Blake       |
-----------------------------------------------------------------------


