On 12/5/10 7:56 PM, Wai Yip Tung wrote:
I'm fairly new to numpy and I'm trying to figure out the right way to do things. Continuing on my question about using recarray as a relation.
Note that recarrays (or structured arrays; AFAIK the only difference is attribute access, and I don't use recarrays) are far more static than a database table. So you may really want to use a database, or maybe pytables. Or maybe even just stick with lists. But if you are keeping things in memory, you should be able to do what you want.
In [339]: arr = np.array([
   .....:     (1, 2.2, 0.0),
   .....:     (3, 4.5, 0.0)
   .....:     ],
   .....:     dtype=[
   .....:         ('unit',int),
   .....:         ('price',float),
   .....:         ('amount',float),
   .....:     ]
   .....: )
In [340]: data = arr.view(recarray)
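As a small illustration of the attribute-access difference mentioned above, the sketch below rebuilds the same array and reads one field both ways. It assumes `np.recarray` is the `recarray` used in the session; the variable names simply mirror the ones in the question.

```python
import numpy as np

# Same structured array as in the question.
arr = np.array(
    [(1, 2.2, 0.0), (3, 4.5, 0.0)],
    dtype=[("unit", int), ("price", float), ("amount", float)],
)

# A recarray view shares memory with arr and adds attribute access.
data = arr.view(np.recarray)

prices_by_key = arr["price"]    # works on any structured array
prices_by_attr = data.price     # works only on the recarray view

# Assigning to an existing field through the view also updates arr.
data.amount = data.unit * data.price
```

Because `data` is only a view, nothing is copied; `arr["amount"]` reflects the assignment as well.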
One of the most common things I want to do is to append rows to data.
numpy arrays do not naturally support appending, as you have discovered.
I think concatenate() might be the method.
yes.
But I get a problem:
In [342]: np.concatenate((data0,[1,9.0,9.0]))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
c:\Python26\Lib\site-packages\numpy\<ipython console> in<module>()
TypeError: expected a readable buffer object
concatenate expects two arrays to be joined. If you pass in something that can easily be turned into an array, it will work, but a bare tuple or list can be converted to multiple types of arrays, so it doesn't know what to do. So you need to construct the second array explicitly, with the same dtype as the first (dt here is the dtype of arr):

    a2 = np.array( [(3, 5.5, 3)], dtype=dt)
    arr = np.concatenate( (arr, a2) )
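Put together as a self-contained script, the append step might look like the sketch below. The name `new_row` and the values in it are illustrative only; the point is that the row is wrapped as a one-element structured array with the same dtype before it is handed to `np.concatenate`.

```python
import numpy as np

dt = np.dtype([("unit", int), ("price", float), ("amount", float)])
arr = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)], dtype=dt)

# The row must be a tuple inside a list so that numpy reads it as
# one record with three fields rather than three separate elements.
new_row = np.array([(5, 9.0, 9.0)], dtype=dt)

# concatenate returns a new array; the original is not grown in place.
arr = np.concatenate((arr, new_row))
```

Each call copies the whole array, so appending rows one at a time in a loop gets expensive; collecting rows in a list and building the array once is usually cheaper.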
In [343]: data.amount = data.unit * data.price
yup
But sometimes I may need to add a new column that does not already exist, e.g.:
In [344]: data.discount_price = data.price * 0.9
How can I add a new column?
You can't. What you need to do is create a new array with a new dtype that includes the new field.

The trick is that numpy only supports homogeneous arrays -- every item is the same data type. So when you create a struct array like the one above, numpy does not define it as a 2-d table, but rather as a 1-d array, each element of which is a structure.

So you need to do something like:

    # create a new array
    data2 = np.zeros(len(data), dtype=dt2)

    # fill the array:
    for field_name in dt.fields.keys():
        data2[field_name] = data[field_name]

    # now some calculations:
    data2['discount_price'] = data2['price'] * 0.9

I don't know of a way to avoid that loop when filling the array.

Better yet -- anticipate your needs and create the array with all the fields you need in the first place.

You can see that ndarrays are pretty static -- struct arrays can be useful data storage, but are not very suitable when things are changing much.

You could write a class that wraps an ndarray and supports what you need better -- it could be a pretty useful general-purpose class, too. I've got one that handles the appending part, but nothing for adding new fields.

Here's appending with my class:

    data3 = accumulator.accumulator(dtype = dt2)
    data3.append((1, 2.2, 0.0, 0.0))
    data3.append((3, 4.5, 0.0, 0.0))
    data3.append((2, 1.2, 0.0, 0.0))
    data3.append((5, 4.2, 0.0, 0.0))
    print(repr(data3))

    # convert to regular array for calculations:
    data3 = np.array(data3)

    # now some calculations:
    data3['discount_price'] = data3['price'] * 0.9

You wouldn't have to convert to a regular array, except that I haven't written the code to support field access yet -- I don't think it would be too hard, though.

I've enclosed some test code, and my accumulator class, in case you find it useful.

-Chris

--
Christopher Barker, Ph.D.
Oceanographer

Emergency Response Division
NOAA/NOS/OR&R            (206) 526-6959   voice
7600 Sand Point Way NE   (206) 526-6329   fax
Seattle, WA  98115       (206) 526-6317   main reception

Chris.Barker@noaa.gov
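For reference, the new-field recipe from the reply can be sketched end to end as below. How `dt2` is built is an assumption (the message does not show it); extending `dt.descr` is one way to do it. The last step uses `numpy.lib.recfunctions.append_fields`, a helper that is not mentioned in the message and is shown only as a possible alternative to the explicit loop.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

dt = np.dtype([("unit", int), ("price", float), ("amount", float)])
data = np.array([(1, 2.2, 0.0), (3, 4.5, 0.0)], dtype=dt)

# Assumed construction of dt2: the old fields plus one new float field.
dt2 = np.dtype(dt.descr + [("discount_price", float)])

# Allocate the wider array, then copy the existing fields across by name.
data2 = np.zeros(len(data), dtype=dt2)
for field_name in dt.names:
    data2[field_name] = data[field_name]

# The new field can now be computed from the copied ones.
data2["discount_price"] = data2["price"] * 0.9

# Alternative: let recfunctions build the wider array in one call.
# usemask=False asks for a plain ndarray instead of a masked array.
data3 = rfn.append_fields(
    data, "discount_price", data["price"] * 0.9, usemask=False
)
```

Either way the result is a new array; the original `data` keeps its three-field dtype.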