[Numpy-discussion] numpy sum table by category

Ernest Adrogué eadrogue at gmx.net
Wed Jan 13 06:57:03 EST 2010

12/01/10 @ 15:33 (-0500), thus spake Marc Schwarzschild:
> I have a csv file like this:
>     Account, Symbol, Quantity, Price
>     One,SPY,5,119.00
>     One,SPY,3,120.00
>     One,SPY,-2,125.00
>     One,GE,...
>     One,GE,...
>     Two,SPY, ...
>     Three,GE, ...
>      ...
> The data is much larger, could be 10,000 records.  I can load it
> into a numpy array using matplotlib.mlab.csv2rec().  I learned
> several useful numpy functions and have been reading lots of
> documentation.  However, I have not found a way to create a
> unique list of symbols and the Sum of their respective Quantity
> values.

If x is your record array:

for sym in set(x['Symbol']):
    mask = x['Symbol'] == sym    # True for every record with this symbol
    print(sym, x[mask]['Quantity'].sum())
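The loop above scans the whole array once per symbol. A vectorized alternative (a swapped-in technique, not the one shown above) uses np.unique with return_inverse=True together with np.bincount and its weights argument, which produces every per-symbol total in one pass. This is a sketch that assumes a structured array with the field names from the CSV header; the field widths and the GE, Two and Three rows are hypothetical test values.

```python
import numpy as np

# A small structured (record) array. The three SPY rows for account One
# come from the example CSV; the remaining rows are hypothetical values.
x = np.array(
    [("One", "SPY", 5, 119.00),
     ("One", "SPY", 3, 120.00),
     ("One", "SPY", -2, 125.00),
     ("One", "GE", 10, 15.00),      # hypothetical
     ("Two", "SPY", 4, 121.00),     # hypothetical
     ("Three", "GE", -1, 16.00)],   # hypothetical
    dtype=[("Account", "U8"), ("Symbol", "U8"),
           ("Quantity", "i8"), ("Price", "f8")],
)

# np.unique returns the distinct symbols (sorted) and, for every record,
# the position of that record's symbol in the sorted list.
symbols, inverse = np.unique(x["Symbol"], return_inverse=True)

# bincount adds up the weights falling into each bin, giving one
# Quantity total per distinct symbol.
totals = np.bincount(inverse, weights=x["Quantity"])

for sym, total in zip(symbols, totals):
    print(sym, total)
```

The symbols come out sorted, and bincount always returns floats, so integer quantities are reported as e.g. 9.0 rather than 9.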

> I want to do various calculations on the data like pull out
> all the records for a given Account.  The actual data has lots
> more columns and sometimes I may want to sum(Quantity*Price) by
> Account and Symbol.

To get the subset of records matching arbitrary criteria, you
use boolean arrays. For example, x['Account'] == 'name' generates
a boolean array of the same length as x, with each element True
or False depending on whether the Account field of that record
equals 'name'. Such an array can then be used as an index on the
original x array to get the matching subset of records. This is
what the example above does.
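A minimal sketch of both requests from the question, assuming the same structured-array layout (field widths and all rows after the first three are hypothetical): one boolean mask pulls out every record for an account, and two masks combined with & give sum(Quantity*Price) per (Account, Symbol) pair. The parentheses around each comparison are required because & binds more tightly than ==.

```python
import numpy as np

# Same layout as the CSV in the question; the last three rows are
# hypothetical values made up for illustration.
x = np.array(
    [("One", "SPY", 5, 119.00),
     ("One", "SPY", 3, 120.00),
     ("One", "SPY", -2, 125.00),
     ("One", "GE", 10, 15.00),      # hypothetical
     ("Two", "SPY", 4, 121.00),     # hypothetical
     ("Three", "GE", -1, 16.00)],   # hypothetical
    dtype=[("Account", "U8"), ("Symbol", "U8"),
           ("Quantity", "i8"), ("Price", "f8")],
)

# All records for one account: a single boolean mask used as an index.
one = x[x["Account"] == "One"]
print(len(one), "records for account One")

# sum(Quantity*Price) for every distinct (Account, Symbol) pair.
pairs = sorted(set(zip(x["Account"].tolist(), x["Symbol"].tolist())))
value_by_pair = {}
for acct, sym in pairs:
    mask = (x["Account"] == acct) & (x["Symbol"] == sym)
    value = (x["Quantity"][mask] * x["Price"][mask]).sum()
    value_by_pair[(acct, sym)] = float(value)
    print(acct, sym, value_by_pair[(acct, sym)])
```

Each pair costs one full pass over the array, which should be acceptable at the 10,000-record scale mentioned in the question.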

