[Numpy-discussion] Advice please on efficient subtotal function

Greg Willden gregwillden at gmail.com
Fri Dec 29 09:04:00 EST 2006


Hi Stephen,
If you want to sum or average down a column or across a row, you can use
sum() or mean(). The optional axis parameter determines whether you are
working down a column (axis=0) or across a row (axis=1). If axis is omitted,
NumPy reduces over all elements of the array.
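
As a minimal sketch of the axis behaviour described above (the small array `a` is made up purely for illustration):

```python
import numpy as np

# Hypothetical 2x3 example array: [[0, 1, 2], [3, 4, 5]]
a = np.arange(6).reshape(2, 3)

col_sums = a.sum(axis=0)    # sum down each column
row_sums = a.sum(axis=1)    # sum across each row
total = a.sum()             # no axis: sum over every element
col_means = a.mean(axis=0)  # average down each column
```

Note that this reduces along the axes of an array that is already 2-D; it does not by itself scatter a 1-D array into grid cells.
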
Greg

On 12/29/06, Stephen Simmons <mail at stevesimmons.com> wrote:
>
> Hi,
>
> I'm looking for efficient ways to subtotal a 1-D array onto a 2-D grid.
> This is more easily explained in code than in words, thus:
>
> for n in xrange(len(data)):
>     totals[ i[n], j[n] ] += data[n]
>
> data comes from a series of PyTables files with ~200m rows. Each row has
> ~20 cols, and I use the first three columns (which are 1-3 char strings)
> to form the indexing functions i[] and j[], then want to calc averages of
> the remaining 17 numerical cols.
>
> I have tried various indirect ways of doing this with searchsorted and
> bincount, but intuitively they feel like overly complex solutions to what
> is essentially a very simple problem.
>
> My work involves comparing the subtotals for various different
> segmentation strategies (the i[] and j[] indexing functions). Efficient
> solutions are important because I need to make many passes through the
> 200m rows of data. Memory usage is the easiest thing for me to adjust, by
> changing how many rows of data to read in for each pass and then reusing
> the same array data buffers.
>
> Thanks in advance for any suggestions!
>
> Stephen
>
>
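
The loop in the quoted message can probably be vectorized by flattening each (i, j) pair into a single cell index and letting np.bincount do the accumulation. This is only a sketch under assumptions: `subtotal` is a hypothetical helper name, the sample values are invented, and i and j are assumed to be integer codes already (the original derives them from string columns, which is not shown here).

```python
import numpy as np

def subtotal(i, j, data, shape):
    """Accumulate data[n] into cell (i[n], j[n]) of a grid of the given shape.

    Hypothetical helper: assumes i and j are non-negative integer arrays
    that fit inside `shape`.
    """
    n_rows, n_cols = shape
    flat = np.asarray(i) * n_cols + np.asarray(j)  # row-major cell index
    size = n_rows * n_cols
    sums = np.bincount(flat, weights=data, minlength=size)
    counts = np.bincount(flat, minlength=size)
    return sums.reshape(shape), counts.reshape(shape)

# Invented sample data for illustration only.
i = np.array([0, 0, 1, 1, 1])
j = np.array([0, 2, 1, 1, 0])
data = np.array([1.0, 2.0, 3.0, 4.0, 5.0])

totals, counts = subtotal(i, j, data, (2, 3))

# Averages per cell; cells with no rows come out as NaN.
with np.errstate(invalid="ignore", divide="ignore"):
    means = totals / counts

# Cross-check with unbuffered in-place addition, which mirrors the loop
# directly and handles repeated (i, j) pairs correctly.
check = np.zeros((2, 3))
np.add.at(check, (i, j), data)
```

For the 17 numerical columns, the same flat index could be reused with one bincount call per column, and the sums and counts from each chunk of rows can be added together before dividing, so the averages do not depend on how the data is chunked.
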



-- 
Linux.  Because rebooting is for adding hardware.