Hi, I'm looking for efficient ways to subtotal a 1-D array onto a 2-D grid. This is more easily explained in code than words, thus:

```python
for n in xrange(len(data)):
    totals[i[n], j[n]] += data[n]
```

The data comes from a series of PyTables files with ~200m rows. Each row has ~20 columns, and I use the first three columns (which are 1-3 character strings) to form the indexing arrays i[] and j[], then want to calculate averages of the remaining 17 numerical columns.

I have tried various indirect ways of doing this with searchsorted and bincount, but intuitively they feel like overly complex solutions to what is essentially a very simple problem.

My work involves comparing the subtotals for various different segmentation strategies (the i[] and j[] indexing arrays). Efficient solutions are important because I need to make many passes through the 200m rows of data. Memory usage is the easiest thing for me to adjust, by changing how many rows of data are read in on each pass and then reusing the same array data buffers.

Thanks in advance for any suggestions!

Stephen
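A compact way to express this kind of grouped accumulation in numpy is to flatten each (i, j) pair into a single integer and use bincount() with weights, which is one of the approaches the post mentions having tried. A minimal sketch, assuming i and j are already integer arrays and the grid shape (n_i, n_j) is known in advance (the sizes and random data below are made up for illustration):

```python
import numpy

n_i, n_j = 10, 20                          # assumed grid dimensions
i = numpy.random.randint(0, n_i, 1000)     # stand-ins for the real index arrays
j = numpy.random.randint(0, n_j, 1000)
data = numpy.random.rand(1000)

totals = numpy.zeros((n_i, n_j))
flat = i * n_j + j                         # one integer per (i, j) grid cell
sums = numpy.bincount(flat, weights=data)  # sum of data within each cell
totals.ravel()[:len(sums)] += sums         # bincount only covers cells up to flat.max()

# A parallel count array turns the subtotals into averages later on:
counts = numpy.zeros((n_i, n_j))
cnt = numpy.bincount(flat)
counts.ravel()[:len(cnt)] += cnt
```

The same totals/counts buffers can be reused across blocks, which fits the many-passes, fixed-memory constraint described above.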
Hi Stephen,

If you want to sum/average down a column or across a row you can use sum(). The optional axis={0,1} parameter determines whether you are summing down a column (the default, axis=0) or across a row (axis=1).

Greg

On 12/29/06, Stephen Simmons <mail@stevesimmons.com> wrote:
-- Linux. Because rebooting is for adding hardware.
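A minimal illustration of the axis argument Greg describes (the array values here are made up):

```python
import numpy

a = numpy.array([[1., 2., 3.],
                 [4., 5., 6.]])

print a.sum(axis=0)   # down each column:  [ 5.  7.  9.]
print a.sum(axis=1)   # across each row:   [  6.  15.]
```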
On Friday 29 December 2006 10:05, Stephen Simmons wrote:
Well, from your words I guess you have already tested this, but just in case: as PyTables saves data in tables row-wise, it is always faster to use the complete row for computations in each iteration than just a single column. This is shown in the small benchmark that I'm attaching at the end of the message. Here is its output for a table with 1m rows:

    time for creating the file--> 12.044
    time for using column reads --> 46.407
    time for using the row wise iterator--> 73.036
    time for using block reads (row wise)--> 5.156

So, using block reads (in case you can use them) is your best bet.

HTH,

```python
import tables
import numpy
from time import time

nrows = 1000*1000

# Create a table definition with 17 double cols and 3 string cols
coltypes = numpy.dtype("f8,"*17 + "S3,"*3)

t1 = time()
# Create a file with an empty table. Use compression to minimize file size.
f = tables.openFile("/tmp/prova.h5", 'w')
table = f.createTable(f.root, 'table', numpy.empty(0, coltypes),
                      filters=tables.Filters(complevel=1, complib='lzo'))
# Fill the table with default values (empty strings and zeros)
row = table.row
for nrow in xrange(nrows):
    row.append()
f.close()
print "time for creating the file-->", round(time()-t1, 3)

# *********** Start benchmarks **************************
f = tables.openFile("/tmp/prova.h5", 'r')
table = f.root.table
colnames = table.colnames[:-3]   # exclude the string cols

# Loop over the table using column reads
t1 = time(); cum = numpy.zeros(17)
for ncol, colname in enumerate(colnames):
    col = table.read(0, nrows, field=colname)
    cum[ncol] += col.sum()
print "time for using column reads -->", round(time()-t1, 3)

# Loop over the table using its row iterator
t1 = time(); cum = numpy.zeros(17)
for row in table:
    for ncol, colname in enumerate(colnames):
        cum[ncol] += row[colname]
print "time for using the row iterator-->", round(time()-t1, 3)

# Loop over the table using block reads (row wise)
t1 = time(); cum = numpy.zeros(17)
step = 10000
for nrow in xrange(0, nrows, step):
    ra = table[nrow:nrow+step]
    for ncol, colname in enumerate(colnames):
        cum[ncol] += ra[colname].sum()
print "time for using block reads (row wise)-->", round(time()-t1, 3)

f.close()
```
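As a bridge back to the original question, here is a rough sketch (not from the thread) of how the block-read pattern above could feed a per-block subtotalling step. The key-to-index dictionaries, the grid size and the choice of columns are made-up placeholders, and the inner loop is the naive accumulation, which would be replaced by a vectorised routine in practice:

```python
import numpy
import tables

# Hypothetical lookup tables mapping the short key strings to grid indices
i_map = {'A': 0, 'B': 1, 'C': 2}
j_map = {'X': 0, 'Y': 1}

f = tables.openFile("/tmp/prova.h5", 'r')        # file layout from the benchmark above
table = f.root.table
totals = numpy.zeros((len(i_map), len(j_map)))   # one cell per (i, j) combination

step = 10000
for start in xrange(0, table.nrows, step):
    block = table[start:start+step]              # one block read, as benchmarked above
    # Translate two of the string key columns (f17/f18 in the benchmark's dtype)
    # into integer grid indices; unknown keys fall back to cell 0 here.
    i = numpy.fromiter((i_map.get(k, 0) for k in block['f17']), dtype=numpy.intp)
    j = numpy.fromiter((j_map.get(k, 0) for k in block['f18']), dtype=numpy.intp)
    data = block['f0']                           # one of the 17 numeric columns
    for n in xrange(len(data)):                  # naive accumulation; swap in a
        totals[i[n], j[n]] += data[n]            # vectorised version for real use
f.close()
```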
Thanks Francesc, but I am already planning to read the data in blocks, as you suggest. My question is rather how best to update the subtotals for each block in a parallel way using numpy efficiently, rather than with a simplistic and slow element-by-element loop. I can't use a simple sum(), as in your benchmark example or Greg's reply, because I need to do

```python
for n in xrange(len(data)):
    totals[i[n], j[n]] += data[n]
```

and not

```python
for n in xrange(len(data)):
    totals[n] += data[n]
```

My best solution so far is roughly like this (a rough numpy sketch of these steps appears after this message):

- read in the next block of 100k or so rows (taking into account the PyTables table's _v_maxTuples and _v_chunksize)
- calculate the subtotal index arrays i and j
- do a lexsort() on [i, j, n]
- partition the sorted [i, j, n] into subsets where the i and j arrays change values; the k-th such subset is thus s_k = [i_k, j_k, [n_k0, ..., n_kN]]
- update the subtotals for each subset in the block: totals[i_k, j_k] += sum(data[n_k0, ..., n_kN])

This should be reasonably efficient, but it's messy, and I'm not familiar enough with numpy's indexing tricks to get it right first time. Maybe instead I'll have a go at writing a Pyrex function that implements the simple loop at C speed:

    subtotal2d(data_array, idx_array, out=None, dtype=None)

where data_array is Nx1, idx_array is NxM and out is M-dimensional.

Incidentally, there's one other function I'd find useful here in forming the index arrays i[] and j[]: a fast translate-from-dict function along the lines of

    arr2 = fromiter((d[a] for a in arr1), dtype=...)

My initial impression is that a C version would be substantially faster; maybe I should do some benchmarking to see whether a pure Python/numpy approach is actually faster than I expect.

Cheers, and thanks for any further suggestions,

Stephen

Francesc Altet <faltet@carabos.com> wrote:
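For concreteness, here is a rough numpy sketch (not from the thread) of the lexsort-and-partition plan above. It assumes i and j are already integer arrays; the helper name and the example values are made up, and the interface differs slightly from the subtotal2d() signature proposed in the message:

```python
import numpy

def subtotal_sorted(data, i, j, shape):
    """Return a grid of the given shape where cell (i[n], j[n]) accumulates data[n]."""
    order = numpy.lexsort((j, i))              # sort by i, then by j
    i_s, j_s, d_s = i[order], j[order], data[order]
    # True wherever the (i, j) pair changes, i.e. at the start of each run
    change = numpy.concatenate(([True], (i_s[1:] != i_s[:-1]) | (j_s[1:] != j_s[:-1])))
    starts = numpy.flatnonzero(change)
    run_sums = numpy.add.reduceat(d_s, starts) # sum of data within each run
    totals = numpy.zeros(shape)
    totals[i_s[starts], j_s[starts]] = run_sums
    return totals

# Example with made-up values
i = numpy.array([0, 1, 0, 1, 0])
j = numpy.array([2, 0, 2, 0, 1])
data = numpy.array([1., 2., 3., 4., 5.])
print subtotal_sorted(data, i, j, (2, 3))      # cells: (0,1)=5, (0,2)=4, (1,0)=6
```

Whether this beats a bincount() on the flattened index, or a hand-written Pyrex loop, would need benchmarking on the real 200m-row data.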