A surprising result from benchmarking
Hi Everyone, I'm new to numpy, and I'm finding it hard to predict what is fast in python/numpy and what is slow. The following seems puzzling: I am doing the same thing an ugly way and a cleaner way. But the ugly map/lambda/filter expression is 15x faster than using numpy's internals. Can anyone explain why? For now, this makes me nervous about incorporating basic numpy functionality into real programs. ---Code starts here--- import scipy import time import psyco from numpy import matrix print("New run") myMat=scipy.randn(500,500) t1=time.time() highEnough=myMat>0.6 greaterPerLine=[sum(x) for x in highEnough] elapsed1=time.time()-t1 print("method 1 took %f seconds"%elapsed1) t2=time.time() greaterPerLine2=map(lambda(x):len(filter(lambda(y):y>0.6,x)),myMat) elapsed2=time.time()-t2 print("method 2 took %f seconds"%elapsed2) ---Output starts here--- New run method 1 took 3.566760 seconds method 2 took 0.232356 seconds --- Thanks so much! Dan
As soon as I posted that I realized it's due to the type conversions from True to 1. For some reason, this --- myMat=scipy.randn(500,500) t1=time.time() highEnough=(myMat>0.6)+0 greaterPerLine=[sum(x) for x in highEnough] elapsed1=time.time()-t1 print("method 1 took %f seconds"%elapsed1) --- remedies that to some extent. It is only 20% slower than the map. Still, there must be some way for me to make the clean way faster than greaterPerLine2=map(lambda(x):len(filter(lambda(y):y>0.6,x)),myMat) I appreciate any advice on how to do that. Thanks again, Dan
The inefficiency comes in the generic iteration and construction of int objects needed by the builtin sum function. Using the native numarray sum method on each row is much much faster, summing over the axis directly even faster still: t1=time.time() highEnough=myMat>0.6 greaterPerLine=[x.sum() for x in highEnough] elapsed1=time.time()-t1 print("method 1a took %f seconds"%elapsed1) t1=time.time() highEnough=myMat>0.6 greaterPerLine=highEnough.sum(axis=1) elapsed1=time.time()-t1 print("method 1b took %f seconds"%elapsed1) method 1 took 1.503523 seconds method 2 took 0.163641 seconds method 1a took 0.006665 seconds method 1b took 0.004070 seconds -Kevin On 3/11/07, Dan Becker <dbecker@alum.dartmouth.org> wrote:
As soon as I posted that I realized it's due to the type conversions from True to 1. For some reason, this
--- myMat=scipy.randn(500,500) t1=time.time() highEnough=(myMat>0.6)+0 greaterPerLine=[sum(x) for x in highEnough] elapsed1=time.time()-t1 print("method 1 took %f seconds"%elapsed1) ---
remedies that to some extent. It is only 20% slower than the map. Still, there must be some way for me to make the clean way faster than
greaterPerLine2=map(lambda(x):len(filter(lambda(y):y>0.6,x)),myMat)
I appreciate any advice on how to do that.
Thanks again, Dan
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
Hi, I'd like to propose some minor modifications to the function bincount(arr, weights=None), so would like some feedback from other uses of bincount() before I write this up as a proper patch, . Background: bincount() has two forms: - bincount(x) returns an integer array ians of length max(x)+1 where ians[n] is the number of times n appears in x. - bincount(x, weights) returns a double array dans of length max(x)+1 where dans[n] is the sum of elements in the weight vector weights[i] at the positions where x[i]==n In both cases, all elements of x must be non-negative. Proposed changes: (1) Remove the restriction that elements of x must be non-negative. Currently bincount() starts by finding max(x) and min(x). If the min value is negative, an exception is raised. This change proposes dropping the initial search for min(x), and instead testing for non-negativity while summing values in the return arrays ians or dans. Any indexes where where x is negative will be silently ignored. This will allow selective bincounts where values to ignore are flagged with a negative bin number. (2) Allow an optional argument for maximum bin number. Currently bincount(x) returns an array whose length is dependent on max(x). It is sometimes preferable to specify the exact size of the returned array, so this change would add a new optional argument, max_bin, which is one less than the size of the returned array. Under this change, bincount() starts by finding max(x) only if max_bin is not specified. Then the returned array ians or dans is created with length max_bin+1, and any indexes that would overflow the output array are silently ignored. (3) Allow an optional output array, y. Currently bincount() creates a new output array each time. Sometimes it is preferable to add results to an existing output array, for example, when the input array is only available in smaller chunks, or for a progressive update strategy to avoid fp precision problems when adding lots of small weights to larger subtotals. Thus we can add an extra optional argument y that bypasses the creation of an output array. With these three change, the function signature of bincount() would become: bincount(x, weights=None, y=None, max_bin=None) Anyway, that's the general idea. I'd be grateful for any feedback before I code this up as a patch to _compiled_base.c. Cheers Stephen
Well, there were no responses to my earlier email proposing changes to numpy.bincount() to make it faster and more flexible. Does this mean noone is using bincount? :-) Anyway I've now made the proposed modifications and got substantial speedups of 3-10. Details are in this extract from my C code. For the time being I will test this as a standalone extension module. When stable and tests are written, I'll submit as a patch to the numpy's _compiled_base.c. Cheers, Stephen /************************************************ * Faster versions of bincount and casting strings to integers * * Author: Stephen Simmons, mail@stevesimmons.com * Date: 11 March 2007 * * This module contains C code for functions I am using to accelerate * SQL-like aggregate functions for a column-oriented database based on numpy. * * subtotal's bincount is typically 3-10 times faster than numpy's bincount, * and more flexible * - takes bins as they are, without promoting their type to int32 * - takes weights as they are, without promoting their type to double * - ignores negative bins, rather than reporting an error * - allows optional int32/double output array to be passed in * - specify maximum bin number to use. Larger bins numbers are ignored * - only scans for max bin number if neither max_bin nor out array specified * * atoi is 30-60 times faster than casting a string array to an integer type, * and may optionally use a translation table. The translation table is * a convenient way to map sparse integer strings to adjacent bins before * using bincount. * * Typical usage * ------------- # Set up arrays 5,000,000 elts long. s1 has strings filled with '1' and '2'. # w are weights
s1 = numpy.ndarray(shape=5000000, dtype='S1'); s1[:]='2'; s1[1::2]='1' w = numpy.arange(5000000,dtype='f4')
# Using standard numpy functions, string conversion is slow
i = s1.astype('i1') -> 5.95 sec (!) numpy.bincount(i) -> 113 ms numpy.bincount(i, w) -> 272 ms numpy.bincount(s1.astype('i1'), w) -> 6.12 sec
# Using the faster functions here:
i = subtotal.atoi(s1) -> 90 ms (60x faster) subtotal.bincount(i, max_bin=2) -> 31 ms (3.6x faster) subtotal.bincount(i, w, max_bin=2) -> 51 ms (5.3x faster)
# In both bases, output is array([2, 1, 2, ..., 1, 2, 1], dtype=int8) array([ 0, 2500000, 2500000]) array([ 0.00000000e+00, 6.25000000e+12, 6.24999750e+12]) # As an example of using a translate table, run bincount from atoi applying # a translate table for counting odd vs even strings. This does the whole lot # in 158ms, a speedup of 38x from 6.12 s above.
t = numpy.arange(256, dtype='i1') % 2 subtotal.bincount(subtotal.atoi(s1, t), w, max_bin=t.max()) -> 158 ms array([ 6.24999750e+12, 6.25000000e+12])
* * These timings are based on numpy-1.0.1 Windows binary distribution * for Python 2.5, compared with building this code using standard distutils * without any particular compiler optimisations: C:\mnt\dev\subtotal> python setup.py build -cmingw32 * Processor is a 1.83GHz Pentium-M ****************************************************/ << rest of C code omitted >> Stephen Simmons wrote:
Hi,
I'd like to propose some minor modifications to the function bincount(arr, weights=None), so would like some feedback from other uses of bincount() before I write this up as a proper patch, .
Background: bincount() has two forms: - bincount(x) returns an integer array ians of length max(x)+1 where ians[n] is the number of times n appears in x. - bincount(x, weights) returns a double array dans of length max(x)+1 where dans[n] is the sum of elements in the weight vector weights[i] at the positions where x[i]==n In both cases, all elements of x must be non-negative.
Proposed changes: (1) Remove the restriction that elements of x must be non-negative. Currently bincount() starts by finding max(x) and min(x). If the min value is negative, an exception is raised. This change proposes dropping the initial search for min(x), and instead testing for non-negativity while summing values in the return arrays ians or dans. Any indexes where where x is negative will be silently ignored. This will allow selective bincounts where values to ignore are flagged with a negative bin number.
(2) Allow an optional argument for maximum bin number. Currently bincount(x) returns an array whose length is dependent on max(x). It is sometimes preferable to specify the exact size of the returned array, so this change would add a new optional argument, max_bin, which is one less than the size of the returned array. Under this change, bincount() starts by finding max(x) only if max_bin is not specified. Then the returned array ians or dans is created with length max_bin+1, and any indexes that would overflow the output array are silently ignored.
(3) Allow an optional output array, y. Currently bincount() creates a new output array each time. Sometimes it is preferable to add results to an existing output array, for example, when the input array is only available in smaller chunks, or for a progressive update strategy to avoid fp precision problems when adding lots of small weights to larger subtotals. Thus we can add an extra optional argument y that bypasses the creation of an output array.
With these three change, the function signature of bincount() would become: bincount(x, weights=None, y=None, max_bin=None)
Anyway, that's the general idea. I'd be grateful for any feedback before I code this up as a patch to _compiled_base.c.
Cheers
Stephen
Hi Stephen, I'd de glad to test your bincount function for histogramming purposes. David 2007/3/14, Stephen Simmons <mail@stevesimmons.com>:
Well, there were no responses to my earlier email proposing changes to numpy.bincount() to make it faster and more flexible. Does this mean noone is using bincount? :-)
Anyway I've now made the proposed modifications and got substantial speedups of 3-10. Details are in this extract from my C code. For the time being I will test this as a standalone extension module. When stable and tests are written, I'll submit as a patch to the numpy's _compiled_base.c.
Cheers,
Stephen
/************************************************ * Faster versions of bincount and casting strings to integers * * Author: Stephen Simmons, mail@stevesimmons.com * Date: 11 March 2007 * * This module contains C code for functions I am using to accelerate * SQL-like aggregate functions for a column-oriented database based on numpy. * * subtotal's bincount is typically 3-10 times faster than numpy's bincount, * and more flexible * - takes bins as they are, without promoting their type to int32 * - takes weights as they are, without promoting their type to double * - ignores negative bins, rather than reporting an error * - allows optional int32/double output array to be passed in * - specify maximum bin number to use. Larger bins numbers are ignored * - only scans for max bin number if neither max_bin nor out array specified * * atoi is 30-60 times faster than casting a string array to an integer type, * and may optionally use a translation table. The translation table is * a convenient way to map sparse integer strings to adjacent bins before * using bincount. * * Typical usage * -------------
# Set up arrays 5,000,000 elts long. s1 has strings filled with '1' and '2'. # w are weights
s1 = numpy.ndarray(shape=5000000, dtype='S1'); s1[:]='2'; s1[1::2]='1' w = numpy.arange(5000000,dtype='f4')
# Using standard numpy functions, string conversion is slow
i = s1.astype('i1') -> 5.95 sec (!) numpy.bincount(i) -> 113 ms numpy.bincount(i, w) -> 272 ms numpy.bincount(s1.astype('i1'), w) -> 6.12 sec
# Using the faster functions here:
i = subtotal.atoi(s1) -> 90 ms (60x faster) subtotal.bincount(i, max_bin=2) -> 31 ms (3.6x faster) subtotal.bincount(i, w, max_bin=2) -> 51 ms (5.3x faster)
# In both bases, output is array([2, 1, 2, ..., 1, 2, 1], dtype=int8) array([ 0, 2500000, 2500000]) array([ 0.00000000e+00, 6.25000000e+12, 6.24999750e+12])
# As an example of using a translate table, run bincount from atoi applying # a translate table for counting odd vs even strings. This does the whole lot # in 158ms, a speedup of 38x from 6.12 s above.
t = numpy.arange(256, dtype='i1') % 2 subtotal.bincount(subtotal.atoi(s1, t), w, max_bin=t.max()) -> 158 ms array([ 6.24999750e+12, 6.25000000e+12])
* * These timings are based on numpy-1.0.1 Windows binary distribution * for Python 2.5, compared with building this code using standard distutils * without any particular compiler optimisations: C:\mnt\dev\subtotal> python setup.py build -cmingw32 * Processor is a 1.83GHz Pentium-M ****************************************************/
<< rest of C code omitted >>
Stephen Simmons wrote:
Hi,
I'd like to propose some minor modifications to the function bincount(arr, weights=None), so would like some feedback from other uses of bincount() before I write this up as a proper patch, .
Background: bincount() has two forms: - bincount(x) returns an integer array ians of length max(x)+1 where ians[n] is the number of times n appears in x. - bincount(x, weights) returns a double array dans of length max(x)+1 where dans[n] is the sum of elements in the weight vector weights[i] at the positions where x[i]==n In both cases, all elements of x must be non-negative.
Proposed changes: (1) Remove the restriction that elements of x must be non-negative. Currently bincount() starts by finding max(x) and min(x). If the min value is negative, an exception is raised. This change proposes dropping the initial search for min(x), and instead testing for non-negativity while summing values in the return arrays ians or dans. Any indexes where where x is negative will be silently ignored. This will allow selective bincounts where values to ignore are flagged with a negative bin number.
(2) Allow an optional argument for maximum bin number. Currently bincount(x) returns an array whose length is dependent on max(x). It is sometimes preferable to specify the exact size of the returned array, so this change would add a new optional argument, max_bin, which is one less than the size of the returned array. Under this change, bincount() starts by finding max(x) only if max_bin is not specified. Then the returned array ians or dans is created with length max_bin+1, and any indexes that would overflow the output array are silently ignored.
(3) Allow an optional output array, y. Currently bincount() creates a new output array each time. Sometimes it is preferable to add results to an existing output array, for example, when the input array is only available in smaller chunks, or for a progressive update strategy to avoid fp precision problems when adding lots of small weights to larger subtotals. Thus we can add an extra optional argument y that bypasses the creation of an output array.
With these three change, the function signature of bincount() would become: bincount(x, weights=None, y=None, max_bin=None)
Anyway, that's the general idea. I'd be grateful for any feedback before I code this up as a patch to _compiled_base.c.
Cheers
Stephen
_______________________________________________ Numpy-discussion mailing list Numpy-discussion@scipy.org http://projects.scipy.org/mailman/listinfo/numpy-discussion
participants (4)
-
Dan Becker
-
David Huard
-
Kevin Jacobs <jacobs@bioinformed.com>
-
Stephen Simmons