[Numpy-discussion] Array vectorization in numpy

Pauli Virtanen pav at iki.fi
Tue Jul 19 16:11:13 EDT 2011


On Tue, 19 Jul 2011 17:49:14 +0200, Carlos Becker wrote:
> I made more tests with the same operation, restricting Matlab to use a
> single processing unit. I got:
> 
> - Matlab: 0.0063 sec avg
> - Numpy: 0.026 sec avg
> - Numpy with weave.blitz: 0.0041

To check whether the problem is that Numpy was built without optimizations,
look at the build log:

C compiler: gcc -pthread -fno-strict-aliasing "-ggdb" -fPIC
...
gcc: build/src.linux-x86_64-2.7/numpy/core/src/umath/umathmodule.c

That is, look at the "C compiler:" line nearest to the "umathmodule"
compilation step. The line above is an example of a build with no optimization flags.
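
A quick way to double-check which Numpy build actually gets imported, and
which BLAS/LAPACK it was built against, is plain introspection along the
lines of the sketch below (assuming the Numpy 1.x layout for
numpy.core.umath; it does not show the C flags themselves, for those the
build log is still the place to look):

import numpy
print(numpy.__version__)              # version of the build in use
print(numpy.__file__)                 # where it was imported from
print(numpy.core.umath.__file__)      # the compiled ufunc extension (umathmodule)
numpy.show_config()                   # BLAS/LAPACK detected at build time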

    ***

For me, comparing against zeroing the memory via memset and against a plain
C implementation (Numpy 1.6.0 / gcc), the average times per operation are
(in seconds):

Blitz: 0.00746664
Numpy: 0.00711051
Zeroing (memset): 0.00263333
Operation in C: 0.00706667

with "gcc -O3 -ffast-math -march=native -mfpmath=sse" optimizations
for the C code (involving SSE2 vectorization and whatnot, looking at
the assembler output). Numpy is already going essentially at the maximum
speed.
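
As a rough sanity check: each pass reads one 2000x2000 array of doubles
(about 32 MB) and writes another, i.e. at least ~64 MB of memory traffic
per operation, so 0.007 s corresponds to roughly 9 GB/s; the memset case
works out to roughly 12 GB/s for the write alone. Both figures are in the
ballpark of main memory bandwidth, so the operation is memory-bound and
better compiler flags cannot buy much more.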

-----------------
#include <stdlib.h>
#include <time.h>
#include <stdio.h>
#include <string.h>

int main()
{
    double *a, *b;
    int N = 2000*2000, M = 300;   /* 2000x2000 doubles, M repetitions */
    int j;
    int k;
    clock_t start, end;

    a = (double*)malloc(sizeof(double)*N);
    b = (double*)malloc(sizeof(double)*N);

    /* Baseline: just zeroing one array (pure memory write bandwidth) */
    start = clock();
    for (k = 0; k < M; ++k) {
        memset(a, '\0', sizeof(double)*N);
    }
    end = clock();
    printf("Zeroing (memset): %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M);

    /* The operation under discussion: b = a - 0.5, elementwise */
    start = clock();
    for (k = 0; k < M; ++k) {
        for (j = 0; j < N; ++j) {
            b[j] = a[j] - 0.5;
        }
    }
    end = clock();
    printf("Operation in C: %g\n", ((double)(end-start))/CLOCKS_PER_SEC/M);

    free(a);
    free(b);
    return 0;
}
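
For reference, a rough NumPy counterpart of the benchmark above might look
like the sketch below (wall-clock time via time.time() rather than CPU time
via clock(); a.fill(0) stands in for the memset):

import time
import numpy as np

N = 2000 * 2000
M = 300

a = np.zeros(N)                  # float64, same size as in the C program

start = time.time()
for k in range(M):
    a.fill(0.0)                  # plays the role of memset() above
end = time.time()
print("Zeroing (fill): %g" % ((end - start) / M))

start = time.time()
for k in range(M):
    b = a - 0.5                  # the operation under discussion
end = time.time()
print("Operation in NumPy: %g" % ((end - start) / M))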



