Mailman 3 numpy large arrays? - NumPy-Discussion

Dec. 12, 2007

      Hi all

I need to perform computations involving large arrays. A lot of rows and no
more than e.g. 34 columns. My first choice is python/numpy because I'm
already used to code in matlab.

However I'm experiencing memory problems even though there is still 500 MB
available (2 GB total). I have cooked down my code to following meaningless
code snip. This code share some of the same structure and calls as my real
program and shows the same behaviour.

********************************************************
import numpy as N
import scipy as S

def stress():
    x = S.randn(200000,80)
    for i in range(8):
        print "%(0)d" % {"0": i}
        s = N.dot(x.T, x)
        sd = N.array([s.diagonal()])
        r = N.dot(N.ones((N.size(x,0),1),'d'), sd)
        x = x + r
        x = x / 1.01

********************************************************

To different symptoms depending how big x are:
1) the program becomes extremely slow after a few iterations.
2) if the size of x is increased a little the program fails with the message
"MemoryError" for example at line 'x = x + r', but different places in the
code depending on the matrice size and which computer I'm testing. This
might also occur after several iterations, not just during the first pass.

I'm using Windows XP, ActivePython 2.5.1.1, NumPy 1.0.4, SciPy  0.6.0.

- Is there an error under the hood in NumPy?
- Am I balancing on the edge of the performance of Python/NumPy and should
consider other environments. Fortran, C, BLAS, LAPACK e.t.c.
- Am I misusing NumPy? Changing coding style will be a good workaround and
even perform on larger datasets without errors?

Thanks in advance
/Søren

numpy large arrays?

Søren Dyrsting

Stefan van der Walt

Timothy Hochberg

tags

participants (3)