[Numpy-discussion] extremely slow array indexing?

Fang Fang fang.fang2003 at gmail.com
Thu Nov 30 12:20:47 EST 2006


Hi,

I am writing code to sort the columns of a dataset according to the sum of each
column. The dataset is huge (50k rows x 300k cols), so I have to read it line by
line and accumulate the sums to avoid running out of memory. But I don't know why
it runs very slowly; part of the code is below. Can anyone point out what needs
to be modified to make it run fast? Thanks in advance!

...
from numpy import *
...

       currSum = zeros(self.componentcount)
       currRow = zeros(self.componentcount)
       for featureDict in self.featureDictList:
           currRow[:] = 0          # reset the row buffer
           for components in self.componentdict1:
               if components in featureDict:
                   col = self.componentdict1[components]
                   value = featureDict[components]
                   currRow[col] = value
           currSum += currRow      # accumulate this row into the running column sums
...
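
For reference, one direction I have considered is to gather the column indices
and values of each row into plain Python lists and then update currSum with a
single fancy-indexing assignment, instead of one scalar assignment per element.
This is only a sketch, assuming every entry of self.componentdict1 maps a
component name to a distinct column index; the helper name accumulate_row is
just illustrative.

import numpy as np

def accumulate_row(currSum, featureDict, componentdict1):
    # Collect (column, value) pairs for this row in plain Python lists.
    cols, vals = [], []
    for name, col in componentdict1.items():
        if name in featureDict:
            cols.append(col)
            vals.append(featureDict[name])
    if cols:
        # One vectorized in-place update per row, assuming the column
        # indices in cols are all distinct.
        currSum[cols] += np.asarray(vals)
    return currSum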
