On 03/07/2010 12:05 AM, Vincent Davis wrote:
I just figured out that I had a few arrays that were taking up a lot of the memory. That said, I still wonder if there is a better way.


Vincent Davis
720-301-3003

vincent@vincentdavis.net



On Sat, Mar 6, 2010 at 10:22 PM, Vincent Davis <vincent@vincentdavis.net> wrote:
I have arrays of 8-20 rows and 230,000 columns; all the data is float64.
I want to be able to find the difference in the correlation matrix between arrays.
Let a and b be of size (10, 230000):
np.corrcoef(a) - np.corrcoef(b)

I can't seem to do this with more than 10,000 columns at a time because of memory limitations (about 9 GB usable to Python).
Is there a better way?

I also have a problem finding the column means, which is surprising to me. I was not able to get the column means for 10,000 columns, but I can compute the corrcoef?
np.mean(a, axis=0)

Do I just need to divide up the job or is there a better approach?
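
One way to divide up the job (a rough sketch, not from the thread; it assumes the correlations you want are between columns, i.e. np.corrcoef(a, rowvar=False), since the full 230,000 x 230,000 result is roughly 420 GB and cannot fit in 9 GB) is to standardize the columns once and then form the correlation difference one block of columns at a time, keeping only a summary:

import numpy as np

def standardize_columns(x):
    # subtract the column means and divide by the column standard deviations
    # (assumes no column is constant, otherwise this divides by zero)
    z = x - x.mean(axis=0)
    z /= z.std(axis=0)
    return z

def max_corr_difference(a, b, block=500):
    # largest |corr(a)_ij - corr(b)_ij| over all pairs of columns, computed
    # one block of columns at a time so the full matrix is never in memory
    n = a.shape[0]
    za = standardize_columns(a)
    zb = standardize_columns(b)
    worst = 0.0
    for i in range(0, a.shape[1], block):
        ca = np.dot(za[:, i:i + block].T, za) / n   # block of corr(a)
        cb = np.dot(zb[:, i:i + block].T, zb) / n   # block of corr(b)
        worst = max(worst, np.abs(ca - cb).max())
    return worst

# toy usage with the shapes from the thread (scaled down)
a = np.random.randn(10, 5000)
b = np.random.randn(10, 5000)
print(max_corr_difference(a, b))

With block=500 each partial result is about 500 x 230,000 float64, roughly 0.9 GB, so the block size can be tuned to the memory you have available.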

Thanks


Vincent Davis
720-301-3003

vincent@vincentdavis.net



Is there a better way to do what?
A problem with np.corrcoef(a) - np.corrcoef(b) is that it is unclear what you want: if a and b have more than one dimension, you get an array back. If that array is near zero, what does that mean? One interpretation is that you are really checking whether these are the same array. If the array is not zero, what does that mean? Do you need to know which parts of a and b lead to the different correlations?

You can always do np.corrcoef(a, b): the off-diagonal block of the result holds the correlations between the rows of a and the rows of b, and its diagonal is one when the corresponding rows are perfectly correlated, for example when a and b are the same array.
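
For example (a small illustration, not from the thread; with NumPy's default rowvar the variables are the rows of a and b):

import numpy as np

a = np.random.randn(3, 1000)
b = a + 0.01 * np.random.randn(3, 1000)   # b is nearly identical to a

c = np.corrcoef(a, b)        # shape (6, 6): rows of a and b are the variables
cross = c[:3, 3:]            # correlations of each row of a with each row of b
print(np.diag(cross))        # near 1 only because b barely differs from a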

Bruce