On 03/07/2010 12:05 AM, Vincent Davis wrote:
I just figured out that I had a few arrays that were taking up a lot of the memory. That said, I still wonder if there is a better way.


Vincent Davis
720-301-3003

vincent@vincentdavis.net



On Sat, Mar 6, 2010 at 10:22 PM, Vincent Davis <vincent@vincentdavis.net> wrote:
I have arrays of 8-20 rows and 230,000 columns; all the data is float64.
I want to be able to find the difference in the correlation matrix between arrays.
Let a and b be of size (10, 230000):
np.corrcoef(a) - np.corrcoef(b)

I can't seem to do this with more than 10,000 columns at a time because of memory limitations (about 9 GB usable to Python).
Is there a better way?

I also have a problem finding the column means, which is surprising to me. I was not able to get the column means for 10,000 columns, but I can compute the corrcoef?
np.mean(a, axis=0)

Do I just need to divide up the job or is there a better approach?
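
One way to divide up the job (a rough sketch, not from the thread; it assumes the correlations you want are between columns, i.e. np.corrcoef(a, rowvar=False), since the full 230,000 x 230,000 result is roughly 420 GB and cannot fit in 9 GB) is to standardize the columns once and then form the correlation difference one block of columns at a time, keeping only a summary:

import numpy as np

def standardize_columns(x):
    # subtract the column means and divide by the column standard deviations
    # (assumes no column is constant, otherwise this divides by zero)
    z = x - x.mean(axis=0)
    z /= z.std(axis=0)
    return z

def max_corr_difference(a, b, block=500):
    # largest |corr(a)_ij - corr(b)_ij| over all pairs of columns, computed
    # one block of columns at a time so the full matrix is never in memory
    n = a.shape[0]
    za = standardize_columns(a)
    zb = standardize_columns(b)
    worst = 0.0
    for i in range(0, a.shape[1], block):
        ca = np.dot(za[:, i:i + block].T, za) / n   # block of corr(a)
        cb = np.dot(zb[:, i:i + block].T, zb) / n   # block of corr(b)
        worst = max(worst, np.abs(ca - cb).max())
    return worst

# toy usage with the shapes from the thread (scaled down)
a = np.random.randn(10, 5000)
b = np.random.randn(10, 5000)
print(max_corr_difference(a, b))

With block=500 each partial result is about 500 x 230,000 float64, roughly 0.9 GB, so the block size can be tuned to the memory you have available.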

Thanks


Vincent Davis
720-301-3003

vincent@vincentdavis.net



Is there a better way to do what?
A problem with np.corrcoef(a) - np.corrcoef(b) is that it is unclear what you want: if a and b have more than one dimension, you get an array back. If that array is near zero, what does that mean? One interpretation is that you are really checking whether these are the same array. If the array is not zero, what does that mean? Do you need to know which parts of a and b lead to the different correlations?

You can always do np.corrcoef(a, b): the off-diagonal block of the result holds the correlations between the rows of a and the rows of b, and its diagonal is one when the corresponding rows are perfectly correlated, for example when a and b are the same array.
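
For example (a small illustration, not from the thread; with NumPy's default rowvar the variables are the rows of a and b):

import numpy as np

a = np.random.randn(3, 1000)
b = a + 0.01 * np.random.randn(3, 1000)   # b is nearly identical to a

c = np.corrcoef(a, b)        # shape (6, 6): rows of a and b are the variables
cross = c[:3, 3:]            # correlations of each row of a with each row of b
print(np.diag(cross))        # near 1 only because b barely differs from a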

Bruce