Re: [SciPy-User] [R] Correlation coefficient of large data sets
@ Dennis
With 35000 variables at a time, the storage is under 20 GB; you'd have to compute about 50 such chunks to get the entire matrix. Is there a way to calculate a column or row of the correlation matrix one at a time? I am looking at how including an additional set of observations affects the correlation. For example, if I have variables a, b, c, d, ... and observations 1-10: I first calculate the correlation for obs 1-5, then add observations 6-10 and want to know the average effect of this on the correlation of c with (a, b, d, e, ...). So I only need a column or a row at a time; it is just not clear to me how I would do this.

@Joshua Wiley

cor(my.data) # calculate the correlation matrix between all variables (columns) of my.data
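One way to get a single column of the correlation matrix without materializing the whole thing is to center the data and take dot products against the one variable of interest. A minimal NumPy sketch (the thread is cross-posted to SciPy-User) on made-up random data; the function name `corr_column` and the array sizes are illustrative, not from the thread:

```python
import numpy as np

# Hypothetical data: 10 observations (rows) of 8 variables (columns).
rng = np.random.default_rng(0)
data = rng.standard_normal((10, 8))

def corr_column(X, j):
    """Pearson correlation of column j with every column of X,
    without forming the full p-by-p correlation matrix."""
    Xc = X - X.mean(axis=0)                 # center each column
    norms = np.sqrt((Xc ** 2).sum(axis=0))  # per-column standard-deviation scale
    return (Xc.T @ Xc[:, j]) / (norms * norms[j])

col = corr_column(data, 2)                  # correlations of variable c with all variables
full = np.corrcoef(data, rowvar=False)      # reference: the full correlation matrix
```

Each call costs O(n * p) time and O(n * p) memory for the centered data, instead of O(p^2) storage for the full matrix, which is what matters at p = 35000.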
Vincent Davis
720-301-3003
vincent@vincentdavis.net
my blog <http://vincentdavis.net> | LinkedIn <http://www.linkedin.com/in/vincentdavis>

On Tue, Mar 16, 2010 at 12:06 AM, Joshua Wiley <jwiley.psych@gmail.com> wrote:
I think what you have done should be fine. read.table() will return a data frame, which cor() can handle happily. For example:
my.data <- read.table("file.csv", header = TRUE, row.names = 1, sep = ",", strip.white = TRUE) # assign your data to "my.data"
cor(my.data) # calculate the correlation matrix between all variables (columns) of my.data
What happens if you try that?
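For the specific question of how adding observations 6-10 changes the correlations of c, one can simply compute the column twice, once on the first five rows and once on all ten, and average the difference. A NumPy sketch (the data and variable index are invented for illustration; in R the analogous calls would be `cor()` on the two row subsets):

```python
import numpy as np

# Hypothetical data: observations 1-10 (rows) of variables a..e (columns).
rng = np.random.default_rng(1)
data = rng.standard_normal((10, 5))

j = 2                                                 # variable c
before = np.corrcoef(data[:5], rowvar=False)[:, j]    # correlations of c, obs 1-5
after = np.corrcoef(data, rowvar=False)[:, j]         # correlations of c, obs 1-10

others = [k for k in range(data.shape[1]) if k != j]  # a, b, d, e
avg_effect = (after[others] - before[others]).mean()  # average shift in correlation
```

With only 5 observations per variable the individual correlations are very noisy, so the "effect" here is dominated by sampling variability; the sketch only shows the mechanics.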