On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey
On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig
wrote: Le 26/01/2012 15:57, Bruce Southey a écrit :
Can you please provide a couple of real examples with expected output that clearly show what you want?
Hi Bruce,
Thanks for your ticket feedback ! It's precisely because I see a big potential impact of the proposed change that I send first a ML message, second a ticket before jumping to a pull-request like a Sergio Leone's cowboy (sorry, I watched "for a few dollars more" last weekend...)
Now, I realize that in the ticket writing I made the wrong trade-off between conciseness and accuracy which led to some of the errors you raised. Let me try to use your example to try to share what I have in mind.
X = array([-2.1, -1. , 4.3]) Y = array([ 3. , 1.1 , 0.12])
Indeed, with today's cov behavior we have a 2x2 array:
cov(X,Y) array([[ 11.71 , -4.286 ], [ -4.286 , 2.14413333]])
Now, when I used the word 'concatenation', I wasn't precise enough because I meant assembling X and Y in the sense of 2 vectors of observations from 2 random variables X and Y. This is achieved by concatenate(X,Y) *when properly playing with dimensions* (which I didn't mentioned) :
XY = np.concatenate((X[None, :], Y[None, :])) array([[-2.1 , -1. , 4.3 ], [ 3. , 1.1 , 0.12]])
In this context, I find stacking, np.vstack((X,Y)), more appropriate than concatenate.
In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)".
np.cov(XY) array([[ 11.71 , -4.286 ], [ -4.286 , 2.14413333]])
Sure the resulting array is the same but whole process is totally different.
(And indeed, the actual cov Python code does use concatenate() ) Yes, but the user does not see that. Whereas you are forcing the user to do the stacking in the correct dimensions.
Now let me come back to my assertion about this behavior *usefulness*. You'll acknowledge that np.cov(XY) is made of four blocks (here just 4 simple scalars blocks).
No there are not '4' blocks just rows and columns.
Sturla showed the 4 blocks in his first message.
* diagonal blocks are just cov(X) and cov(Y) (which in this case comes to var(X) and var(Y) when setting ddof to 1) Sure but variances are still covariances.
* off diagonal blocks are symetric and are actually the covariance estimate of X, Y observations (from http://en.wikipedia.org/wiki/Covariance) Sure
that is :
((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1) -4.2860000000000005
The new proposed behaviour for cov is that cov(X,Y) would return : array(-4.2860000000000005) instead of the 2*2 matrix.
But how you interpret an 2D array where the rows are greater than 2?
Z=Y+X np.cov(np.vstack((X,Y,Z))) array([[ 11.71 , -4.286 , 7.424 ], [ -4.286 , 2.14413333, -2.14186667], [ 7.424 , -2.14186667, 5.28213333]])
* This would be in line with the cov(X,Y) mathematical definition, as well as with R behavior.
I don't care what R does because I am using Python and Python is infinitely better than R is!
But I think that is only in the 1D case.
I just checked R to make sure I remember correctly
xx = matrix((1:20)^2, nrow=4) xx [,1] [,2] [,3] [,4] [,5] [1,] 1 25 81 169 289 [2,] 4 36 100 196 324 [3,] 9 49 121 225 361 [4,] 16 64 144 256 400 cov(xx, 2*xx[,1:2]) [,1] [,2] [1,] 86.0000 219.3333 [2,] 219.3333 566.0000 [3,] 352.6667 912.6667 [4,] 486.0000 1259.3333 [5,] 619.3333 1606.0000 cov(xx) [,1] [,2] [,3] [,4] [,5] [1,] 43.0000 109.6667 176.3333 243.0000 309.6667 [2,] 109.6667 283.0000 456.3333 629.6667 803.0000 [3,] 176.3333 456.3333 736.3333 1016.3333 1296.3333 [4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667 [5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000
* This would save memory and computing resources. (and therefore help save the planet ;-) ) Nothing that you have provided shows that it will.
I don't know about saving the planet, but if X and Y have the same number of columns, we save 3 quarters of the calculations, as Sturla also explained in his first message. Josef
However, I do understand that the impact for this change may be big. This indeed requires careful reviewing.
Pierre _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
Bruce _______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion