Re: [Numpy-discussion] Cross-covariance function

26 Jan 2012

      On Thu, Jan 26, 2012 at 1:25 PM, Bruce Southey  wrote:
...
On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig
 wrote:
...
Le 26/01/2012 15:57, Bruce Southey a écrit :
...
Can you please provide a
couple of real examples with expected output that clearly show what
you want?
Hi Bruce,
Thanks for your feedback on the ticket! It's precisely because I see a big
potential impact in the proposed change that I first sent a ML message,
then opened a ticket, before jumping to a pull request like a Sergio Leone
cowboy (sorry, I watched "For a Few Dollars More" last weekend...)
Now, I realize that in writing the ticket I made the wrong trade-off
between conciseness and accuracy, which led to some of the errors you
raised. Let me use your example to share what I have in mind.
...
...
...
X = array([-2.1, -1. ,  4.3])
Y = array([ 3.  ,  1.1 ,  0.12])
Indeed, with today's cov behavior we have a 2x2 array:
...
...
...
cov(X,Y)
array([[ 11.71      ,  -4.286     ],
       [ -4.286     ,   2.14413333]])
Now, when I used the word 'concatenation', I wasn't precise enough
because I meant assembling X and Y in the sense of 2 vectors of
observations from 2 random variables X and Y.
This is achieved by concatenate(X,Y) *when properly playing with
dimensions* (which I didn't mention):
...
...
...
XY = np.concatenate((X[None, :], Y[None, :]))
array([[-2.1 , -1.  ,  4.3 ],
       [ 3.  ,  1.1 ,  0.12]])
In this context, I find stacking,  np.vstack((X,Y)), more appropriate
than concatenate.
...
In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)".
...
...
...
np.cov(XY)
array([[ 11.71      ,  -4.286     ],
       [ -4.286     ,   2.14413333]])
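The equivalence claimed above can be checked directly. The following is a minimal sketch using only the X and Y values from this thread; the variable names `cov_two_args`, `cov_stacked` and `same` are introduced here for illustration.

```python
import numpy as np

X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])

# Stack the two observation vectors as rows: one row per random variable.
XY = np.vstack((X, Y))

cov_two_args = np.cov(X, Y)   # current two-argument behavior
cov_stacked = np.cov(XY)      # single 2D argument, rows are variables

# Both calls should produce the same 2x2 matrix.
same = np.allclose(cov_two_args, cov_stacked)
print(same)
```

The off-diagonal entry of either result is the X, Y covariance estimate discussed further down.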
Sure, the resulting array is the same, but the whole process is totally different.
...
(And indeed, the actual cov Python code does use concatenate() )
Yes, but the user does not see that. Whereas you are forcing the user
to do the stacking in the correct dimensions.
...
Now let me come back to my assertion about the *usefulness* of this behavior.
You'll acknowledge that np.cov(XY) is made of four blocks (here just 4
simple scalar blocks).
No, there are not '4' blocks, just rows and columns.
Sturla showed the 4 blocks in his first message.
...
...
 * diagonal blocks are just cov(X) and cov(Y) (which in this case reduce
to var(X) and var(Y) when setting ddof to 1)
Sure, but variances are still covariances.
...
 * off-diagonal blocks are symmetric and are actually the covariance
estimate of the X, Y observations (from
http://en.wikipedia.org/wiki/Covariance)
Sure
that is :
...
...
...
((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1)
-4.2860000000000005
The new proposed behaviour for cov is that cov(X,Y) would return :
array(-4.2860000000000005)  instead of the 2*2 matrix.
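To make the proposal concrete for the 1D case, here is a rough sketch of what such a function could compute. The name `cross_cov_1d` and its `ddof` parameter are hypothetical and introduced only for illustration; this is not NumPy's API, and np.cov itself still returns the 2x2 matrix.

```python
import numpy as np

def cross_cov_1d(x, y, ddof=1):
    """Hypothetical helper: covariance estimate between two 1D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if x.ndim != 1 or x.shape != y.shape:
        raise ValueError("x and y must be 1D arrays of the same length")
    # Sum of products of deviations, normalized by (N - ddof).
    return np.sum((x - x.mean()) * (y - y.mean())) / (x.size - ddof)

X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])

result = cross_cov_1d(X, Y)
# Should agree with the off-diagonal entry of the current 2x2 output.
matches_np_cov = np.isclose(result, np.cov(X, Y)[0, 1])
print(result, matches_np_cov)
```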
But how do you interpret a 2D array that has more than 2 rows?
...
...
...
Z=Y+X
np.cov(np.vstack((X,Y,Z)))
array([[ 11.71      ,  -4.286     ,   7.424     ],
       [ -4.286     ,   2.14413333,  -2.14186667],
       [  7.424     ,  -2.14186667,   5.28213333]])
...
 * This would be in line with the cov(X,Y) mathematical definition, as
well as with R behavior.
I don't care what R does because I am using Python and Python is
infinitely better than R is!
But I think that is only in the 1D case.
I just checked R to make sure I remember correctly
...
xx = matrix((1:20)^2, nrow=4)
xx
     [,1] [,2] [,3] [,4] [,5]
[1,]    1   25   81  169  289
[2,]    4   36  100  196  324
[3,]    9   49  121  225  361
[4,]   16   64  144  256  400
cov(xx, 2*xx[,1:2])
         [,1]      [,2]
[1,]  86.0000  219.3333
[2,] 219.3333  566.0000
[3,] 352.6667  912.6667
[4,] 486.0000 1259.3333
[5,] 619.3333 1606.0000
cov(xx)
         [,1]     [,2]      [,3]      [,4]      [,5]
[1,]  43.0000 109.6667  176.3333  243.0000  309.6667
[2,] 109.6667 283.0000  456.3333  629.6667  803.0000
[3,] 176.3333 456.3333  736.3333 1016.3333 1296.3333
[4,] 243.0000 629.6667 1016.3333 1403.0000 1789.6667
[5,] 309.6667 803.0000 1296.3333 1789.6667 2283.0000
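For comparison, the R session above can be reproduced in NumPy. This is only a sketch: the helper name `cross_cov_columns` is hypothetical, and it assumes R's convention that columns are variables and rows are observations (the opposite of np.cov's default).

```python
import numpy as np

def cross_cov_columns(x, y):
    """Hypothetical helper: cross-covariance between the columns of x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]                      # number of observations (rows)
    xc = x - x.mean(axis=0)             # center each column
    yc = y - y.mean(axis=0)
    return xc.T @ yc / (n - 1)

# R's matrix((1:20)^2, nrow=4) fills column by column, hence order="F".
xx = (np.arange(1, 21) ** 2).reshape(4, 5, order="F")

c = cross_cov_columns(xx, 2 * xx[:, :2])
print(c.shape)   # one row per column of xx, one column per column of y
print(c)
```

The result is a rectangular 5x2 block rather than a square matrix, which is the 2D behavior being contrasted with the 1D case.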
...
...
 * This would save memory and computing resources. (and therefore help
save the planet ;-) )
Nothing that you have provided shows that it will.
I don't know about saving the planet, but if X and Y have the same
number of columns, we save 3 quarters of the calculations, as Sturla
also explained in his first message.
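The "3 quarters" figure can be illustrated by counting output entries. The sketch below uses made-up random data (the sizes `p` and `n` and the seed are arbitrary assumptions) and follows np.cov's default convention that rows are variables. Entry counts are only a proxy for cost; an implementation that exploits symmetry would change the exact ratio.

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 50                     # p variables each, n observations (arbitrary)
X = rng.normal(size=(p, n))
Y = rng.normal(size=(p, n))

# Current behavior: full (2p x 2p) covariance of the stacked variables.
full = np.cov(X, Y)

# Only the cross block: (p x p), computed directly.
Xc = X - X.mean(axis=1, keepdims=True)
Yc = Y - Y.mean(axis=1, keepdims=True)
cross = Xc @ Yc.T / (n - 1)

# The direct result is the upper-right block of the full matrix.
block_matches = np.allclose(full[:p, p:], cross)
ratio = cross.size / full.size   # p*p entries versus (2p)*(2p) entries
print(block_matches, ratio)
```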

Josef
...
...
However, I do understand that the impact of this change may be big.
This indeed requires careful review.
Pierre
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
Bruce

josef.pktd＠gmail.com