[Numpy-discussion] Cross-covariance function

Thu Jan 26 11:07:15 EST 2012

On 26/01/2012 15:57, Bruce Southey wrote:
> Can you please provide a
> couple of real examples with expected output that clearly show what
> you want?
>
Hi Bruce,

Thanks for your feedback on the ticket! It's precisely because I see a big
potential impact in the proposed change that I first sent a mailing-list
message, then a ticket, before jumping to a pull request like a Sergio
Leone cowboy (sorry, I watched "For a Few Dollars More" last weekend...)

Now, I realize that in writing the ticket I made the wrong trade-off
between conciseness and accuracy, which led to some of the errors you
raised. Let me use your example to explain what I have in mind.

> >>> X = array([-2.1, -1. ,  4.3])
> >>> Y = array([ 3.  ,  1.1 ,  0.12])

Indeed, with the current behavior of cov we get a 2x2 array:
>>> cov(X, Y)
array([[ 11.71      ,  -4.286     ],
       [ -4.286     ,   2.14413333]])

Now, when I used the word 'concatenation', I wasn't precise enough:
I meant assembling X and Y as two vectors of observations of two
random variables X and Y.
This is achieved by concatenate((X, Y)) *when the dimensions are handled
properly* (which I didn't mention):
>>> XY = np.concatenate((X[None, :], Y[None, :]))
>>> XY
array([[-2.1 , -1.  ,  4.3 ],
       [ 3.  ,  1.1 ,  0.12]])

In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)".
>>> np.cov(XY)
array([[ 11.71      ,  -4.286     ],
       [ -4.286     ,   2.14413333]])

(And indeed, the current Python implementation of cov does use concatenate().)
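A minimal check of this equivalence on Bruce's data. np.vstack is used only to confirm that it builds the same 2x3 array as the concatenate call; nothing here changes how cov behaves:

```python
import numpy as np

# Bruce's example: three observations each of two variables X and Y
X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])

# Stack the observation vectors as rows, giving a (2, 3) array
XY = np.concatenate((X[None, :], Y[None, :]))
print(np.array_equal(XY, np.vstack((X, Y))))  # True

# Passing X and Y separately gives the same 2x2 matrix as passing XY
C_pair = np.cov(X, Y)
C_stacked = np.cov(XY)
print(np.allclose(C_pair, C_stacked))  # True
```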

Now let me come back to my assertion about the *usefulness* of this behavior.
You'll acknowledge that np.cov(XY) is made of four blocks (here just four
scalar blocks):
  * the diagonal blocks are just cov(X) and cov(Y) (which in this case
reduce to var(X) and var(Y) with ddof set to 1)
  * the off-diagonal blocks are symmetric and are the covariance
estimate of the X, Y observations (from
http://en.wikipedia.org/wiki/Covariance)

that is:
>>> ((X - X.mean()) * (Y - Y.mean())).sum() / (3 - 1)
-4.2860000000000005
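The same block decomposition can be checked numerically. This sketch assumes np.cov's default normalization (N - 1), which matches np.var(..., ddof=1):

```python
import numpy as np

X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])
C = np.cov(np.vstack((X, Y)))  # default normalization: N - 1

# Diagonal blocks: var(X) and var(Y) with ddof=1
print(np.isclose(C[0, 0], np.var(X, ddof=1)))  # True
print(np.isclose(C[1, 1], np.var(Y, ddof=1)))  # True

# Off-diagonal blocks: symmetric, both equal to the X, Y covariance estimate
manual = ((X - X.mean()) * (Y - Y.mean())).sum() / (len(X) - 1)
print(np.isclose(C[0, 1], C[1, 0]))  # True
print(np.isclose(C[0, 1], manual))   # True
```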

The proposed new behaviour is that cov(X, Y) would return
array(-4.2860000000000005) instead of the 2x2 matrix.

  * This would be in line with the mathematical definition of cov(X, Y),
as well as with R's behavior (where cov(x, y) on two vectors returns a
single value).
  * This would save memory and computing resources (and therefore help
save the planet ;-) ).
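To make the proposal more concrete, here is a rough sketch of the scalar-returning behaviour as a standalone function. The name cross_cov and its signature are hypothetical (not an existing NumPy API, and not necessarily how a patch to np.cov would look); it assumes the same row-per-variable layout and ddof=1 default as np.cov, and for 2-D inputs returns only the cross block:

```python
import numpy as np


def cross_cov(x, y, ddof=1):
    """Hypothetical cross-covariance helper (NOT an existing NumPy function).

    Rows are variables and columns are observations, as in np.cov with its
    default rowvar=True. For two 1-D inputs of length n the result is a 0-d
    array; for x of shape (p, n) and y of shape (q, n) it is (p, q), i.e. only
    the off-diagonal block of np.cov(np.vstack((x, y))).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[-1]
    if y.shape[-1] != n:
        raise ValueError("x and y must have the same number of observations")
    # Center every variable on its own mean along the observation axis
    xc = x - x.mean(axis=-1, keepdims=True)
    yc = y - y.mean(axis=-1, keepdims=True)
    # Sum of products of deviations, normalized by n - ddof
    return np.asarray(np.dot(xc, yc.T) / (n - ddof))


X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])

r = cross_cov(X, Y)
print(r.shape)                             # ()
print(np.isclose(r, np.cov(X, Y)[0, 1]))   # True

# Multivariate case: compare with the top-right block of the full matrix
A = np.vstack((X, Y))                      # 2 variables x 3 observations
B = np.vstack((Y, X))
full = np.cov(np.vstack((A, B)))           # 4x4 joint covariance matrix
print(np.allclose(cross_cov(A, B), full[:2, 2:]))  # True
```

Computing only the xc @ yc.T product skips the two diagonal blocks, which is where the memory and computing savings would come from when X and Y hold many variables.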

However, I do understand that the impact of this change may be large.
It indeed requires careful review.

Pierre