[Numpy-discussion] Cross-covariance function
Bruce Southey
bsouthey at gmail.com
Thu Jan 26 13:25:55 EST 2012
On Thu, Jan 26, 2012 at 10:07 AM, Pierre Haessig
<pierre.haessig at crans.org> wrote:
> On 26/01/2012 15:57, Bruce Southey wrote:
>> Can you please provide a
>> couple of real examples with expected output that clearly show what
>> you want?
>>
> Hi Bruce,
>
> Thanks for your feedback on the ticket! It's precisely because I see a big
> potential impact of the proposed change that I first sent a ML message,
> then a ticket, before jumping to a pull request like a Sergio Leone
> cowboy (sorry, I watched "For a Few Dollars More" last weekend...)
>
> Now, I realize that in writing the ticket I made the wrong trade-off
> between conciseness and accuracy, which led to some of the errors you
> raised. Let me use your example to share what I have in mind.
>
>> >> X = array([-2.1, -1. , 4.3])
>> >> Y = array([ 3. , 1.1 , 0.12])
>
> Indeed, with today's cov behavior we have a 2x2 array:
>> >> cov(X,Y)
> array([[ 11.71 , -4.286 ],
> [ -4.286 , 2.14413333]])
>
> Now, when I used the word 'concatenation', I wasn't precise enough
> because I meant assembling X and Y in the sense of 2 vectors of
> observations from 2 random variables X and Y.
> This is achieved by concatenate(X,Y) *when properly playing with
> dimensions* (which I didn't mention):
>> >> XY = np.concatenate((X[None, :], Y[None, :]))
> array([[-2.1 , -1. , 4.3 ],
> [ 3. , 1.1 , 0.12]])
In this context, I find stacking, np.vstack((X,Y)), more appropriate
than concatenate.
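
As a minimal sketch (using the same X and Y as above, and assuming a
current NumPy), the two spellings build the same 2x3 array, so the
choice is about readability rather than the result:

```python
import numpy as np

X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])

# Both calls build a (2, 3) array: one row per variable,
# one column per observation.
XY_concat = np.concatenate((X[None, :], Y[None, :]))
XY_stack = np.vstack((X, Y))

print(XY_stack.shape)                        # (2, 3)
print(np.array_equal(XY_concat, XY_stack))   # True
```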
>
> In this case, I can indeed say that "cov(X,Y) is equivalent to cov(XY)".
>> >> np.cov(XY)
> array([[ 11.71 , -4.286 ],
> [ -4.286 , 2.14413333]])
>
Sure, the resulting array is the same, but the whole process is totally different.
> (And indeed, the actual cov Python code does use concatenate() )
Yes, but the user does not see that. Whereas you are forcing the user
to do the stacking in the correct dimensions.
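
To illustrate what "correct dimensions" means here, a small sketch
(variable names are only for illustration): stacking the same data as
columns silently gives a 3x3 result unless rowvar is also changed.

```python
import numpy as np

X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])

two_args = np.cov(X, Y)                  # stacking is done inside cov
stacked = np.cov(np.vstack((X, Y)))      # rows are variables (rowvar=True)

columns = np.column_stack((X, Y))        # shape (3, 2): columns are variables
wrong = np.cov(columns)                  # reads the 3 rows as 3 variables
fixed = np.cov(columns, rowvar=False)    # tells cov that columns are variables

print(np.allclose(two_args, stacked))    # True
print(wrong.shape, fixed.shape)          # (3, 3) (2, 2)
```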
>
>
> Now let me come back to my assertion about this behavior *usefulness*.
> You'll acknowledge that np.cov(XY) is made of four blocks (here just 4
> simple scalars blocks).
No, there are not '4' blocks, just rows and columns.
> * diagonal blocks are just cov(X) and cov(Y) (which in this case comes
> to var(X) and var(Y) when setting ddof to 1)
Sure, but variances are still covariances.
> * off-diagonal blocks are symmetric and are actually the covariance
> estimate of X, Y observations (from
> http://en.wikipedia.org/wiki/Covariance)
Sure
>
> that is :
>> >> ((X-X.mean()) * (Y-Y.mean())).sum()/ (3-1)
> -4.2860000000000005
>
> The new proposed behaviour for cov is that cov(X,Y) would return :
> array(-4.2860000000000005) instead of the 2*2 matrix.
But how do you interpret a 2D array with more than 2 rows?
>>> Z=Y+X
>>> np.cov(np.vstack((X,Y,Z)))
array([[ 11.71 , -4.286 , 7.424 ],
[ -4.286 , 2.14413333, -2.14186667],
[ 7.424 , -2.14186667, 5.28213333]])
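
For reference, entry [i, j] of that 3x3 array is the covariance of row
i with row j. Since Z = X + Y, the third row and column follow from
the first two by bilinearity of the covariance. A quick sketch that
checks this (nothing here is specific to the proposal):

```python
import numpy as np

X = np.array([-2.1, -1.0, 4.3])
Y = np.array([3.0, 1.1, 0.12])
Z = Y + X

C = np.cov(np.vstack((X, Y, Z)))   # rows 0, 1, 2 are X, Y, Z

# cov(X, Z) = var(X) + cov(X, Y)
print(np.isclose(C[0, 2], C[0, 0] + C[0, 1]))
# cov(Y, Z) = var(Y) + cov(X, Y)
print(np.isclose(C[1, 2], C[1, 1] + C[0, 1]))
# var(Z) = var(X) + var(Y) + 2 cov(X, Y)
print(np.isclose(C[2, 2], C[0, 0] + C[1, 1] + 2 * C[0, 1]))
```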
>
> * This would be in line with the cov(X,Y) mathematical definition, as
> well as with R behavior.
I don't care what R does because I am using Python and Python is
infinitely better than R is!
But I think that is only in the 1D case.
> * This would save memory and computing resources. (and therefore help
> save the planet ;-) )
Nothing that you have provided shows that it will.
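
Something along these lines would be a starting point for backing up
that claim. This is only a sketch: cross_cov is a hypothetical helper
written for this message, not an existing NumPy function, and the
sizes p, q, n are arbitrary. It compares output shapes only and says
nothing about timing.

```python
import numpy as np

def cross_cov(X, Y, ddof=1):
    """Hypothetical helper (not part of NumPy): cross-covariance block only.

    X is (p, n) and Y is (q, n), with one row per variable.
    Returns the (p, q) block of covariances between rows of X and rows of Y.
    """
    X = np.atleast_2d(X)
    Y = np.atleast_2d(Y)
    Xc = X - X.mean(axis=1, keepdims=True)   # center each variable
    Yc = Y - Y.mean(axis=1, keepdims=True)
    return np.dot(Xc, Yc.T) / (X.shape[1] - ddof)

rng = np.random.default_rng(0)
p, q, n = 4, 3, 50                     # arbitrary sizes for illustration
A = rng.normal(size=(p, n))
B = rng.normal(size=(q, n))

full = np.cov(np.vstack((A, B)))       # (p + q, p + q) array
block = cross_cov(A, B)                # (p, q) array

print(full.shape, block.shape)         # (7, 7) (4, 3)
print(np.allclose(full[:p, p:], block))
```

The block agrees with the upper-right corner of the full matrix, so
any saving would come from not computing the two diagonal blocks.
Whether that matters in practice would still need measuring.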
>
> However, I do understand that the impact for this change may be big.
> This indeed requires careful reviewing.
>
> Pierre
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion at scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
Bruce