Hi Bruce,

Sorry for the delay in answering.

On 27/01/2012 17:28, Bruce Southey wrote:
> The output is still a covariance, so do we really need yet another set of very similar functions to maintain? Or can we get away with a new keyword?
The idea of an additional keyword seems appealing. Just to make sure I understood it well: you would be proposing a new signature like

    def cov(..., get_full_cov_matrix=True)

and when `get_full_cov_matrix` is set to False, only the cross-covariance part would be returned. Am I right?
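Just to make the proposal concrete, here is a minimal sketch of how such a keyword might behave, written as a wrapper around np.cov rather than an actual patch to it (the keyword name `get_full_cov_matrix` is the one floated above; everything else is an assumption):

```python
import numpy as np

def cov(m, y=None, rowvar=True, get_full_cov_matrix=True):
    """Sketch of np.cov with a hypothetical `get_full_cov_matrix` keyword.

    When `y` is given and `get_full_cov_matrix` is False, only the
    cross-covariance block between the variables of `m` and those of `y`
    is returned, instead of the full stacked covariance matrix.
    """
    full = np.cov(m, y, rowvar=rowvar)
    if y is None or get_full_cov_matrix:
        return full
    # number of variables contributed by m: rows if rowvar, else columns
    m = np.atleast_2d(np.asarray(m))
    nm = m.shape[0] if rowvar else m.shape[1]
    # top-right block: Cov(m_i, y_j) for each pair of variables
    return full[:nm, nm:]
```

With `m` holding two variables and `y` one, the False case would return a 2x1 block equal to `np.cov(m, y)[:2, 2:]`.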
> If speed really matters to you guys, then surely moving np.cov into C would have more impact on 'saving the world' than this proposal. That also ignores the algorithm used ( http://en.wikipedia.org/wiki/Algorithms_for_calculating_variance#Covariance ).
I didn't get your point about the algorithm here. Going by that nomenclature, I would say that numpy.cov uses a vectorized "two-pass algorithm": it computes the mean first, subtracts it, and then computes the matrix product. Would you do it differently?

> Actually np.cov is also deficient in that it does not have a dtype argument, so it is prone to numerical precision errors (especially when getting the mean of the array). Probably should be a ticket...

I'm not a specialist in numerical precision, but I was very impressed by the recent example raised on Jan 24th by Michael Aye, which was one of the first "real life" examples I've seen.

The way I see the cov algorithm, the first possibility would be to propagate an optional dtype argument to the mean computation. However, I'm unsure what to do after that for the matrix product, since "dot(X.T, X.conj()) / fact" is also a sort of mean computation, and can therefore also be affected by numerical precision issues. What would you suggest? (The only solution I see would be to use the running variance algorithm. Since the code would no longer be vectorized, it would indeed benefit from going to C.)

Best,
Pierre
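For what it's worth, the two-pass computation discussed above, with a hypothetical dtype keyword propagated to the mean (no such keyword exists in np.cov today; that is precisely the addition being discussed), might be sketched as:

```python
import numpy as np

def cov_two_pass(X, dtype=np.float64):
    """Two-pass covariance, roughly what numpy.cov does, with a
    hypothetical `dtype` propagated to the intermediate computations.
    Variables are in rows, observations in columns (rowvar convention).
    """
    X = np.asarray(X, dtype=dtype)
    n = X.shape[1]
    # Pass 1: compute and subtract the mean, accumulating in `dtype`.
    Xc = X - X.mean(axis=1, dtype=dtype, keepdims=True)
    # Pass 2: the matrix product -- itself a mean-like accumulation,
    # hence the concern about its precision in the text above.
    return np.dot(Xc, Xc.conj().T) / (n - 1)
```

For well-conditioned input this matches np.cov; the point of the dtype argument is that the accumulations in both passes could be done in a wider type than the input.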
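And the running-variance alternative mentioned in the closing parenthesis, in the spirit of the Wikipedia page linked above, could look like the following Welford-style update; note the loop over observations, which is why a pure-Python version is slow and a C implementation would pay off:

```python
import numpy as np

def cov_running(X):
    """One-pass (online) covariance via a Welford-style update.
    Variables in rows, observations in columns. Not vectorized over
    observations, hence the appeal of moving it to C.
    """
    X = np.asarray(X, dtype=np.float64)
    d, n = X.shape
    mean = np.zeros(d)
    C = np.zeros((d, d))          # running co-moment matrix
    for k in range(n):
        x = X[:, k]
        delta = x - mean          # deviation from the *old* mean
        mean += delta / (k + 1)   # update the running mean
        C += np.outer(delta, x - mean)  # uses the *new* mean
    return C / (n - 1)
```

This avoids the large intermediate centering step of the two-pass version and is numerically well behaved, at the cost of losing vectorization.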