[Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

josef.pktd at gmail.com josef.pktd at gmail.com
Wed Oct 26 15:20:15 EDT 2016


On Wed, Oct 26, 2016 at 3:11 PM, Mathew S. Madhavacheril <mathewsyriac at gmail.com> wrote:

>
>
> On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith <njs at pobox.com> wrote:
>
>> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer <shoyer at gmail.com> wrote:
>> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
>> > <mathewsyriac at gmail.com> wrote:
>> >>
>> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer <shoyer at gmail.com> wrote:
>> >>>
>> >>> I wonder if the goals of this addition could be achieved by simply adding
>> >>> an optional `cov` argument to np.corrcoef, which would provide a
>> >>> pre-computed covariance.
>> >>
>> >>
>> >> That's a fair suggestion which I'm happy to switch to. This eliminates the
>> >> need for two new functions.
>> >> I'll add an optional `cov = False` argument to numpy.corrcoef that returns
>> >> a tuple (corr, cov) instead.
>> >>
>> >>>
>> >>>
>> >>> Either way, `covcorr` feels like a helper function that could exist in
>> >>> user code rather than numpy proper.
>> >>
>> >>
>> >> The user would have to re-implement the part that converts the covariance
>> >> matrix to a correlation coefficient. I made this PR to avoid that code
>> >> duplication.
>> >
>> >
>> > With the API I was envisioning (or even your proposed API, for that matter),
>> > this function would only be a few lines, e.g.,
>> >
>> > def covcorr(x):
>> >     cov = np.cov(x)
>> >     # proposed `cov=` argument; not an existing np.corrcoef parameter
>> >     corr = np.corrcoef(x, cov=cov)
>> >     return cov, corr
>>
>> IIUC, if you have a covariance matrix then you can compute the
>> correlation matrix directly, without looking at 'x', so
>> corrcoef(x, cov=cov) is a bit odd-looking. I think the API that makes the
>> most sense is probably just to expose something like the covtocorr
>> function (maybe with a less telegraphic name), and then users can use
>> that to build their own covcorr or whatever if they want it.
>>
>
> Right, agreed, this is why I said `x` becomes redundant when `cov` is
> specified when calling `numpy.corrcoef`. So we have two alternatives:
>
> 1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
>    that lets one obtain a tuple containing the covariance and the
>    correlation matrices in the same call.
> 2) Modify my original PR so that `np.covtocorr` remains (with possibly a
>    better name) but remove `np.covcorr`, since this is easy for the user
>    to add.
>
> My preference is option 2.
>

cov2corr is a useful function
http://www.statsmodels.org/dev/generated/statsmodels.stats.moment_helpers.cov2corr.html
I also wrote the inverse function, corr2cov, but AFAIR I use it only in
some test cases.
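To make the cov2corr/corr2cov idea concrete, here is a minimal NumPy sketch. The function names mirror the statsmodels helpers linked above, but this is a standalone illustration, not the statsmodels code, and it assumes a real-valued covariance matrix with strictly positive variances and variables in rows (NumPy's default `rowvar=True` layout).

```python
import numpy as np


def cov2corr(cov):
    """Convert a covariance matrix to a correlation matrix (illustrative sketch)."""
    cov = np.asarray(cov, dtype=float)
    std = np.sqrt(np.diag(cov))          # per-variable standard deviations
    corr = cov / np.outer(std, std)      # entry (i, j) divided by std_i * std_j
    # Rounding can push entries slightly outside [-1, 1]; clip them back,
    # as np.corrcoef itself does for real input.
    return np.clip(corr, -1.0, 1.0)


def corr2cov(corr, std):
    """Inverse of cov2corr: rescale a correlation matrix by standard deviations."""
    std = np.asarray(std, dtype=float)
    return np.asarray(corr, dtype=float) * np.outer(std, std)


rng = np.random.default_rng(0)
x = rng.normal(size=(3, 50))             # 3 variables, 50 observations each
cov = np.cov(x)
corr = cov2corr(cov)
print(np.allclose(corr, np.corrcoef(x)))                        # True
print(np.allclose(corr2cov(corr, np.sqrt(np.diag(cov))), cov))  # True
```

Only the covariance matrix is needed, never `x` itself, which is the point made earlier in the thread about `corrcoef(x, cov=cov)` looking odd.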


I don't think adding any of these options to corrcoef or covcorr is useful,
since there is no computational advantage to it.
What I'm missing are functions that return the intermediate results, e.g.
var and mean, or cov and mean.
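The kind of helper described here can be sketched as follows. `mean_cov` is a hypothetical name, not an existing NumPy or statsmodels function, and the sketch assumes variables in rows. It returns the mean that `np.cov` computes internally but discards; the last lines show that the correlation matrix then costs only an elementwise rescaling of the k-by-k covariance matrix, consistent with the argument that bundling both into one call saves little.

```python
import numpy as np


def mean_cov(x, ddof=1):
    """Return (mean, cov) for data with variables in rows, observations in columns.

    Hypothetical helper: centers the data once and keeps the mean.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=1)                # per-variable mean
    xc = x - mean[:, None]               # centered data
    n = x.shape[1]
    cov = xc @ xc.T / (n - ddof)         # same normalization as np.cov(x, ddof=ddof)
    return mean, cov


rng = np.random.default_rng(1)
x = rng.normal(loc=5.0, size=(2, 100))
mean, cov = mean_cov(x)
print(np.allclose(cov, np.cov(x)))       # True

# Correlation is a cheap rescaling of the already-computed covariance.
std = np.sqrt(np.diag(cov))
corr = cov / np.outer(std, std)
print(np.allclose(corr, np.corrcoef(x)))  # True
```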

(For statsmodels I decided to return mean and cov or mean and var in the
related functions. Some R packages return the mean as an option.)

Josef


