[Numpy-discussion] Combining covariance and correlation coefficient into one numpy.cov call

Mathew S. Madhavacheril mathewsyriac at gmail.com
Wed Oct 26 15:11:22 EDT 2016


On Wed, Oct 26, 2016 at 2:56 PM, Nathaniel Smith <njs at pobox.com> wrote:

> On Wed, Oct 26, 2016 at 11:13 AM, Stephan Hoyer <shoyer at gmail.com> wrote:
> > On Wed, Oct 26, 2016 at 11:03 AM, Mathew S. Madhavacheril
> > <mathewsyriac at gmail.com> wrote:
> >>
> >> On Wed, Oct 26, 2016 at 1:46 PM, Stephan Hoyer <shoyer at gmail.com>
> wrote:
> >>>
> >>> I wonder if the goals of this addition could be achieved by simply
> adding
> >>> an optional `cov` argument
> >>>
> >>> to np.corr, which would provide a pre-computed covariance.
> >>
> >>
> >> That's a fair suggestion which I'm happy to switch to. This eliminates
> the
> >> need for two new functions.
> >> I'll add an optional `cov = False` argument to numpy.corrcoef that
> returns
> >> a tuple (corr, cov) instead.
> >>
> >>>
> >>>
> >>> Either way, `covcorr` feels like a helper function that could exist in
> >>> user code rather than numpy proper.
> >>
> >>
> >> The user would have to re-implement the part that converts the
> covariance
> >> matrix to a correlation
> >> coefficient. I made this PR to avoid that code duplication.
> >
> >
> > With the API I was envisioning (or even your proposed API, for that
> matter),
> > this function would only be a few lines, e.g.,
> >
> > def covcorr(x):
> >     cov = np.cov(x)
> >     corr = np.corrcoef(x, cov=cov)
>
> IIUC, if you have a covariance matrix then you can compute the
> correlation matrix directly, without looking at 'x', so corrcoef(x,
> cov=cov) is a bit odd-looking. I think probably the API that makes the
> most sense is just to expose something like the covtocorr function
> (maybe it could have a less telegraphic name?)? And then, yeah, users
> can use that to build their own covcorr or whatever if they want it.
>

Right, agreed, this is why I said `x` becomes redundant when `cov` is
specified
when calling `numpy.corrcoef`.  So we have two alternatives:

1) Have `np.corrcoef` accept a boolean optional argument `covmat = False`
that lets
one obtain a tuple containing the covariance and the correlation matrices
in the same call
2) Modify my original PR so that `np.covtocorr` remains (with possibly a
better
name) but remove `np.covcorr` since this is easy for the user to add.

My preference is option 2.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/numpy-discussion/attachments/20161026/f64089b7/attachment.html>


More information about the NumPy-Discussion mailing list