Re: [Numpy-discussion] Adding weights to cov and corrcoef (Sebastian Berg)
Date: Wed, 05 Mar 2014 17:45:47 +0100
From: Sebastian Berg <sebastian@sipsolutions.net>
Subject: [Numpy-discussion] Adding weights to cov and corrcoef
To: numpy-discussion@scipy.org
Message-ID: <1394037947.21356.20.camel@sebastian-t440>
Content-Type: text/plain; charset="UTF-8"
Hi all,
in Pull Request https://github.com/numpy/numpy/pull/3864 Noel Dawe suggested adding new parameters to our `cov` and `corrcoef` functions to implement weights, which already exist for `average` (the PR still needs to be adapted).
Do you mean adopted?
However, we may have missed something obvious, or maybe it is already getting too statistical for NumPy, or the keyword arguments might be better named `uncertainties` and `frequencies`. So comments and insights are very welcome :).
+1 for it being "too baroque" for NumPy--should go in SciPy (if it isn't already there): IMHO, NumPy should be kept as "lean and mean" as possible, embellishments are what SciPy is for. (Again, IMO.)
DG
On Mi, 2014-03-05 at 10:21 -0800, David Goldsmith wrote:
Do you mean adopted?
What I meant was that the suggestion isn't actually implemented in the PR at this time. So you can't pull it in to try things out.
+1 for it being "too baroque" for NumPy--should go in SciPy (if it isn't already there): IMHO, NumPy should be kept as "lean and mean" as possible, embellishments are what SciPy is for. (Again, IMO.)
Well, on the other hand, scipy does not actually have a `std` function of its own, I think. So if it is quite useful I think this may be an option (I don't think I ever used weights with std, so I can't argue strongly for inclusion myself). Unless adding new functions to `scipy.stats` (or just statsmodels) which implement different types of weights is the longer term plan, then things might bite...
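For reference, a weighted standard deviation can already be assembled from `average`, which does accept weights. The following is only a minimal sketch: `weighted_std` is a hypothetical helper, not an existing NumPy function, and it assumes the plain (biased, ddof=0) normalisation.

```python
import numpy as np


def weighted_std(x, weights):
    """Hypothetical helper: biased (ddof=0) weighted standard deviation."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)
    # np.average normalises by sum(weights), so the weights need not sum to 1.
    mean = np.average(x, weights=weights)
    var = np.average((x - mean) ** 2, weights=weights)
    return np.sqrt(var)


x = np.array([1.0, 2.0, 3.0, 4.0])
print(weighted_std(x, np.ones_like(x)))  # equal weights: same as np.std(x)
print(weighted_std(x, [3, 1, 1, 1]))     # first observation counted three times
```

With this normalisation, integer weights behave like repeated observations; whether and how to apply a ddof correction is a separate question.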
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Thu, Mar 6, 2014 at 1:40 PM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Well, on the other hand, scipy does not actually have a `std` function of its own, I think. So if it is quite useful I think this may be an option (I don't think I ever used weights with std, so I can't argue strongly for inclusion myself). Unless adding new functions to `scipy.stats` (or just statsmodels) which implement different types of weights is the longer term plan, then things might bite...
AFAIK there's currently no such plan.
Ralf
On Thu, Mar 6, 2014 at 3:49 PM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
AFAIK there's currently no such plan.
since numpy has taken over all the basic statistics, var, std, cov, corrcoef, and scipy.stats dropped those, I don't see any reason to resurrect them.
The only question IMO is which ddof for weighted std, ...
statsmodels has the basic statistics with frequency weights, but they are largely in support of t-test and similar hypothesis tests.
Josef
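To make the ddof question concrete for frequency weights: if each weight is an integer count of how often an observation occurred, one natural convention is that the weighted result should match the unweighted result on the expanded data, which puts `sum(freq) - ddof` in the denominator. This is a sketch under that assumption; `fweighted_var` is a hypothetical name, not a NumPy or statsmodels function.

```python
import numpy as np


def fweighted_var(x, freq, ddof=0):
    """Hypothetical helper: variance where freq[i] counts repeats of x[i]."""
    x = np.asarray(x, dtype=float)
    freq = np.asarray(freq, dtype=float)
    n = freq.sum()  # effective number of observations
    mean = (freq * x).sum() / n
    return (freq * (x - mean) ** 2).sum() / (n - ddof)


x = np.array([1.0, 2.0, 4.0])
freq = np.array([2, 1, 3])
expanded = np.repeat(x, freq)  # the same data written out observation by observation

for ddof in (0, 1):
    # Both columns should agree under the frequency-weight convention.
    print(ddof, fweighted_var(x, freq, ddof=ddof), np.var(expanded, ddof=ddof))
```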
On Do, 2014-03-06 at 16:30 -0500, josef.pktd@gmail.com wrote:
since numpy has taken over all the basic statistics, var, std, cov, corrcoef, and scipy.stats dropped those, I don't see any reason to resurrect them.
The only question IMO is which ddof for weighted std, ...
I am right now a bit unsure about whether or not the "weights" would be "aweights" or different... R seems to not care about the scale of the weights which seems a bit odd to me for an unbiased estimator? I always assumed that we can do the statistics behind using the ddof... But even if we can figure out the right way, what I am doubting a bit is that if we add weights, their names should be clear enough to not clash with possibly different kinds of (interesting) weights in other functions.
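The scale question can be illustrated numerically. The sketch below compares two bias corrections for a weighted variance: the usual one for reliability-style weights, with denominator V1 - V2/V1 where V1 = sum(w) and V2 = sum(w**2), and the frequency-style one with denominator sum(w) - 1. The function names are hypothetical, and treating the first form as what "aweights" would mean is an assumption made only for this example.

```python
import numpy as np


def var_reliability(x, w):
    """Weighted variance with the V1 - V2/V1 correction (assumed 'aweights' style)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    v1, v2 = w.sum(), (w ** 2).sum()
    mean = (w * x).sum() / v1
    return (w * (x - mean) ** 2).sum() / (v1 - v2 / v1)


def var_frequency(x, w):
    """Weighted variance with the sum(w) - 1 correction (frequency style)."""
    x = np.asarray(x, dtype=float)
    w = np.asarray(w, dtype=float)
    v1 = w.sum()
    mean = (w * x).sum() / v1
    return (w * (x - mean) ** 2).sum() / (v1 - 1.0)


x = np.array([1.0, 2.0, 4.0, 7.0])
w = np.array([1.0, 2.0, 1.0, 3.0])

for scale in (1.0, 10.0):
    # The first value is unchanged by rescaling the weights, the second is not.
    print(scale, var_reliability(x, scale * w), var_frequency(x, scale * w))
```

Multiplying all weights by a constant c scales both the numerator and V1 - V2/V1 by c, so the reliability-style estimate does not depend on the scale of the weights, while the frequency-style estimate treats the sum of the weights as a sample size and therefore does.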
statsmodels has the basic statistics with frequency weights, but they are largely in support of t-test and similar hypothesis tests.
Sebastian Berg <sebastian@sipsolutions.net> wrote:
I am right now a bit unsure about whether or not the "weights" would be "aweights" or different... R seems to not care about the scale of the weights which seems a bit odd to me for an unbiased estimator? I always assumed that we can do the statistics behind using the ddof... But even if we can figure out the right way, what I am doubting a bit is that if we add weights, their names should be clear enough to not clash with possibly different kinds of (interesting) weights in other functions.
http://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_covari...
On Thu, Mar 6, 2014 at 8:38 PM, Sturla Molden <sturla.molden@gmail.com> wrote:
http://en.wikipedia.org/wiki/Weighted_arithmetic_mean#Weighted_sample_covari...
just as additional motivation (I'm not into definition of weights right now :)
I was just reading a chapter on robust covariance estimation, and one of the steps in many of the procedures requires weighted covariances, and weighted variances. weights are just to reduce the influence of outlying observations.
Josef
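As a rough illustration of that use case, the sketch below computes a weighted covariance matrix in which one outlying observation is down-weighted. `weighted_cov` is a hypothetical helper, the weights are picked by hand purely for illustration (robust procedures would derive them from the data), and the simple sum(w) normalisation is used without any bias correction.

```python
import numpy as np


def weighted_cov(X, w):
    """Hypothetical helper: weighted covariance of the rows of X (n_obs x n_vars)."""
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    mean = np.average(X, axis=0, weights=w)  # weighted mean of each column
    centered = X - mean
    # Sum of w_i * outer(centered_i, centered_i), normalised by sum(w).
    return (w[:, None] * centered).T @ centered / w.sum()


X = np.array([
    [1.0, 2.0],
    [2.0, 3.0],
    [3.0, 5.0],
    [4.0, 6.0],
    [30.0, -20.0],  # outlying observation
])
w_equal = np.ones(len(X))
w_down = np.array([1.0, 1.0, 1.0, 1.0, 0.0])  # illustrative: outlier gets no weight

print(weighted_cov(X, w_equal))  # dominated by the outlier
print(weighted_cov(X, w_down))   # covariance of the four regular points
```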
participants (5)
- David Goldsmith
- josef.pktd@gmail.com
- Ralf Gommers
- Sebastian Berg
- Sturla Molden