Hello developers,
I recently needed an implementation of the Kolmogorov-Smirnov two-sample test which required the incorporation of a weight associated with each element of the data. This led me to this stackexchange answer https://stats.stackexchange.com/questions/193439/two-sample-kolmogorov-smirn... where a procedure for a weighted two-sample KS test is taken from Numerical Methods of Statistics by Monahan.
My current implementation of this can be found here: https://github.com/brunel-physics/tact/blob/2b0ee2a28a30f014b103319118b64be5...
Would there be any interest in incorporating this functionality into scipy?
Yours,
Corin Hoad
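For readers who want something concrete, here is a minimal sketch of what a weighted two-sample KS statistic could look like: the largest gap between two weighted empirical CDFs. It is an illustration under stated assumptions, not the linked implementation and not necessarily the procedure in Monahan's book. The names `weighted_ecdf` and `weighted_ks_statistic` are hypothetical, weights are assumed to be non-negative with a positive sum, and only the standard library is used.

```python
from bisect import bisect_right
from itertools import accumulate


def weighted_ecdf(data, weights):
    """Return a function that evaluates the weighted empirical CDF of `data`."""
    pairs = sorted(zip(data, weights))
    values = [v for v, _ in pairs]
    total = sum(w for _, w in pairs)
    # steps[k] is the fraction of total weight carried by the k smallest points.
    steps = [0.0] + [c / total for c in accumulate(w for _, w in pairs)]
    # bisect_right counts the points <= t, which indexes the right step.
    return lambda t: steps[bisect_right(values, t)]


def weighted_ks_statistic(x, y, wx=None, wy=None):
    """Sup distance between the weighted ECDFs of samples `x` and `y`.

    Assumption: missing weights default to 1, which gives the ordinary
    (unweighted) two-sample KS statistic.
    """
    wx = [1.0] * len(x) if wx is None else wx
    wy = [1.0] * len(y) if wy is None else wy
    cdf_x = weighted_ecdf(x, wx)
    cdf_y = weighted_ecdf(y, wy)
    # Both ECDFs are step functions, so the supremum is reached at a data point.
    return max(abs(cdf_x(t) - cdf_y(t)) for t in list(x) + list(y))
```

With unit weights this reduces to the usual statistic, and an integer weight behaves like repeating the observation that many times. The sketch deliberately returns no p-value, because the null distribution of the weighted statistic is a separate question.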
On Sat, Apr 21, 2018 at 7:42 PM, Corin Hoad <corinhoad@gmail.com> wrote:
Hello developers,
I recently needed an implementation of the Kolmogorov-Smirnov two-sample test which required the incorporation of a weight associated with each element of the data. This led me to this stackexchange answer https://stats.stackexchange.com/questions/193439/two-sample-kolmogorov-smirnov-test-with-weights where a procedure for a weighted two-sample KS test is taken from Numerical Methods of Statistics by Monahan.
My current implementation of this can be found here:
https://github.com/brunel-physics/tact/blob/2b0ee2a28a30f014b103319118b64be52070f001/tact/metrics.py#L198
Would there be any interest in incorporating this functionality into scipy?
I have potentially two problems:
What's the definition or interpretation of the weights?
Is the distribution of the test statistic correct? My guess is that it would change when weighting is introduced. I didn't find much in a brief Google search.
Whether the distribution/p-value is correct could also be checked with the rejection probabilities in a simulation.
Josef
Yours,
Corin Hoad
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@python.org https://mail.python.org/mailman/listinfo/scipy-dev
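The simulation check suggested above can be sketched as follows. Both samples represent the same standard normal distribution, so a valid 5% test should reject in roughly 5% of replications. Everything here is an assumption made for illustration: the function names, the importance-sampling setup (one sample drawn from a wider normal and reweighted by the density ratio), and the use of the asymptotic critical value of the unweighted test with the raw sample sizes. It shows how the question could be examined and does not claim to answer it.

```python
import math
import random
from bisect import bisect_right
from itertools import accumulate


def weighted_ks_statistic(x, wx, y, wy):
    """Sup distance between two weighted empirical CDFs (hypothetical helper)."""
    def ecdf(data, weights):
        pairs = sorted(zip(data, weights))
        values = [v for v, _ in pairs]
        total = sum(weights)
        steps = [0.0] + [c / total for c in accumulate(w for _, w in pairs)]
        return lambda t: steps[bisect_right(values, t)]

    fx, fy = ecdf(x, wx), ecdf(y, wy)
    return max(abs(fx(t) - fy(t)) for t in list(x) + list(y))


def normal_pdf(t, sigma):
    """Density of a zero-mean normal with standard deviation `sigma`."""
    return math.exp(-0.5 * (t / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))


def rejection_rate(n, m, proposal_sigma, alpha=0.05, n_rep=200, seed=0):
    """Fraction of null replications in which the statistic exceeds the
    asymptotic critical value of the *unweighted* two-sample KS test."""
    rng = random.Random(seed)
    # Leading-term approximation c(alpha) * sqrt((n + m) / (n * m)).
    c_alpha = math.sqrt(-0.5 * math.log(alpha / 2.0))
    critical = c_alpha * math.sqrt((n + m) / (n * m))
    rejections = 0
    for _ in range(n_rep):
        # x is drawn from a wider normal and reweighted to target N(0, 1).
        x = [rng.gauss(0.0, proposal_sigma) for _ in range(n)]
        wx = [normal_pdf(t, 1.0) / normal_pdf(t, proposal_sigma) for t in x]
        # y is drawn directly from N(0, 1) with unit weights.
        y = [rng.gauss(0.0, 1.0) for _ in range(m)]
        wy = [1.0] * m
        if weighted_ks_statistic(x, wx, y, wy) > critical:
            rejections += 1
    return rejections / n_rep


if __name__ == "__main__":
    print("equal weights:     ", rejection_rate(50, 50, proposal_sigma=1.0))
    print("importance weights:", rejection_rate(50, 50, proposal_sigma=2.0))
```

Setting `proposal_sigma=1.0` makes every weight equal to 1, which gives a baseline for the unweighted test. Comparing that baseline with the weighted run indicates whether the unweighted critical value is still appropriate once weights are introduced.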
Just out of curiosity: What is the significance of the weights? If you are trying to represent the fact that distributional differences are more important in some regime than in another, e.g., you care more about the tails, then using weights is probably not the right approach.
On Sat, Apr 21, 2018 at 11:15 PM, Phillip Feldman <phillip.m.feldman@gmail.com> wrote:
Just out of curiosity: What is the significance of the weights? If you are trying to represent the fact that distributional differences are more important in some regime than in another, e.g., you care more about the tails, then using weights is probably not the right approach.
I don't remember for sup tests like KS, but for integral tests like Anderson-Darling there are variations of the test that use different weights to emphasize different regions of the distribution, e.g. Cramér-von Mises uses different weights than AD https://en.wikipedia.org/wiki/Anderson%E2%80%93Darling_test
I briefly skimmed parts of the Monahan chapter and that is specific to importance sampling weights. In that case choosing the weights should just compensate for the unequal sampling of points. So maybe in that case the distribution of the KS (or AD) test statistic might not change much.
In either case, I think the distribution of the test statistic depends on the meaning or interpretation of the weights.
Josef
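For reference, the distinction between sup tests and integral tests can be written out in the standard one-sample forms, with F_n the empirical CDF and F the hypothesized CDF. The weight function w(x) here reweights regions of the distribution and is a different object from per-observation sample weights.

```latex
% Sup statistic (Kolmogorov-Smirnov)
D_n = \sup_x \left| F_n(x) - F(x) \right|

% Integral statistics share one form and differ only in the weight w(x)
n \int_{-\infty}^{\infty} \left( F_n(x) - F(x) \right)^2 w(x) \, dF(x)

% Cramer-von Mises: uniform weight
w(x) = 1

% Anderson-Darling: more weight in the tails
w(x) = \frac{1}{F(x)\,\bigl(1 - F(x)\bigr)}
```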
What's the definition or interpretation of the weights?
Just out of curiosity: What is the significance of the weights? If you are trying to represent the fact that distributional differences are more important in some regime than in another, e.g., you care more about the tails, then using weights is probably not the right approach.
I briefly skimmed parts of the Monahan chapter and that is specific for importance sampling weights. In that case choosing the weights should just compensate for the unequal sampling of points. So maybe in that case the distribution of the KS (or AD) test statistic might not change much.
The weights are sample weights, indicating the relative frequency of observations. My specific use case is in particle physics where studies often involve simulated data. Certain particle physics processes may have a large amount of simulated data available, but in reality we expect them to be very rare, so sample weights are used to compensate.
Corin
participants (3)
- Corin Hoad
- josef.pktd@gmail.com
- Phillip Feldman