Adding statistical distances
Hi,

I am a data scientist at Datadog, a cloud monitoring company. We have been working with statistical distances, which are distances between distributions, and more specifically on a family of distances that can be computed from CDFs, e.g., the first Wasserstein distance and the Cramér-von Mises distance.

We wrote and optimized some code in Python to compute those distances. Since those distances have various applications, we think that it might be helpful to others and that is why we intend to share it. Here is the PR: https://github.com/scipy/scipy/pull/7563

I put the code in scipy.stats.stats as statistical distances share common features and applications with statistical tests (such as chisquare or ks_2samp) but let me know if that is not the appropriate place.

Looking forward to hearing your feedback,
Charles
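To illustrate what "computed from CDFs" means here, the following is a minimal NumPy sketch of the first Wasserstein distance between two one-dimensional samples, obtained by integrating the absolute difference of their empirical CDFs. It is not the code from the PR: the function name `wasserstein_1d` and its arguments are hypothetical, and the samples are assumed to be unweighted.

```python
import numpy as np


def wasserstein_1d(u_values, v_values):
    """First Wasserstein distance between two 1-D empirical distributions.

    Hypothetical helper for illustration only; assumes unweighted samples.
    """
    u = np.sort(np.asarray(u_values, dtype=float))
    v = np.sort(np.asarray(v_values, dtype=float))

    # Every point at which either empirical CDF can jump.
    all_values = np.sort(np.concatenate([u, v]))

    # Widths of the intervals between consecutive jump points.
    deltas = np.diff(all_values)

    # Value of each empirical CDF on every interval (taken at its left endpoint).
    u_cdf = np.searchsorted(u, all_values[:-1], side="right") / u.size
    v_cdf = np.searchsorted(v, all_values[:-1], side="right") / v.size

    # The CDFs are piecewise constant, so the integral of |U(x) - V(x)|
    # reduces to a finite sum over the intervals.
    return float(np.sum(np.abs(u_cdf - v_cdf) * deltas))


shifted = wasserstein_1d([0, 1, 3], [5, 6, 8])  # second sample is the first shifted by 5
same = wasserstein_1d([0, 1, 3], [0, 1, 3])
```

Shifting a sample by a constant moves every unit of probability mass by that constant, so `shifted` evaluates to 5 and `same` to 0. Replacing the absolute difference with a squared difference in the last step would give a Cramér-von Mises style distance from the same CDF arrays.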
Hi Charles,

On Thu, Jul 6, 2017 at 2:40 AM, Charles-Philippe Masson <charles.masson@datadoghq.com> wrote:

> Hi,
>
> I am a data scientist at Datadog, a cloud monitoring company. We have been working with statistical distances, which are distances between distributions, and more specifically on a family of distances that can be computed from CDFs, e.g., the first Wasserstein distance and the Cramér-von Mises distance.
>
> We wrote and optimized some code in Python to compute those distances. Since those distances have various applications, we think that it might be helpful to others and that is why we intend to share it. Here is the PR: https://github.com/scipy/scipy/pull/7563

Thanks for contributing!

> I put the code in scipy.stats.stats as statistical distances share common features and applications with statistical tests (such as chisquare or ks_2samp) but let me know if that is not the appropriate place.

I had a look at the other possible place to put them, scipy.spatial.distance. While it could fit there as well - your function signatures fit with distance.cdist - I agree that putting statistical distances in scipy.stats makes more sense. The Kullback-Leibler divergence is also present in scipy.stats already (a bit hidden, it's in `entropy`).

Cheers,
Ralf
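As a small illustration of the remark about `entropy`: when `scipy.stats.entropy` is called with a second distribution `qk`, it returns the Kullback-Leibler divergence of `pk` from `qk` instead of the Shannon entropy. The sketch below checks that against the textbook formula; the two example distributions are made up for illustration.

```python
import numpy as np
from scipy.stats import entropy

# Two made-up discrete distributions over the same two outcomes.
p = np.array([0.5, 0.5])
q = np.array([0.9, 0.1])

# With two arguments, entropy(pk, qk) computes sum(pk * log(pk / qk))
# using the natural logarithm by default.
kl_pq = entropy(p, q)

# The same quantity written out by hand.
kl_manual = np.sum(p * np.log(p / q))

# The divergence is not symmetric, so swapping the arguments changes the value.
kl_qp = entropy(q, p)
```

Because the KL divergence is asymmetric, it is a divergence and not a distance in the metric sense, unlike the Wasserstein distance discussed in this thread.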
Participants (2): Charles-Philippe Masson, Ralf Gommers