On Sun, Sep 22, 2013 at 1:52 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
On Sun, Sep 22, 2013 at 7:52 AM, Christopher Jordan-Squire <cjordan1@uw.edu> wrote:
For scipy stats, is there anything on the table regarding somehow unifying the sampling in numpy.random and the distributions in scipy.stats? I'm specifically thinking of two issues:
(1) There's a lot of duplication between numpy.random and scipy.stats but with different interfaces. This seems like something that ideally would be reduced.
numpy.random only provides sampling and only has about half the distributions of scipy.stats. Sampling is really only a small part of what scipy.stats provides (pdf, cdf, moments, fitting a distribution, etc.). So I'm not bothered by that duplication. If we'd want to reduce it I think it would have to be removed from numpy, which doesn't sound like a good idea.
Yeah, I'm not about to suggest removing sampling from numpy.random. That'd be crazy. There's still the API mismatch in the names, though. As a rule, numpy.random spells out the full distribution name, while scipy.stats tends to abbreviate it. That often confuses me, but it's less confusing than I first thought, since at least each module is consistent.
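To make the naming mismatch concrete, here is a minimal sketch using the exponential distribution. The `name_pairs` dict is only an illustration put together for this example, not an official mapping, and the `random_state` argument to `rvs` assumes a reasonably recent scipy.

```python
import math

import numpy as np
from scipy import stats

# Illustrative (not exhaustive) pairs: numpy.random name -> scipy.stats name.
name_pairs = {
    "exponential": "expon",
    "normal": "norm",
    "lognormal": "lognorm",
    "binomial": "binom",
}

rng = np.random.RandomState(0)

# numpy.random: full name, sampling only.
samples_np = rng.exponential(scale=2.0, size=1000)

# scipy.stats: abbreviated name, sampling plus pdf, cdf, moments, etc.
dist = stats.expon(scale=2.0)
samples_sp = dist.rvs(size=1000, random_state=rng)
density_at_1 = dist.pdf(1.0)  # analytically 0.5 * exp(-0.5)
mean = dist.mean()            # equals the scale parameter, 2.0
```

Both calls take the same `scale` parameter here, so the difference in this case is mostly the name and the extra methods on the scipy.stats side.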
(2) The interface for the distributions in scipy.stats seems to explicitly be for scalar random variables, so there are no multivariate normals, multinomials, dirichlet, wishart, etc. Instead the sampling is in numpy.random, and the pdfs aren't there.
Two days ago PR-2726 was merged, which adds a multivariate normal distribution. Others can be added. IIRC there has been an enhancement ticket for Wishart, and there's a Python implementation floating around somewhere.
A multivariate normal is a great addition. Currently, dirichlet and multinomial are the only random variables you can sample from in numpy.random that aren't in scipy.stats. My $0.02 for the scipy 1.0 roadmap is adding dirichlet and multinomial to scipy.stats, as well as wishart/inverse-wishart. Then the distributions in scipy.stats would be a superset of numpy.random, and scipy.stats would include one of the most widely used distributions currently not in it. (In addition to the implementations floating around, both scikit-learn and pymc include bits and pieces of wishart-related code.)
Also, right now you can use scipy.stats.rv_discrete to create your own discrete random variable, but only over an array of integers, so [1,2,3] rather than ['apple', 'orange', 'banana']. That's fine, but it also means a lot of code duplication/wrapper classes for everyone who wants their random variable to be over a space of fruits rather than integers. Not sure how many people that affects, though.
Not sure if these belong on the roadmap or just as enhancement requests.
Thanks,
Chris
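For reference, the kind of wrapper that the rv_discrete point above refers to might look like the sketch below. The names `labels`, `probs`, `index_rv`, `sample_labels`, and `pmf_label` are hypothetical and made up for this example; the only scipy API assumed is `rv_discrete(values=(xk, pk))` with its `rvs` and `pmf` methods.

```python
import numpy as np
from scipy import stats

# Hypothetical label space and probabilities, for illustration only.
labels = np.array(["apple", "orange", "banana"])
probs = [0.5, 0.3, 0.2]

# rv_discrete needs integer support, so use the positions 0..n-1 as values.
index_rv = stats.rv_discrete(
    name="fruit_index",
    values=(np.arange(len(labels)), probs),
)


def sample_labels(size, random_state=None):
    """Draw integer indices from index_rv and map them back to labels."""
    idx = index_rv.rvs(size=size, random_state=random_state)
    return labels[np.asarray(idx, dtype=int)]


def pmf_label(label):
    """Probability mass of a single label, looked up via its integer index."""
    idx = int(np.flatnonzero(labels == label)[0])
    return float(index_rv.pmf(idx))


draws = sample_labels(10, random_state=0)
p_orange = pmf_label("orange")
```

Everyone who wants a labelled discrete variable ends up writing some variant of this index-to-label mapping, which is the duplication being described.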
Cheers, Ralf
Has this been discussed elsewhere?
On Sat, Sep 21, 2013 at 8:03 PM, Blake Griffith <blake.a.griffith@gmail.com> wrote:
sparse
``````
Don't emulate np.matrix behavior, drop 2-D?
What is meant by this? Emulate np.array instead?
_______________________________________________ SciPy-Dev mailing list SciPy-Dev@scipy.org http://mail.scipy.org/mailman/listinfo/scipy-dev