Re: [SciPy-Dev] Scipy 1.0 roadmap

22 Sep 2013

On Sun, Sep 22, 2013 at 1:52 AM, Ralf Gommers <ralf.gommers@gmail.com> wrote:
...
On Sun, Sep 22, 2013 at 7:52 AM, Christopher Jordan-Squire <cjordan1@uw.edu>
wrote:
...
For scipy stats, is there anything on the table regarding somehow
unifying the sampling in numpy.random and the distributions in
scipy.stats? I'm specifically thinking of two issues:
(1) There's a lot of duplication between numpy.random and scipy.stats
but with different interfaces. This seems like something that ideally
would be reduced.
numpy.random only provides sampling and only has about half the
distributions of scipy.stats. Sampling is really only a small part of what
scipy.stats provides (pdf, cdf, moments, fitting a distribution, etc.). So
I'm not bothered by that duplication. If we wanted to reduce it, I think it
would have to be removed from numpy, which doesn't sound like a good idea.
Yeah, I'm not about to suggest removing sampling from numpy.random.
That'd be crazy.

There's still the API mismatch between the names. numpy.random, as a
rule, uses the full expansion of the name while scipy.stats, as a
rule, tends to abbreviate. That often confuses me, but it's not as
confusing as I first thought, since at least it's consistent.
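
To make the mismatch concrete, here is a minimal sketch that draws from
the same exponential distribution through both interfaces. The NAME_PAIRS
table is only an illustration of the naming pattern, not an exhaustive or
official mapping, and the random_state keyword assumes a SciPy version
whose rvs() accepts it.

```python
import numpy as np
from scipy import stats

# Illustrative pairs only: numpy.random name -> scipy.stats name.
NAME_PAIRS = {
    "normal": "norm",
    "exponential": "expon",
    "lognormal": "lognorm",
    "binomial": "binom",
    "geometric": "geom",
    "poisson": "poisson",
}

rng = np.random.RandomState(0)

# numpy.random spells the name out and only samples.
np_draws = rng.exponential(scale=2.0, size=1000)

# scipy.stats abbreviates the name; sampling is one method among many.
sp_dist = stats.expon(scale=2.0)
sp_draws = sp_dist.rvs(size=1000, random_state=rng)

print(np_draws.mean(), sp_draws.mean(), sp_dist.mean())
```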
...
...
(2) The interface for the distributions in scipy.stats seems to be
explicitly for scalar random variables, so there are no multivariate
normals, multinomials, dirichlet, wishart, etc. Instead the sampling
is in numpy.random, and the pdfs aren't there.
Two days ago PR-2726 was merged, which adds a multivariate normal
distribution. Others can be added. IIRC there has been an enhancement ticket
for Wishart somewhere, and there's a Python implementation floating around.
A multivariate normal is a great addition.
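
As a rough sketch of what that buys us, the snippet below samples with
numpy.random and evaluates the density with scipy.stats. It assumes a
SciPy release that ships scipy.stats.multivariate_normal (0.14 or later);
the mean and covariance values are made up for illustration.

```python
import numpy as np
from scipy.stats import multivariate_normal

# Made-up parameters, for illustration only.
mean = np.array([0.0, 1.0])
cov = np.array([[2.0, 0.3],
                [0.3, 0.5]])

# Sampling has been available in numpy.random for a long time.
rng = np.random.RandomState(0)
samples = rng.multivariate_normal(mean, cov, size=5)

# The density is the part that scipy.stats adds.
density = multivariate_normal.pdf(samples, mean=mean, cov=cov)
peak = multivariate_normal.pdf(mean, mean=mean, cov=cov)

print(density.shape, peak)
```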

Currently, dirichlet and multinomial are the only random variables you
can sample from in numpy.random that aren't in scipy.stats. My $0.02
for scipy 1.0 roadmap is adding dirichlet and multinomial to
scipy.stats as well as wishart/inverse-wishart. Then distributions in
scipy.stats would be a superset of numpy.random, and scipy.stats would
include one of the most widely used distributions currently not in it.
(In addition to the implementations floating around, both scikit-learn
and pymc include bits and pieces of wishart-related code.)
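
For a sense of the gap, here is a minimal sketch of a Dirichlet log-pdf
to pair with the sampler that numpy.random already has. The function
dirichlet_logpdf is a hypothetical helper written for this message, not
an existing scipy.stats API, and it skips input validation.

```python
import numpy as np
from scipy.special import gammaln

def dirichlet_logpdf(x, alpha):
    """Log-density of Dirichlet(alpha) at points x on the simplex.

    Hypothetical helper: no checks that x is positive and sums to 1.
    """
    x = np.asarray(x, dtype=float)
    alpha = np.asarray(alpha, dtype=float)
    # log of Gamma(sum(alpha)) / prod(Gamma(alpha_i))
    log_norm = gammaln(alpha.sum()) - gammaln(alpha).sum()
    return log_norm + ((alpha - 1.0) * np.log(x)).sum(axis=-1)

rng = np.random.RandomState(0)
alpha = np.array([2.0, 3.0, 5.0])

# Sampling exists in numpy.random; the density is what is missing.
draws = rng.dirichlet(alpha, size=4)
logp = dirichlet_logpdf(draws, alpha)
print(logp)
```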

Also, right now you can use scipy.stats.rv_discrete to create your own
discrete random variable, but only over an array of integers, so
[1,2,3] rather than ['apple', 'orange', 'banana']. Which is fine, but
that also means a lot of code duplication/wrapper classes for everyone
who wants their random variable to be over a space of fruits rather
than integers. Not sure how many people that affects, though.
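
The kind of wrapper I mean looks roughly like the sketch below.
LabeledDiscrete is a hypothetical class made up for this message, not
part of SciPy; it maps labels to integer codes and delegates to
rv_discrete. The random_state keyword assumes a SciPy version whose
rvs() accepts it.

```python
import numpy as np
from scipy import stats

class LabeledDiscrete:
    """Hypothetical wrapper: a discrete distribution over arbitrary labels."""

    def __init__(self, labels, probs):
        self.labels = list(labels)
        codes = np.arange(len(self.labels))
        # rv_discrete only understands integer support points.
        self._rv = stats.rv_discrete(values=(codes, probs))
        self._code = {label: i for i, label in enumerate(self.labels)}

    def pmf(self, label):
        # Translate the label to its integer code before delegating.
        return float(self._rv.pmf(self._code[label]))

    def rvs(self, size=1, random_state=None):
        codes = self._rv.rvs(size=size, random_state=random_state)
        # Translate integer codes back to labels.
        return [self.labels[int(c)] for c in np.atleast_1d(codes)]

fruit = LabeledDiscrete(["apple", "orange", "banana"], [0.5, 0.3, 0.2])
print(fruit.pmf("orange"), fruit.rvs(size=5, random_state=0))
```

Everyone who needs non-integer outcomes ends up rewriting some variant of
this translation layer.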

Not sure if these belong on the roadmap or just as enhancement requests.

Thanks,
Chris
...
Cheers,
Ralf
...
Has this been discussed elsewhere?
On Sat, Sep 21, 2013 at 8:03 PM, Blake Griffith
<blake.a.griffith@gmail.com> wrote:
...
...
sparse
``````
Don't emulate np.matrix behavior, drop 2-D?
What is meant by this? Emulate np.array instead?
_______________________________________________
SciPy-Dev mailing list
SciPy-Dev@scipy.org
http://mail.scipy.org/mailman/listinfo/scipy-dev