different percentile implementations ?
Hi,

A quick question I've had in mind for some time without finding an answer: is there a significant difference between "numpy.percentile" and "scipy.stats.scoreatpercentile"? Of course the signatures are somewhat different, but I have the feeling that the overall purpose is the same. Am I missing something?

Best,
Pierre
On Sun, Mar 25, 2012 at 6:30 PM, Pierre Haessig <pierre.haessig@crans.org> wrote:
> Hi,
> A quick question I've had in mind for some time but didn't find a solution : Is there a significant difference between "numpy.percentile" and "scipy.stats.scoreatpercentile" ?
> Of course the signatures are somewhat different, but I have the feeling that the overall purpose is the same. Am I missing something ?
As with std, var, histogram, ..., some functions from scipy.stats are now in numpy. However, in contrast to std and var, I think scoreatpercentile should be enhanced rather than removed (as was done for histogram). See, for example, my attempt: http://projects.scipy.org/scipy/ticket/1329

Josef
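For readers who want to check the overlap themselves, here is a minimal sketch, assuming both functions are called with their default settings (linear interpolation between neighbouring order statistics). The sample data is made up for illustration, and the SciPy import is guarded so the snippet still runs where SciPy is not installed.

```python
import numpy as np

# Illustrative data only (not from the thread).
data = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
q = 50  # percentile in the range 0..100

# NumPy: the percentile is the second positional argument.
np_result = np.percentile(data, q)

# SciPy: same idea, different signature (it also accepts a `limit` keyword).
try:
    from scipy import stats
    sp_result = stats.scoreatpercentile(data, q)
except (ImportError, AttributeError):
    sp_result = None  # SciPy, or this function, is not available here

print("numpy.percentile:             ", np_result)
print("scipy.stats.scoreatpercentile:", sp_result)
```

With default arguments the two calls should agree on plain 1-D input; the differences discussed in this thread concern the extra features (limits, masked input), not the basic computation.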
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On 27/03/2012 at 18:56, josef.pktd@gmail.com wrote:
> similar to std, var, histogram, ... some functions from scipy.stats are now in numpy.

Ok, historical reasons then. Fair enough. Would a "See also: numpy.percentile" make sense in stats.scoreatpercentile?

> However, in contrast to std, var, I think scoreatpercentile should be enhanced and not removed (similar to histogram), for example my attempt: http://projects.scipy.org/scipy/ticket/1329
I'm not sure I completely understood what was involved in your ticket. My overall impression is:

* For a lot of statistical computations, it is not possible and/or desirable to have the same code for "regular arrays" and for "masked/NaN/... arrays".
* However, it would be possible to have the same API, that is: put all the entry points in scipy.stats instead of having scipy.stats.mstats as a separate API.

Did I understand you correctly?

Best,
Pierre
On Wed, Mar 28, 2012 at 5:44 AM, Pierre Haessig <pierre.haessig@crans.org> wrote:
> On 27/03/2012 at 18:56, josef.pktd@gmail.com wrote:
>> similar to std, var, histogram, ... some functions from scipy.stats are now in numpy.
>
> Ok, historical reasons then. Fair enough. Would a "See also: numpy.percentile" make sense in stats.scoreatpercentile ?
Of course. There are still many opportunities left to improve the scipy documentation.
>> However, in contrast to std, var, I think scoreatpercentile should be enhanced and not removed (similar to histogram), for example my attempt: http://projects.scipy.org/scipy/ticket/1329
>
> I'm not sure I completely understood what was involved in your ticket.
The main point was that the scoreatpercentile/quantile implementations in mstats and in climpy by Pierre GM have a lot more features, and those features should be in a stats implementation.
> The overall impression I felt is :
> * for a lot of statistical computations, it is not possible and/or desirable to have the same code for "regular array" and for "masked/nans/... arrays".
I think in most cases a pure ndarray implementation without NaNs or masks will be much faster, so I wouldn't want to simply replace stats.stats with stats.mstats; I would rather keep the fast paths.
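As an illustration of what "keeping a fast path" could look like, here is a minimal sketch. It is not how scipy.stats is implemented; `percentile_dispatch` is a hypothetical helper, and using NaN as a stand-in for masked entries is an assumption made to keep the example short.

```python
import numpy as np

def percentile_dispatch(a, q, axis=None):
    """Hypothetical helper: one entry point, two code paths."""
    if isinstance(a, np.ma.MaskedArray):
        # Slow path: turn masked entries into NaN, then ignore them.
        data = np.ma.filled(a.astype(float), np.nan)
        return np.nanpercentile(data, q, axis=axis)
    data = np.asarray(a, dtype=float)
    if np.isnan(data).any():
        # Slow path: plain array that contains NaNs.
        return np.nanpercentile(data, q, axis=axis)
    # Fast path: plain ndarray, no missing values to handle.
    return np.percentile(data, q, axis=axis)

plain = percentile_dispatch([1.0, 2.0, 3.0, 4.0], 50)
masked = percentile_dispatch(
    np.ma.masked_array([1.0, 2.0, 3.0, 99.0], mask=[False, False, False, True]), 50
)
with_nan = percentile_dispatch([1.0, np.nan, 3.0], 50)
```

The caller sees a single signature, while clean input never pays for the mask or NaN handling.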
> * However, it would be possible to have the same api, that is : put all the entry points in scipy.stats instead of having scipy.stats.mstats as a separate api.
> Did I understand you correctly ?
What we should have, but currently do not, is the same signature/API for the functions in stats.stats and stats.mstats. Whether we can or should merge functions is still a bit open. In the scoreatpercentile case, implementing the limit keyword (which is currently broken for 2d arrays) requires masking or something equivalent, so the easiest approach is to just use the mstats implementation. Similarly, the truncated statistics such as tmean use masked arrays.

Cheers,
Josef
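To make the 2d problem concrete: once values outside the limits are dropped, each column may keep a different number of entries, so plain slicing no longer gives a rectangular array. A minimal sketch of the "masking or something equivalent" idea follows, using NaN as the mask. The helper name `percentile_with_limit` and the inclusive treatment of the bounds are assumptions for illustration, not the scipy implementation.

```python
import numpy as np

def percentile_with_limit(a, q, limit, axis=None):
    """Hypothetical helper: percentile of the values inside [low, high]."""
    a = np.asarray(a, dtype=float)
    low, high = limit
    inside = (a >= low) & (a <= high)
    # Values outside the limits become NaN instead of being removed,
    # so the array keeps its shape along every axis.
    masked = np.where(inside, a, np.nan)
    return np.nanpercentile(masked, q, axis=axis)

# Illustrative data: column 0 keeps three values, column 1 keeps two.
x = np.array([[1.0, 100.0],
              [2.0, 5.0],
              [3.0, 6.0],
              [-50.0, 200.0]])

result = percentile_with_limit(x, 50, limit=(0, 10), axis=0)
print(result)
```

Here the per-column medians are taken over [1, 2, 3] and [5, 6] respectively, which a naive boolean index on the 2d array could not express.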
Participants (2): josef.pktd@gmail.com, Pierre Haessig