Getting 95%/99% margin of ndarray
Hello list,

is there some possibility to get a p-dynamic of an array, i.e. if p=1 then the result would be (arr.min(), arr.max()), but if 0 < p < 1, then the result is such that the pth percentile of the picture is within the range given?

I cannot explain this very well, so please let me illustrate. Let's say we have an array:

0 0 0 0 1 2 3 4 5 2000

p = 0.9 -> (0, 5)
This means that 90% of the pixels are within the [0; 5] range
p = 0.5 -> (1, 5)
p = 0.3 -> (2, 4)

Is it clear what I want? Is there a possibility to achieve that? If I have to implement it myself, how can I do that efficiently in Python?

Kind regards,
Johannes
You can do it "by hand" by sorting the array and taking the corresponding elements or you can use scipy.stats.scoreatpercentile that also interpolates. Best, Luca
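A minimal sketch of the "by hand" route with interpolation, using only the standard library. The name `score_at` and the choice of linear interpolation between neighbouring sorted elements are assumptions made for illustration; they are one common convention and are not claimed to reproduce scipy.stats.scoreatpercentile exactly.

```python
import math

def score_at(values, q):
    """Interpolated quantile of `values` for a fraction q in [0, 1]."""
    if len(values) == 0:
        raise ValueError("values must not be empty")
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(values)
    pos = q * (len(ordered) - 1)              # fractional index into the sorted data
    lower = math.floor(pos)
    upper = min(lower + 1, len(ordered) - 1)  # clamp at the last element
    frac = pos - lower
    # Linear interpolation between the two neighbouring order statistics.
    return ordered[lower] + frac * (ordered[upper] - ordered[lower])

pixels = [0, 0, 0, 0, 1, 2, 3, 4, 5, 2000]
median = score_at(pixels, 0.5)  # halfway between the 5th and 6th sorted values
```

With q = 0 and q = 1 the sketch falls back to the minimum and maximum, which corresponds to the p = 1 case in the question.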
On Jul 22, 2009, at 12:36 PM, Johannes Bauer wrote:
Hello list,
is there some possibility to get a p-dynamic of an array, i.e. if p=1 then the result would be (arr.min(), arr.max()), but if 0 < p < 1, then the result is such that the pth percentile of the picture is within the range given?
You could try scipy.stats.scoreatpercentile, scipy.stats.mstats.plotting_positions or scipy.stats.mstats.mquantiles, which will all approximate quantiles of your distribution. To get the 90% of data you want, find the (0.05, 0.95) quantiles. More generally, to keep a fraction p of the data, take the (1-p)/2 and (1+p)/2 quantiles... Even more approximately, you could sort your data and take the elements at positions ((1-p)/2)N and ((1+p)/2)N, where p is the fraction you want to keep and N the size of your array.
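The symmetric-quantile recipe can be sketched with NumPy alone. The helper name `central_interval` is hypothetical, and `np.percentile` is used here as a stand-in for the SciPy functions named above; it interpolates by default, so the bounds need not be values that actually occur in the array.

```python
import numpy as np

def central_interval(arr, p):
    """Return (low, high) quantiles enclosing the central fraction p of arr."""
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    tail = (1.0 - p) / 2.0  # fraction cut from each end
    # np.percentile expects percentages in [0, 100].
    low, high = np.percentile(arr, [100 * tail, 100 * (1 - tail)])
    return float(low), float(high)

pixels = np.array([0, 0, 0, 0, 1, 2, 3, 4, 5, 2000])
full = central_interval(pixels, 1.0)  # same as (pixels.min(), pixels.max())
half = central_interval(pixels, 0.5)  # 25th and 75th percentiles
```

On a ten-element array with one large outlier, the upper bound for p close to 1 is dominated by the interpolation towards 2000, so for small arrays the index-based variant may be closer to what the question asks for.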
2009/7/22 Pierre GM <pgmdevlist@gmail.com>: You could try scipy.stats.scoreatpercentile, scipy.stats.mstats.plotting_positions or scipy.stats.mstats.mquantiles, which will all approximate quantiles of your distribution.
It seems that mquantiles doesn't do what you'd expect when the limit keyword argument is specified. There's a patch for review here: http://codereview.appspot.com/97077 Cheers, Scott
On Jul 23, 2009, at 6:07 AM, Scott Sinclair wrote:
It seems that mquantiles doesn't do what you'd expect when the limit keyword argument is specified. There's a patch for review here:
Thx for the patch, I'll port it in the next few hours. However, I disagree with the last few lines (where the quantiles are transformed to a standard ndarray if the mask is nomask). For consistency, we should always have a MaskedArray, don't you think? (And anyway, taking a view as an ndarray is faster than using np.asarray...) Thx again P.
2009/7/23 Pierre GM <pgmdevlist@gmail.com>:
Thx for the patch, I'll port it in the next few hours. However, I disagree with the last few lines (where the quantiles are transformed to a standard ndarray if the mask is nomask). For consistency, we should always have a MaskedArray, don't you think? (And anyway, taking a view as an ndarray is faster than using np.asarray...)
Agree it's more consistent to always return a MaskedArray. I don't remember why I chose to return an ndarray. I think that it was probably to do with the fact that an ndarray is returned when 'axis' isn't specified...
>>> import numpy as np
>>> import scipy as sp
>>> sp.__version__
'0.8.0.dev5874'
>>> from scipy.stats.mstats import mquantiles
>>> a = np.array([6., 47., 49., 15., 42., 41., 7., 39., 43., 40., 36.])
>>> type(mquantiles(a))
<type 'numpy.ndarray'>
>>> type(mquantiles(np.ma.masked_array(a)))
<type 'numpy.ndarray'>
>>> type(mquantiles(a, axis=0))
<class 'numpy.ma.core.MaskedArray'>
This could be fixed by forcing _quantiles1D() to always return a MaskedArray. Cheers, Scott
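Until the return type is consistent, a caller can normalise it on their side. The sketch below uses only NumPy; the helper name `as_masked` is hypothetical and is not part of scipy.stats.mstats. It also shows the ndarray view mentioned earlier in the thread.

```python
import numpy as np

def as_masked(result):
    """Return `result` as a MaskedArray whether or not it already is one."""
    return np.ma.asarray(result)

plain = np.array([6.0, 47.0, 49.0])
masked = np.ma.masked_array(plain, mask=[False, True, False])

wrapped = as_masked(plain)          # ndarray in, MaskedArray out
same_kind = as_masked(masked)       # MaskedArray in, mask is preserved
raw_view = masked.view(np.ndarray)  # plain ndarray view of the underlying data
```

Note that the ndarray view exposes the underlying data, including the entries that were masked, so it only stands in for the masked result when nothing is masked.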
I am afraid I misunderstand your question because I do not get the results you expected.

import numpy as np

def pdyn(a, p):
    a = np.sort(a)
    n = round((1-p) * len(a))  # number of elements to discard
    # a[-int(n/2)] would not work if n<=1
    return a[int((n+1)/2)], a[len(a)-1-int(n/2)]

>>> pdyn([0, 0, 0, 0, 1, 2, 3, 4, 5, 2000], 1)
(0, 2000)
>>> pdyn([0, 0, 0, 0, 1, 2, 3, 4, 5, 2000], .9)
(0, 2000)
>>> pdyn([0, 0, 0, 0, 1, 2, 3, 4, 5, 2000], .5)
(0, 4)
>>> pdyn([0, 0, 0, 0, 1, 2, 3, 4, 5, 2000], .3)
(1, 3)
If you have the array 0 0 0 0 1 2 3 4 5 2000, why should p = 0.5 -> (1, 5)? I mean 10*.5 is 5, so you throw away 5 elements in total from the upper and lower tails (either 3+2 or 2+3) and you should end up with either (0, 4) or (0, 3), so why (1, 5)? Best, Luca
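The counting argument can be checked by listing every way of discarding 5 of the 10 sorted elements from the two tails. This is a plain-Python sketch; the variable names are chosen here for illustration.

```python
pixels = sorted([0, 0, 0, 0, 1, 2, 3, 4, 5, 2000])
p = 0.5
n_drop = round((1 - p) * len(pixels))  # 5 of the 10 elements are discarded

# Map each (dropped_low, dropped_high) split to the range of what is kept.
ranges = {}
for drop_low in range(n_drop + 1):
    drop_high = n_drop - drop_low
    kept = pixels[drop_low:len(pixels) - drop_high]
    ranges[(drop_low, drop_high)] = (kept[0], kept[-1])

for split, bounds in ranges.items():
    print(split, "->", bounds)
```

The two near-symmetric splits, 3+2 and 2+3, give (0, 4) and (0, 3) as stated above. The range (1, 5) only appears for the lopsided split that drops four elements from the bottom and one from the top, so it does not follow from a symmetric trim.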
participants (4)
- Citi, Luca
- Johannes Bauer
- Pierre GM
- Scott Sinclair