[Numpy-discussion] Converting ranks to a Gaussian

Mon Jun 9 22:06:24 EDT 2008

On Mon, Jun 9, 2008 at 4:45 PM, Robert Kern <robert.kern at gmail.com> wrote:
> On Mon, Jun 9, 2008 at 18:34, Keith Goodman <kwgoodman at gmail.com> wrote:
>> Does anyone have a function that converts ranks into a Gaussian?
>>
>> I have an array x:
>>
>>>> import numpy as np
>>>> x = np.random.rand(5)
>>
>> I rank it:
>>
>>>> x_ranked = x.argsort().argsort()
>>>> x_ranked
>>   array([3, 1, 4, 2, 0])
>
> There are subtleties in computing ranks when ties are involved. Take a
> look at the implementation of scipy.stats.rankdata().

Good point. I had to deal with ties and missing data. I bet
scipy.stats.rankdata() is faster than my implementation.
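
To make the tie issue concrete, here is a small sketch comparing the two approaches on a made-up array with one tied pair. The sample values and variable names are only for illustration, and kind="stable" is passed so the tie-breaking order of the double argsort is well defined:

```python
import numpy as np
from scipy import stats

# Illustrative data: the two 0.3 entries are tied.
x = np.array([0.3, 0.1, 0.3, 0.2])

# Double argsort: zero-based ranks; tied values get distinct ranks,
# ordered by their position in the array (because the sort is stable).
ranks_argsort = x.argsort(kind="stable").argsort()

# rankdata: one-based ranks; tied values share the average of their ranks.
ranks_avg = stats.rankdata(x)

print(ranks_argsort)  # [2 0 3 1]
print(ranks_avg)      # [3.5 1.  3.5 2. ]
```

Neither version does anything about missing data; NaNs would need to be masked out separately.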

>> I would like to convert the ranks to a Gaussian without using scipy.
>
> No dice. You are going to have to use scipy.special.ndtri somewhere. A
> basic transformation (off the top of my head, I have no idea if this
> is statistically meaningful):
>
>  scipy.special.ndtri((ranks + 1.0) / (len(ranks) + 1.0))
>
> Barring tied first or last items, this should give equal weight to
> each of the tails outside of the range of your data.

Nice. Thank you. It passes the never-wrong chi-by-eye test:

import numpy as np
import pylab
from scipy import special

r = np.arange(1000)  # zero-based ranks 0..999
g = special.ndtri((r + 1.0) / (len(r) + 1.0))
pylab.hist(g, 50)
pylab.show()
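
Putting the pieces together, the whole thing might be wrapped up roughly as below. This is only a sketch: ranks_to_gaussian is a hypothetical helper name, it assumes no missing values, and it uses the one-based ranks from scipy.stats.rankdata, so rank / (n + 1) is the same quantity as (rank + 1) / (n + 1) with zero-based ranks.

```python
import numpy as np
from scipy import special, stats


def ranks_to_gaussian(x):
    """Map values to normal scores via their ranks (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    # One-based ranks; tied values receive the average of their ranks.
    ranks = stats.rankdata(x)
    # Scale ranks into the open interval (0, 1), then apply the
    # inverse of the standard normal CDF.
    return special.ndtri(ranks / (len(x) + 1.0))


x = [0.7, 0.1, 0.9, 0.4, 0.2]
g = ranks_to_gaussian(x)
print(g)
```

The output keeps the ordering of the input, and the median value maps to 0 because its scaled rank is exactly 0.5.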

I wasn't able to use scipy.special.ndtri after a plain import scipy, as
you did. I had to import the submodule explicitly (but I'm new to scipy):

from scipy import special
special.ndtri

scipy.__version__
   '0.6.0'

from Debian Lenny.