[Numpy-discussion] Converting ranks to a Gaussian

Tue Jun 10 03:56:10 EDT 2008

2008/6/9 Keith Goodman <kwgoodman at gmail.com>:
> Does anyone have a function that converts ranks into a Gaussian?
>
> I have an array x:
>
>>> import numpy as np
>>> x = np.random.rand(5)
>
> I rank it:
>
>>> x_ranked = x.argsort().argsort()
>>> x_ranked
>   array([3, 1, 4, 2, 0])
>
> I would like to convert the ranks to a Gaussian without using scipy.
> So instead of the equal distance between ranks in array x, I would
> like the distance between them to follow a Gaussian distribution.
>
> How far out in the tails of the Gaussian should 0 and N-1 (N=5 in the
> example above) be? Ideally, or arbitrarily, the areas under the
> Gaussian to the left of 0 (and the right of N-1) should be 1/N or
> 1/2N. Something like that. Or a fixed value is good too.

I'm actually not clear on what you need.

If what you need is for rank i of N to be the 100*i/N th percentile in
a Gaussian distribution, then you should indeed use scipy's functions
to accomplish that; I'd use scipy.stats.norm.ppf().
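
A minimal sketch of that mapping, assuming scipy is available. The helper name is hypothetical, and the half-step offset (i + 0.5)/N is an assumption rather than anything fixed by the question: with plain i/N, rank 0 sits at the 0th percentile, which ppf maps to -inf. The offset leaves an area of 1/(2N) in each tail, which is one of the options you suggested.

```python
import numpy as np
from scipy.stats import norm

def ranks_to_normal_scores(ranks):
    """Map 0-based ranks to Gaussian quantiles (hypothetical helper)."""
    ranks = np.asarray(ranks)
    n = ranks.size
    # Assumed offset: centre each rank in its 1/N-wide probability bin,
    # so the lowest and highest ranks each leave 1/(2N) in the tail.
    p = (ranks + 0.5) / n
    return norm.ppf(p)

x_ranked = np.array([3, 1, 4, 2, 0])
scores = ranks_to_normal_scores(x_ranked)
```

The result keeps the ordering of the ranks, and the middle rank maps to 0.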

Of course, if your points were drawn from a Gaussian distribution,
their percentiles wouldn't be exactly 1/N apart; the spacing would vary
from sample to sample.
Quite what the distribution of (say) the maximum or the median of N
points drawn from a Gaussian is, I can't say, though people have
looked at it. But if you want "typical" values, just generate N points
from a Gaussian and sort them:

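import numpy as np

def ranks_to_gaussian(ranks):
    # Draw one Gaussian sample of the same length, sort it,
    # and look up each rank's value in the sorted sample.
    V = np.sort(np.random.randn(len(ranks)))
    return V[ranks]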

Of course they will be different every time, but the distribution will be right.
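
If the run-to-run variation is a problem, one option is to average many sorted draws, which estimates the mean of each sorted position. This is only a sketch under my own assumptions: the function name, the number of draws, and the fixed seed are all illustrative choices, and it needs a numpy recent enough to provide np.random.default_rng.

```python
import numpy as np

def mean_order_statistics(n, n_draws=10000, seed=0):
    """Estimate the mean of each sorted position over many Gaussian samples."""
    rng = np.random.default_rng(seed)
    # Each row is one sample of n points, sorted ascending.
    samples = np.sort(rng.standard_normal((n_draws, n)), axis=1)
    # Averaging down the columns gives the mean of the k-th smallest value.
    return samples.mean(axis=0)

x_ranked = np.array([3, 1, 4, 2, 0])
typical = mean_order_statistics(len(x_ranked))
values = typical[x_ranked]
```

Here values is reproducible for a given seed and still uses only numpy.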

Anne
P.S. why the "no scipy" restriction? it's a bit unreasonable. -A