On Mon, Apr 28, 2008 at 12:29 AM, Stéfan van der Walt <stefan@sun.ac.za> wrote:
That should be 1.0/len(x), otherwise all the probabilities are 0.
Cheers Stéfan
I had >>>from __future__ import division above when I actually tested this, so they aren't zero (although you're right they would be otherwise), but you're right that they would be if this was run as-is. I figured the pmf part out, though based on Michael's examples - the pmf SHOULD be zero everywhere other then exactly at one of the values... when I generate the cdf, it has non-zero values. On Mon, Apr 28, 2008 at 2:02 AM, Michael <mnandris@btinternet.com> wrote:
There are at least 2 ways of using rv_discrete
e.g. 2 ways to calculate the next element of a simple Markov Chain with x(n+1)=Norm(0.5 x(n),1)
from scipy.stats import rv_discrete from numpy.random import multinomial
x = 3 n1 = stats.rv_continuous.rvs( stats.norm, 0.5*x, 1.0 )[0] print n1 n2 = stats.rv_discrete.rvs( stats.rv_discrete( name='sample', values=([0,1,2],[3/10.,5/10.,2/10.])), 0.5*x, 1.0 )[0] print n2 print sample = stats.rv_discrete( name='sample', values=([0,1,2],[3/10.,5/10.,2/10.]) ).rvs( size=10 ) print sample
The multinomial distribution from numpy.random is somewhat faster (40 times or so) but has a different idiom:
SIZE = 100000 VALUES = [0,1,2,3,4,5,6,7] PROBS = [1/8.,1/8.,1/8.,1/8.,1/8.,1/8.,1/8.,1/8.]
The idiom for rv_discrete is rv_discrete( name='sample', values=(VALUES,PROBS) )
The idiom for numpy.multinomial is different; if memory serves, you get frequencies as output instead of the actual values multinomial( SIZE, PROBS )
from numpy.random import multinomial multinomial(100,[ 0.2, 0.4, 0.1, 0.3 ]) array([12, 44, 10, 34]) multinomial( 100, [0.2, 0.0, 0.8, 0.0] ) <-- don't do this ... multinomial( 100, [0.2, 1e-16, 0.8, 1e-16] ) <-- or this multinomial( 100, [0.2-1e-16, 1e-16, 0.8-1e-16, 1e-16] ) <-- ok array([21, 0, 79, 0])
the last one is ok since the probability adds up to 1... painful, but it works
As explained above, I think I got the discrete to work (spurred on by your simpler example) - but what's the utility of using the multinomial idiom over the rv_discrete syntax? Is it faster?.
Continuous v's discrete: i found this in ./stats/scstats.py
from scipy import stats, r_ from pylab import show, plot import copy
# SOURCE: ./stats/scstats.py
SPREAD = 10
class cdf( object ): """ Baseclass for the task of determining a sequence of numbers {vi} which is distributed as a random variable X """ def integerDensityFunction( self ): """ Outputs an integer density function: xs (ints) and ys (probabilities) which are the correspondence between the whole numbers on the x axis to the probabilities on the y axis, according to a normal distribution. """ opt = [] for i in r_[-SPREAD:SPREAD:100j]: # 2-tailed test (?) opt.append(( i, stats.norm.cdf(i) )) # ( int, P(int) ) return zip(*opt) # [ (int...), (P...) ]
def display( self ): xs, ys = self.integerDensityFunction() plot( xs, ys ) show()
if __name__=='__main__': d = cdf() d.display()
I don't really understand what you're suggesting, but anyway, I don't see a stats/scstats.py file in the scipy directory (at least in 0.6)...
Continuous: i can only suggest using rv_continuous
stats.rv_continuous.rvs( stats.norm, 0.5*x, 1.0 ).whatever
.rvs( shape, loc, scale ) is the random variates .pdf( x, shape, loc, scale ) is the probability density function which, i think, is or should be genuinely continuous
My interpretation of this is that it is using the normal distribution - I want a distribution that is a smoothed/interpolated version of the discrete distribution I generated above. I take this to mean there's no built-in utility to do this, so I just have to make my own - this seems like a useful thing for data analysis, though, so I may submit it later to be added to SVN.