bias in random.normalvariate??
drewlist at gmail.com
Sat Aug 4 05:38:40 CEST 2007
I'm a Python newbie and certainly no expert on statistics, but my wife
was taking a statistics course this summer. To illustrate that the
average of samples drawn from a distribution is itself a random number
(a bigger sample gives a smaller variance in that average, which
converges on the mean of the original distribution), I threw together
this program:
#! /usr/bin/python
import random
i=1
samplen=100
mean=130
lo=mean
hi=mean
sd=10
sum=0
while(i<=samplen):
    x=random.normalvariate(mean,sd)
    #print x
    if x<lo: lo=x
    if x>hi: high=x
    sum+=x
    i+=1
print 'sample mean=', sum/samplen, '\n'
print 'low value =', lo
print 'high value=', high
---------------------------------------------------------
But the more I run the darn thing, the stranger the results look to
me.
random.normalvariate is defined on page 89 of
http://www-acc.kek.jp/WWW-ACC-exp/KEKB/Control/Python%20Documents/lib.pdf
as generating points from a normal distribution with mean and standard
deviation given by the arguments. But my test program consistently
comes up with sample means that are less than the mean of the
distribution. The low value is consistently much farther below the
mean than the high value is above it. That is, it looks to me like
the normalvariate function is biased.
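One way to test the bias claim is to repeat the experiment many times rather than judge single runs. The sketch below is only an illustration: the helper name sample_mean, the trial count, and the fixed seed are assumptions, not part of the original program. For 100 draws with standard deviation 10, the sample mean has a standard error of 10/sqrt(100) = 1, so a single run landing near 129 or 131 is unremarkable. An unbiased generator should put roughly half of the sample means below 130.

```python
import random

def sample_mean(n, mean, sd, rng):
    """Average of n draws from a normal distribution."""
    total = 0.0
    for _ in range(n):
        total += rng.normalvariate(mean, sd)
    return total / n

rng = random.Random(12345)  # fixed seed (arbitrary) so runs are repeatable
trials = 2000               # hypothetical trial count
means = [sample_mean(100, 130, 10, rng) for _ in range(trials)]

grand_mean = sum(means) / trials
below = len([m for m in means if m < 130])

print("mean of sample means = %.3f" % grand_mean)
print("fraction of runs below 130 = %.3f" % (below / float(trials)))
```

If the mean of the sample means sits close to 130 and the fraction below 130 is near one half, the generator itself is probably not the culprit.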
Part of my being a Python newbie is that I'm not really sure where to go to
discuss this problem. If this group isn't the right place, do feel
free to point me to where I ought to go.
I'm running Ubuntu Dapper and "python -V" says I've got Python
2.4.3. I tried looking in random.py down under /usr/lib but find no
clues there as to the version of the random module on my machine. Am
I missing something?
/usr/lib/python2.4$ ls -l random.py
-rw-r--r-- 1 root root 30508 2006-10-06 04:34 random.py
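As far as I know, the random module carries no version number of its own; it ships with the standard library, so its version is that of the interpreter. A minimal sketch for confirming which interpreter and which random.py are actually in use:

```python
import sys
import random

# The standard-library random module is versioned with the interpreter.
print(sys.version)
# Path of the random module that was actually imported.
print(random.__file__)
```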
I added the lo and high stuff to my test program out of fear that I
was running into something funky in adding up 100 floating point
numbers. That would be more of a worry if the sample size was much
bigger, but lo and high showed apparent bias quite aside from the
calculation of the mean.
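The rounding worry can be checked directly. The sketch below assumes Python 2.6 or later, where math.fsum is available (it is not in 2.4), and compares a naive running total against the correctly rounded sum for 100 draws. The seed is arbitrary.

```python
import math
import random

rng = random.Random(2007)  # arbitrary seed for repeatability
values = [rng.normalvariate(130, 10) for _ in range(100)]

naive = 0.0
for v in values:
    naive += v             # same accumulation style as the test program

exact = math.fsum(values)  # correctly rounded floating-point sum

print("naive sum  = %r" % naive)
print("fsum       = %r" % exact)
print("difference = %r" % abs(naive - exact))
```

For 100 values of this size the difference should be many orders of magnitude smaller than the spread in the sample mean, so accumulated rounding is unlikely to explain anything visible here.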
Am I committing some other obvious statistical or Python blunder?
E.g., am I misunderstanding what random.normalvariate is supposed to
do?
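One Python detail in the program above is worth checking: the loop tests x>hi but assigns to high, so hi never moves from 130 and high ends up holding the last draw above the mean rather than the largest one. The low value, by contrast, is a true minimum, which would make the two extremes look lopsided. A sketch with the name made consistent, written to run on both Python 2 and 3 (the names total and rng and the seed are illustrative choices, not from the original):

```python
import random

rng = random.Random(42)   # arbitrary seed so the run is repeatable
samplen = 100
mean = 130
sd = 10

lo = mean
hi = mean
total = 0.0               # avoids shadowing the built-in sum()
for _ in range(samplen):
    x = rng.normalvariate(mean, sd)
    if x < lo:
        lo = x
    if x > hi:
        hi = x            # same name as in the comparison
    total += x

print("sample mean = %.2f" % (total / samplen))
print("low value   = %.2f" % lo)
print("high value  = %.2f" % hi)
```

With the same variable tracked on both sides, the low and high values should sit at broadly similar distances from 130 across repeated runs.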