[Numpy-discussion] var bias reason?

Gabriel Gellner ggellner at uoguelph.ca
Wed Oct 15 12:09:03 EDT 2008


On Wed, Oct 15, 2008 at 09:45:39AM -0500, Travis E. Oliphant wrote:
> Gabriel Gellner wrote:
> > Some colleagues noticed that var uses biased formulas by default in numpy,
> > searching for the reason only brought up:
> >
> > http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias
> >
> > which I totally agree with, but there was no response? Any reason for this?
> I will try to respond to this as it was me who made the change.  I think 
> there have been responses, but I think I've preferred to stay quiet 
> rather than feed a flame war.   Ultimately, it is a matter of preference 
> and I don't think there would be equal weights given to all the 
> arguments surrounding the decision by everybody.
> 
> I will attempt to articulate my reasons:  dividing by n is the maximum 
> likelihood estimator of variance and I prefer that justification more 
> than the "un-biased" justification for a default (especially given that 
> bias is just one part of the "error" in an estimator).    Having every 
> package that computes the mean return the "un-biased" estimate gives it 
> more cultural weight than the concept deserves, I think.  Any 
> surprise that is created by the different default should be mitigated by 
> the fact that it's an opportunity to learn something about what you are 
> doing.    Here is a paper I wrote on the subject that you might find 
> useful:
> 
> https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
> (Hopefully, they will resolve a link problem at the above site soon, but 
> you can read the abstract).
> 
Thanks for the reply, I look forward to reading the paper when it is
available. The major issue in my mind is not the technical one but the
surprise factor. I can't think of a single other package that uses this as the
default. And since var is also a method of ndarray (which is a built-in type
and can't be monkey patched), there is no way of taking a different view (that
is, supplying my own function) without the confusion I am seeing in my own lab:
less technical people need to understand that they shouldn't use a method of
the same name.
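
To make the surprise concrete, here is a minimal sketch. It assumes a current
NumPy release in which var accepts a ddof argument; the sample values are made
up for illustration. It shows the default dividing by n, the n - 1 variant, and
that the method on the built-in ndarray type cannot be replaced:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])  # made-up sample, n = 4

# Default: sum of squared deviations divided by n (the biased / ML estimate).
biased = x.var()

# ddof=1 divides by n - 1 instead (the "unbiased" estimate).
unbiased = x.var(ddof=1)

# Trying to monkey patch the method on the extension type is rejected.
try:
    np.ndarray.var = lambda self, *args, **kwargs: None
    patched = True
except TypeError:
    patched = False

print(biased, unbiased, patched)
```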

I worry about having numpy take this unpopular stance (as far as packages go)
simply to fight the good fight as a built-in method of every ndarray. A plain
function would present no such problem, since it allows dissent over a
clearly muddy issue.
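
As a rough illustration of the function route, a lab could agree on its own
helper along these lines. The name sample_var is hypothetical, not part of
numpy, and the sketch again assumes np.var accepts ddof:

```python
import numpy as np

def sample_var(a, axis=None):
    """Hypothetical helper: variance with n - 1 in the denominator."""
    return np.var(a, axis=axis, ddof=1)

data = np.array([[1.0, 2.0, 3.0],
                 [2.0, 4.0, 6.0]])  # made-up values

per_row = sample_var(data, axis=1)   # divides by n - 1
method_default = data.var(axis=1)    # the ndarray method still divides by n
```

The catch is the one described above: data.var() stays available under the
same name and quietly gives the other answer.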

Sorry for the noise, and I am happy to see there is a reason, but I can't help
but find this a wart for purely pedagogical reasons.

Gabriel


