[Numpy-discussion] var bias reason?

Bruce Southey bsouthey at gmail.com
Wed Oct 15 12:26:18 EDT 2008


Hi,
While I disagree, I really do not care because this is documented. But 
perhaps a clear warning is needed at the start so it is clear what the 
default ddof means, instead of it being buried in the Notes section.
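
To make the default concrete, here is a minimal sketch of what ddof 
changes. It assumes a NumPy where np.var accepts ddof, and the sample 
values are made up purely for illustration:

```python
import numpy as np

# Made-up sample, purely for illustration.
x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
n = x.size

# Sum of squared deviations from the sample mean.
ss = np.sum((x - x.mean()) ** 2)

# Default: ddof=0, so the divisor is n (the maximum likelihood
# estimate for normally distributed data).
var_default = np.var(x)

# ddof=1 changes the divisor to n - 1 (the "unbiased" estimate, which
# is what var returns by default in MATLAB and R).
var_ddof1 = np.var(x, ddof=1)

# Both results agree with the hand-computed formulas.
assert np.isclose(var_default, ss / n)
assert np.isclose(var_ddof1, ss / (n - 1))
print(var_default, var_ddof1)
```

In general the divisor is n - ddof, so results from np.var(x, ddof=1) 
should line up with the MATLAB and R defaults.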

Also I am surprised that you did not directly reference the Stein 
estimator (your minimum mean-squared-error estimator) and its known 
effects in your paper:
http://en.wikipedia.org/wiki/James-Stein_estimator
So I did not find this any different from what is already known about 
the Stein estimator.
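
For readers who want to see the bias versus mean-squared-error 
trade-off numerically, the following is a rough simulation sketch, not 
the method from the paper. It assumes normally distributed data, for 
which the divisor n + 1 is the textbook minimum-MSE choice; the sample 
size, trial count, and seed are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed so the run is repeatable
n, trials, true_var = 5, 200_000, 1.0  # arbitrary illustrative values

# Each row is one sample of size n from a normal distribution.
samples = rng.normal(0.0, np.sqrt(true_var), size=(trials, n))

# Sum of squared deviations from each row's own sample mean.
ss = np.sum((samples - samples.mean(axis=1, keepdims=True)) ** 2, axis=1)

results = {}
for divisor in (n - 1, n, n + 1):
    est = ss / divisor                    # variance estimate per sample
    bias = est.mean() - true_var          # average error of the estimator
    mse = np.mean((est - true_var) ** 2)  # mean squared error
    results[divisor] = (bias, mse)
    print(f"divisor={divisor}: bias={bias:+.3f}, mse={mse:.3f}")
```

Under these assumptions the n - 1 divisor should show a bias close to 
zero but the largest MSE of the three, while n + 1 should show the 
smallest MSE.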

Bruce

PS While I may have gotten access via my University, I did get the paper 
from the "Access this item" link:
https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf

Travis E. Oliphant wrote:
> Gabriel Gellner wrote:
>   
>> Some colleagues noticed that var uses a biased formula by default in numpy;
>> searching for the reason only brought up:
>>
>> http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias
>>
>> which I totally agree with, but there was no response? Any reason for this?
>>     
> I will try to respond to this as it was me who made the change.  I think 
> there have been responses, but I think I've preferred to stay quiet 
> rather than feed a flame war.   Ultimately, it is a matter of preference 
> and I don't think there would be equal weights given to all the 
> arguments surrounding the decision by everybody.
>
> I will attempt to articulate my reasons:  dividing by n is the maximum 
> likelihood estimator of variance and I prefer that justification more 
> than the "un-biased" justification for a default (especially given that 
> bias is just one part of the "error" in an estimator).    Having every 
> package that computes the mean return the "un-biased" estimate gives it 
> more cultural weight than the concept deserves, I think.  Any 
> surprise that is created by the different default should be mitigated by 
> the fact that it's an opportunity to learn something about what you are 
> doing.    Here is a paper I wrote on the subject that you might find 
> useful:
>
> https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
> (Hopefully, they will resolve a link problem at the above site soon, but 
> you can read the abstract).
>
> I'm not trying to persuade anybody with this email (although if you can 
> download the paper at the above link, then I am trying to persuade with 
> that).  In this email I'm just trying to give context to the poster as I 
> think the question is legitimate.
>
> With that said, there is the ddof parameter so that you can change what 
> the divisor is.  I think that is a useful compromise.
>
> I'm unhappy with the internal inconsistency of cov, as I think it was an 
> oversight. I'd be happy to see cov changed as well to use the ddof 
> argument instead of the bias keyword, but that is an API change and 
> requires some transition discussion and work.
>
> The only other argument I've heard against the current situation is 
> "unit testing" with MATLAB or R code.   Just use ddof=1 when comparing 
> against MATLAB and R code is my suggestion.
>
> Best regards,
>
> -Travis
