[Numpy-discussion] var bias reason?
Bruce Southey
bsouthey at gmail.com
Wed Oct 15 12:26:18 EDT 2008
Hi,
While I disagree, I really do not care because this is documented. But
perhaps a clear warning is needed at the start so it is clear what the
default ddof means, instead of it being buried in the Notes section.
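For concreteness, here is a small sketch (using standard NumPy calls)
of what the default ddof=0 divisor does compared with ddof=1:

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0])
n = x.size
ss = ((x - x.mean()) ** 2).sum()  # sum of squared deviations

# Default ddof=0: divide by n (the maximum-likelihood estimate)
assert np.isclose(np.var(x), ss / n)          # 5.0 / 4 = 1.25

# ddof=1: divide by n - 1 (the "unbiased" estimate)
assert np.isclose(np.var(x, ddof=1), ss / (n - 1))  # 5.0 / 3
```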
Also I am surprised that you did not directly reference the Stein
estimator (your minimum mean-squared estimator) and known effects in
your paper:
http://en.wikipedia.org/wiki/James-Stein_estimator
So I did not find this any different from what is already known about
the Stein estimator.
Bruce
PS While I may have gotten access via my University, I did get it from
the link "Access this item":
https://contentdm.lib.byu.edu/cgi-bin/showfile.exe?CISOROOT=/EER&CISOPTR=134&filename=135.pdf
Travis E. Oliphant wrote:
> Gabriel Gellner wrote:
>
>> Some colleagues noticed that var uses the biased formula by default in numpy,
>> searching for the reason only brought up:
>>
>> http://article.gmane.org/gmane.comp.python.numeric.general/12438/match=var+bias
>>
>> which I totally agree with, but there was no response. Any reason for this?
>>
> I will try to respond to this as it was me who made the change. I think
> there have been responses, but I think I've preferred to stay quiet
> rather than feed a flame war. Ultimately, it is a matter of preference
> and I don't think there would be equal weights given to all the
> arguments surrounding the decision by everybody.
>
> I will attempt to articulate my reasons: dividing by n is the maximum
> likelihood estimator of variance and I prefer that justification more
> than the "un-biased" justification for a default (especially given that
> bias is just one part of the "error" in an estimator). Having every
> package that computes the mean return the "un-biased" estimate gives it
> more cultural weight than the concept deserves, I think. Any
> surprise that is created by the different default should be mitigated by
> the fact that it's an opportunity to learn something about what you are
> doing. Here is a paper I wrote on the subject that you might find
> useful:
>
> https://contentdm.lib.byu.edu/cdm4/item_viewer.php?CISOROOT=/EER&CISOPTR=134&CISOBOX=1&REC=1
> (Hopefully, they will resolve a link problem at the above site soon, but
> you can read the abstract).
>
> I'm not trying to persuade anybody with this email (although if you can
> download the paper at the above link, then I am trying to persuade with
> that). In this email I'm just trying to give context to the poster as I
> think the question is legitimate.
>
> With that said, there is the ddof parameter so that you can change what
> the divisor is. I think that is a useful compromise.
>
> I'm unhappy with the internal inconsistency of cov, as I think it was an
> oversight. I'd be happy to see cov changed as well to use the ddof
> argument instead of the bias keyword, but that is an API change and
> requires some transition discussion and work.
>
> The only other argument I've heard against the current situation is
> "unit testing" with MATLAB or R code. My suggestion is just to use
> ddof=1 when comparing against MATLAB and R code.
>
> Best regards,
>
> -Travis
>
> _______________________________________________
> Numpy-discussion mailing list
> Numpy-discussion at scipy.org
> http://projects.scipy.org/mailman/listinfo/numpy-discussion
>
>