[SciPy-dev] Statistics Review progress

Wed Apr 12 13:04:20 EDT 2006

Ed Schofield wrote:
> Robert Kern wrote:

>>* Some of the functions like mean() and std() are replications of functionality
>>in numpy and even the methods of array objects themselves. I would like to
>>remove them, but I imagine they are being used in various places. There's a
>>certain amount of code breakage I'm willing to accept in order to clean up
>>stats.py (e.g. all of my other bullet items), but this seems just gratuitous. 
> 
> I think we should remove the duplicated functions mean, std, and var
> from stats.  The corresponding functions are currently imported from
> numpy into the stats namespace anyway.

Well, not for long.

  http://projects.scipy.org/scipy/scipy/ticket/192

But we could keep std(), var(), mean(), and median() in mind and import them
specifically. However, numpy.median() will have to grow an axis argument.
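
As a rough illustration of what "an axis argument" would mean for median(), here is a minimal pure-Python sketch. It assumes a rectangular list of lists as input; median_1d() and median_along_axis() are hypothetical names used only for this example, not numpy functions.

```python
def median_1d(values):
    """Median of a non-empty 1-D sequence."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty sequence is undefined")
    mid = n // 2
    if n % 2:
        # Odd length: the middle element is the median.
        return ordered[mid]
    # Even length: average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2.0


def median_along_axis(rows, axis=0):
    """Hypothetical helper: median of a rectangular list of lists.

    axis=0 reduces down each column; axis=1 reduces across each row.
    """
    if axis == 0:
        # zip(*rows) transposes the rows into columns.
        return [median_1d(col) for col in zip(*rows)]
    if axis == 1:
        return [median_1d(row) for row in rows]
    raise ValueError("axis must be 0 or 1")
```

The point is only that the caller picks the dimension to reduce over, the same way mean(), std(), and var() already allow.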

>>* We really need to sort out the issue of biased and unbiased estimators. At
>>least, a number of scipy.stats functions compute values that could be computed
>>in two different ways, conventionally given labels "biased" and "unbiased". Now
>>while there is some disagreement as to which is better (you get to guess which I
>>prefer), I think we should offer both.
>>
>>Normally, I try to follow the design principle that if the value of a keyword
>>argument is almost always given as a constant (e.g. bias=True rather than
>>bias=flag_set_somewhere_else_in_my_code), then the functionality should be
>>exposed as two separate functions. However, there are a lot of these functions
>>in scipy.stats, and I don't think we would be doing anyone any favors by
>>doubling the number of these functions. IMO, "practicality beats purity" in this
>>case. 
> 
> I'd argue strongly that var and std should be identical to the functions
> in numpy.  If we want this we'd need separate functions like varbiased.
> 
> I don't really see the benefit of a 'bias' flag. 

Well, you snipped the use-case I gave.

> If we do encounter
> some real problems in handling the biased estimators consistently
> without it, we might as well argue for modifying the corresponding
> functions in numpy. 

Yes. I do in fact argue for that.

> But it'd be trivial to write
> 
> def my_var_function_with_bias_flag(a, bias=True):
>     if bias:
>         return varbiased(a)
>     else:
>         return var(a)
> 
> if this were ever necessary.

This is a bit backwards. I would implement varbiased() and var() in terms of
the flagged version:

# var_with_flag() holds the one real implementation; these are thin wrappers.
def varbiased(a):
    return var_with_flag(a, bias=True)

def var(a):
    return var_with_flag(a, bias=False)

I *don't* want three versions of each of these functions.
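
To make that layering concrete, here is a minimal pure-Python sketch. It assumes 1-D input and ignores axis handling entirely; var_with_flag() is a hypothetical name, not an existing scipy.stats or numpy function. The only difference between the two estimators is the divisor: n for the biased one, n - 1 for the unbiased one.

```python
def var_with_flag(a, bias=False):
    """Hypothetical helper: variance of a 1-D sequence.

    bias=True divides by n; bias=False divides by n - 1.
    """
    values = [float(x) for x in a]
    n = len(values)
    divisor = n if bias else n - 1
    if divisor <= 0:
        raise ValueError("not enough data points for this estimator")
    mean = sum(values) / n
    # Sum of squared deviations from the mean, scaled by the chosen divisor.
    return sum((x - mean) ** 2 for x in values) / divisor


def varbiased(a):
    """Biased estimator: thin wrapper that fixes bias=True."""
    return var_with_flag(a, bias=True)


def var(a):
    """Unbiased estimator: thin wrapper that fixes bias=False."""
    return var_with_flag(a, bias=False)
```

With this layout each estimator has exactly one implementation, and the named functions cost one line apiece.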

-- 
Robert Kern
robert.kern at gmail.com

"I have come to believe that the whole world is an enigma, a harmless enigma
 that is made terrible by our own mad attempt to interpret it as though it had
 an underlying truth."
  -- Umberto Eco