[SciPy-dev] RFR: Proposed fixes in scipy.stats functions for calculation of variance/error/etc.

Mon Oct 26 02:07:09 EDT 2009

On Mon, Oct 26, 2009 at 1:51 AM,  <josef.pktd at gmail.com> wrote:
> On Mon, Oct 26, 2009 at 1:31 AM, Ariel Rokem <arokem at berkeley.edu> wrote:
>> Hi Josef -
>>
>>>
>>> From looking at the three functions, I would assume that the combined
>>> function would have a signature like
>>>
>>> def zscore(a, compare=None, axis=0, ddof=0)
>>>
>>> or two functions, one with compare, one without ?
>>
>> Yes - I think that would be best. After all, someone wrote zmap with
>> some use case in mind (I assume), so we would still want that
>> functionality to live on explicitly. So, I suggest (see attached diff)
>> to have two functions: one will be zscore and the other would be
>> zscore_compare. In the attached diff, I have decorated all these
>> functions with a deprecation warning and added these two new
>> functions, zscore (with the new, by-axis behavior. This makes more
>> sense to me, somehow) and zscore_compare.
>>
>>>
>>>
>>> About default axis=0:
>>>
>> ...
>>
>> Thanks for the explanation and for digging into the history of this. I
>> still think that in the long run it would be preferable to have these
>> things be internally consistent (that is consistent between numpy and
>> scipy), rather than consistent with other tools.
>>
>> Finally - I have tried to combine sem and stderr into one function,
>> under sem. Notice in particular the correction for ddof. My
>> understanding is that this should produce per default the result
>> std/sqrt(n-1), which is what we usually want for the sem. Is that
>> correct?
>
>
> Yes. I had to check the t-tests; that's when I spent more time checking the
> degrees of freedom. It looks like the denominator needs one "n" and one
> "n-1":
>
>  v = np.var(a, axis, ddof=1)
>  t = d / np.sqrt(v/float(n))
>
> sem(a, ddof=1, axis=0) should have ddof as its last argument, i.e.
> sem(a, axis=0, ddof=1), to match np.var.
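
As a quick check of the "one n and one n-1" point, here is a minimal sketch.
The name sem_sketch is hypothetical and this is not the scipy.stats
implementation; it only assumes the argument order suggested above. With
ddof=1 inside the standard deviation and n outside, the result is the same
as std(ddof=0)/sqrt(n-1):

```python
import numpy as np

def sem_sketch(a, axis=0, ddof=1):
    """Standard error of the mean along `axis` (illustrative helper only)."""
    a = np.asarray(a, dtype=float)
    n = a.shape[axis]
    # ddof goes into the standard deviation; the outer denominator stays n
    return np.std(a, axis=axis, ddof=ddof) / np.sqrt(n)

a = np.array([[1.0, 2.0], [3.0, 5.0], [8.0, 13.0]])
n = a.shape[0]
s1 = sem_sketch(a)                       # std(ddof=1) / sqrt(n)
s2 = np.std(a, axis=0) / np.sqrt(n - 1)  # std(ddof=0) / sqrt(n - 1)
```

Both expressions reduce to sqrt(SS / (n * (n - 1))), where SS is the sum of
squared deviations from the mean, so s1 and s2 agree.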
>
> Your axis handling in zscore is still incorrect for 2d arrays:
>
> if axis=1, then we need to add the axis back so that the mean broadcasts
> a.mean(1)[:,None]
>
> There is a function in numpy to do this, np.expand_dims, that
> works for a general axis. There was also a recent discussion
> on the numpy list about getting the axis back after a reduce.
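
A minimal sketch of the axis handling, using np.expand_dims to put the
reduced axis back so that the mean and standard deviation broadcast against
the input for any axis. The name zscore_sketch is hypothetical and this is
not the code in the attached diff:

```python
import numpy as np

def zscore_sketch(a, axis=0, ddof=0):
    """z-scores along `axis` (illustrative helper only)."""
    a = np.asarray(a, dtype=float)
    # expand_dims restores the reduced axis so mean/std broadcast against `a`
    mean = np.expand_dims(a.mean(axis=axis), axis)
    std = np.expand_dims(a.std(axis=axis, ddof=ddof), axis)
    return (a - mean) / std

x = np.arange(12.0).reshape(3, 4) ** 2
z = zscore_sketch(x, axis=1)
# for axis=1 on a 2d array this matches the explicit [:, None] form
z_explicit = (x - x.mean(1)[:, None]) / x.std(1)[:, None]
```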

(Codereview is much easier to read than a diff in WordPad.)

For two arrays, as in zscore_compare, you can also use _chk2_asarray.
zscore_compare should match the axis argument of zscore, I think.
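
Roughly what I have in mind for the two-array case, as a sketch only. The
name zscore_compare_sketch and its argument names are hypothetical, and it
calls np.asarray directly rather than the private _chk2_asarray helper. The
mean and standard deviation come from the comparison array along the same
axis argument that zscore would take:

```python
import numpy as np

def zscore_compare_sketch(scores, compare, axis=0, ddof=0):
    """z-scores of `scores` relative to `compare` (illustrative helper only)."""
    scores = np.asarray(scores, dtype=float)
    compare = np.asarray(compare, dtype=float)
    # statistics are taken from `compare`, then broadcast against `scores`
    mean = np.expand_dims(compare.mean(axis=axis), axis)
    std = np.expand_dims(compare.std(axis=axis, ddof=ddof), axis)
    return (scores - mean) / std

ref = np.array([[1.0, 10.0], [3.0, 30.0], [5.0, 50.0]])
new = np.array([[3.0, 30.0], [7.0, 70.0]])
zc = zscore_compare_sketch(new, ref, axis=0)
```

Passing the same array twice gives the plain z-score along that axis.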

Cheers (and I'm off)

Josef

>
> Josef
>
>
>
>>
>>  Cheers,
>>
>> Ariel