[SciPy-dev] Standard deviations

Paul Barrett pebarrett at gmail.com
Tue Nov 29 16:48:14 EST 2005


I'd like to see more explicit method names. At first sight, 'a.var' and
'a.std' don't mean much to me, whereas 'a.variance' and 'a.standard_dev' do.

 -- Paul

On 11/29/05, Travis Oliphant <oliphant at ee.byu.edu> wrote:
>
> Ed Schofield wrote:
>
> >Hi all,
> >
> >I have three questions related to standard deviations and variances in
> >scipy.
> >
> >First, can someone explain the behaviour of array.std() without any
> >arguments?
> >
> > >>> a = arange(30).reshape(3,10)
> > >>> a
> >array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9],
> >       [10, 11, 12, 13, 14, 15, 16, 17, 18, 19],
> >       [20, 21, 22, 23, 24, 25, 26, 27, 28, 29]])
> > >>> a.std()
> >array([ 2.99856287,  2.85723522,  2.74647109,  2.67007684,  2.63104804,
> >        2.63104804,  2.67007684,  2.74647109,  2.85723522,  2.99856287])
> >
> >I don't understand what these numbers represent.  The correct standard
> >deviations of the column vectors are given by:
> >
> > >>> a.std(0)
> >array([ 10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.,  10.])
> >
> >and the standard deviations of the row vectors are:
> >
> > >>> a.std(1)
> >array([ 3.02765035,  3.02765035,  3.02765035])
> >
> >I would have expected a.std() to give the same output as
> > >>> a.ravel().std()
> >8.8034084308295046
> >
> >which is what a.mean() does.
> >
> >
>
> This is a bug.  Thanks for finding it.  I'll look into it.
>
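
For comparison, here is a minimal sketch of the behaviour Ed expected,
written against present-day NumPy rather than the 2005 scipy core discussed
in this thread (that substitution is an assumption). With no axis argument
the reduction runs over the flattened array, as it does for mean(). The
ddof=1 argument is also an assumption: it selects the sample (n - 1)
normalisation, which is what reproduces the 8.8034... value quoted above.

```python
import numpy as np

a = np.arange(30).reshape(3, 10)

# No axis argument: NumPy reduces over all 30 elements.
# ddof=1 (assumed here) gives the sample normalisation used in the thread.
flat_std = a.std(ddof=1)
ravel_std = a.ravel().std(ddof=1)

# mean() follows the same flattened convention.
flat_mean = a.mean()

print(flat_std, ravel_std, flat_mean)
```
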
> >
> >
> >Second, I'd like to point out that some of the functions in Lib/stats/
> >have a different convention to scipy core about whether operations are
> >performed row-wise or column-wise, and to ask whether anyone would object
> >to my changing the stats functions to operate column-wise.  At the moment we
> >get this:
> >
> > >>> average(a)
> >array([ 10.,  11.,  12.,  13.,  14.,  15.,  16.,  17.,  18.,  19.])
> >
> >which is column-wise, but
> >
> > >>> std(a)
> >array([ 3.02765035,  3.02765035,  3.02765035])
> >
> >which is row-wise.  I presume the default behaviour of std() and friends
> >is just a historical relic.  If so we'd be wise to get this straight
> >well before a 1.0 release.
> >
> >
> Good catch.  It would be nice to have things as consistent as possible.
> Feel free to make consistency changes --- especially in stats.py  which
> is still messy.
>
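
One way to sidestep differing defaults is to pass the axis explicitly. The
sketch below again assumes present-day NumPy, not the average() and std()
functions of the 2005 code base, and assumes ddof=1 to match the numbers
quoted above.

```python
import numpy as np

a = np.arange(30).reshape(3, 10)

# axis=0 reduces down the rows, giving one value per column (length 10).
col_mean = a.mean(axis=0)
col_std = a.std(axis=0, ddof=1)

# axis=1 reduces along each row, giving one value per row (length 3).
row_std = a.std(axis=1, ddof=1)

print(col_mean, col_std, row_std)
```

With the axis spelled out, the column-wise and row-wise results no longer
depend on which convention a particular function happens to default to.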
> >Third, I'd like to request that we add an array.var() method to scipy
> >core to compute an array's sample variance.
> >
> >At the moment it seems that there is no way to compute the sample
> >variance of an array of numbers without installing the full scipy.
> >Users needing to do this will either have to roll their own function in
> >Python, like this:
> >
> >def var(A):
> >    # Sample variance along the first axis (normalised by m - 1)
> >    m = len(A)
> >    means = average(A)
> >    return average((A - means)**2) * (m/(m-1.))
> >
> >or square the output of std().  Both are less efficient than a native
> >array.var() would be, requiring extra memory copying and, in the second
> >case, squaring the result of a square root operation, which also
> >introduces numerical imprecision.
> >
> >The extra code required is minimal.  There's an example patch below,
> >which works fine except that it inherits the weirdness of std().
> >
> >
> I'm O.K. with this.  Anybody else see a problem?
>
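
As an illustration of the three routes discussed above, the sketch below
compares a hand-rolled sample variance, the square of std(), and a native
var() call. It assumes present-day NumPy, where ndarray.var() accepts a ddof
argument; sample_var is a hypothetical helper written for this example, not
the patch referred to in the thread.

```python
import numpy as np

def sample_var(x, axis=0):
    """Hypothetical helper: sample variance along `axis` (divides by m - 1)."""
    x = np.asarray(x, dtype=float)
    m = x.shape[axis]
    mean = x.mean(axis=axis, keepdims=True)
    return ((x - mean) ** 2).sum(axis=axis) / (m - 1)

a = np.arange(30).reshape(3, 10)

manual = sample_var(a, axis=0)         # roll-your-own version
via_std = a.std(axis=0, ddof=1) ** 2   # squares the result of a square root
builtin = a.var(axis=0, ddof=1)        # native method, no square root involved

print(manual, via_std, builtin)
```
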
> -Travis
>
>



--
Paul Barrett, PhD                   Johns Hopkins University
Assoc. Research Scientist     Dept of Physics and Astronomy
Phone: 410-516-5190            Baltimore, MD 21218