unexpected behaviour of numpy.var
Hello,

numpy.var exhibits a rather dangerous behaviour, as I have just noticed. In some cases, numpy.var calculates the variance, and in some cases the standard deviation (= square root of the variance). Is this intended? I have to admit that I use numpy 0.9.6 at the moment. Has this been changed in more recent versions?

Below is a sample session:

Python 2.4.3 (#1, May 8 2006, 18:35:42)
[GCC 3.2.3 20030502 (Red Hat Linux 3.2.3-52)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> a = [1,2,3,4,5]
>>> numpy.var(a)
2.5
>>> numpy.std(a)
1.5811388300841898
>>> numpy.sqrt(2.5)
1.5811388300841898
>>> a1 = numpy.array([[1],[2],[3],[4],[5]])
>>> a1
array([[1],
       [2],
       [3],
       [4],
       [5]])
>>> numpy.var(a1)
array([ 1.58113883])
>>> numpy.std(a1)
array([ 1.58113883])
>>> a = numpy.array([1,2,3,4,5])
>>> numpy.std(a)
1.5811388300841898
>>> numpy.var(a)
1.5811388300841898
>>> numpy.__version__
'0.9.6'

Hanno

--
Hanno Klemm
klemm@phys.ethz.ch
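For readers who want to check their own installation, here is a minimal sketch of the consistency test implied by the session above: the standard deviation squared should equal the variance for every input shape. It assumes a current NumPy release (not 0.9.6), where numpy.var and numpy.std default to axis=None and ddof=0. The helper name check_var_std is made up for illustration, and ddof=1 is chosen only because it yields the 2.5 shown in the session for [1, 2, 3, 4, 5].

```python
import numpy as np


def check_var_std(data, ddof=1):
    """Return (var, std, consistent), where consistent means std**2 == var.

    Hypothetical helper for illustration; not part of NumPy.
    """
    arr = np.asarray(data, dtype=float)
    var = np.var(arr, ddof=ddof)   # axis=None: reduce over all elements
    std = np.std(arr, ddof=ddof)
    return var, std, bool(np.isclose(std ** 2, var))


# The three input shapes used in the session: list, 1-D array, column array.
samples = (
    [1, 2, 3, 4, 5],
    np.array([1, 2, 3, 4, 5]),
    np.array([[1], [2], [3], [4], [5]]),
)

for data in samples:
    var, std, consistent = check_var_std(data)
    print(np.shape(data), var, std, consistent)
```

If the last column ever prints False, var and std disagree for that input, which is the symptom reported above.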
Hi, Hanno. I ran your sample session in numpy 0.9.8 (on a Mac, just so you know; I don't yet have numpy installed on my Windows platform, and I don't have immediate access to a *nix box) and could not reproduce the problem, i.e., it does appear to have been fixed in 0.9.8.

DG
-- HMRD/ORR/NOS/NOAA <http://response.restoration.noaa.gov/emergencyresponse/>
I also couldn't reproduce it on my 0.9.8 on Linux.

DG
-- David Grant http://www.davidgrant.ca
I cannot reproduce your results, but I wonder if the following is right:
>>> a = array([1,2,3,4,5])
>>> var(a[newaxis,:])
array([ 0., 0., 0., 0., 0.])
>>> a[newaxis,:].var()
2.0
>>> a[newaxis,:].var(axis=0)
array([ 0., 0., 0., 0., 0.])
Are method and function supposed to have different defaults? It looks like the method defaults to variance over all axes while the function defaults to axis=0.
>>> __version__
'1.0b2.dev2192'
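The difference described in this message comes down to which axis the reduction runs over. Here is a minimal sketch, assuming a current NumPy release in which both the function and the method take an explicit axis argument and axis=None reduces over all elements. It illustrates the reductions side by side and is not a reproduction of 1.0b2 behaviour.

```python
import numpy as np

a = np.array([1, 2, 3, 4, 5])
row = a[np.newaxis, :]              # shape (1, 5): one row, five columns

# Reduce over every element: the variance of 1..5 (ddof=0).
over_all = np.var(row, axis=None)

# Reduce along axis 0: each column holds a single value, so every
# per-column variance is zero.
per_column = np.var(row, axis=0)

# Reduce along axis 1: the single row holds all five values.
per_row = np.var(row, axis=1)

print(over_all)
print(per_column)
print(per_row)
```

Passing axis explicitly at every call site removes any dependence on whether the function or the method is used.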
Sasha wrote:
> I cannot reproduce your results, but I wonder if the following is right:
>
> >>> a = array([1,2,3,4,5])
> >>> var(a[newaxis,:])
> array([ 0., 0., 0., 0., 0.])
> >>> a[newaxis,:].var()
> 2.0
> >>> a[newaxis,:].var(axis=0)
> array([ 0., 0., 0., 0., 0.])
>
> Are method and function supposed to have different defaults? It looks like the method defaults to variance over all axes while the function defaults to axis=0.

They are supposed to have different defaults because the functional forms are largely for backward compatibility where axis=0 was the default.

-Travis
Torgil Svensson wrote:
>> They are supposed to have different defaults because the functional forms are largely for backward compatibility where axis=0 was the default.
>>
>> -Travis
>
> Isn't backwards compatibility what "oldnumeric" is for?
As this discussion indicates, there has been a switch from numpy 0.9.8 to numpy 1.0b in how backward compatibility is handled. Instead of importing old names, a new sub-package numpy.oldnumeric was created. This mechanism is incomplete in the sense that there are still some backward-compatible items in numpy, such as defaults on the axis keyword for functions versus methods, and you still have to make the changes that convertcode.py makes to the code to get it to work.

I'm wondering whether some additional effort should be placed in numpy.oldnumeric so that replacing Numeric with numpy.oldnumeric actually gives no compatibility issues (i.e. the only thing you have to change is to replace imports with new names). In other words, a simple array sub-class could be created that mimics the old Numeric array, and the old functions could be created as well with the same arguments.

The very same thing could be done with numarray. This would make conversion almost trivial.

Then, the convertcode script could be improved to make all the changes that would take an oldnumeric-based module to a more modern numpy-based module. A similar numarray script could be developed as well.

What do people think? Is it worth it? This could be a coding-sprint effort at SciPy.

-Travis
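One way to picture the proposal is a thin layer of wrappers that restore the old defaults, leaving the main namespace free to use the new ones. The sketch below is only an illustration under stated assumptions: legacy_var and legacy_std are hypothetical names, not part of numpy.oldnumeric or any released module, and the code assumes a current NumPy where numpy.var and numpy.std default to axis=None.

```python
import numpy as np


def legacy_var(a, axis=0):
    """Hypothetical Numeric-style wrapper: reduce along axis 0 by default."""
    return np.var(np.asarray(a), axis=axis)


def legacy_std(a, axis=0):
    """Hypothetical Numeric-style wrapper: reduce along axis 0 by default."""
    return np.std(np.asarray(a), axis=axis)


m = np.array([[1.0, 2.0],
              [3.0, 6.0]])

print(legacy_var(m))   # one variance per column (old-style default)
print(np.var(m))       # a single variance over all four elements
```

Old code would import the wrappers from the compatibility layer, while new code would call numpy.var directly, so the differing defaults would never share one namespace.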
On Tue, Aug 01, 2006 at 06:21:49PM -0600, Travis Oliphant wrote:
> I'm wondering about whether or not some additional effort should be placed in numpy.oldnumeric so that replacing Numeric with numpy.oldnumeric actually gives no compatibility issues (i.e. the only thing you have to change is replace imports with new names). In other words a simple array sub-class could be created that mimics the old Numeric array and the old functions could be created as well with the same arguments.
>
> The very same thing could be done with numarray. This would make conversion almost trivial.
>
> Then, the convertcode script could be improved to make all the changes that would take an oldnumeric-based module to a more modern numpy-based module. A similar numarray script could be developed as well.
>
> What do people think? Is it worth it? This could be a coding-sprint effort at SciPy.

This sounds like a very good idea to me. I hope that those of us who cannot attend SciPy 2006 can still take part in the coding sprints, be it via IRC or some other communication medium.

Cheers
Stéfan
> What do people think? Is it worth it? This could be a coding-sprint effort at SciPy.
>
> -Travis

Sounds like a good idea. This should make old code work while not imposing unnecessary restrictions on numpy due to backward compatibility.

//Torgil
Hi,

Just a note of caution: if people are encouraged "too much" to just always say "from numpy.oldnumeric import *" or, as suggested, maybe soon also something like "from numpy.oldnumarray import *", could this not soon lead to a great state of confusion when people later ask questions on this mailing list and nobody really knows which of the submodules they are referring to?

Recently someone (Torgil Svensson) here suggested unifying the default argument between a method and a function; I think the discussion was about numpy.var and its "axis" argument. I would be a clear +1 on unifying these and having a clean design of numpy. Consequently, the old way of different defaults should be absorbed by the oldnumeric submodule.

All I'm saying then is that this could cause confusion later on, and therefore the whole idea of "easy backwards compatibility" should be qualified by encouraging people to adopt the most problematic changes (like new default values) sooner rather than later.

I'm hoping that numpy will soon find increasingly broad acceptance in the whole Python community (and the entire scientific community for that matter ;-) ).

Thanks for all your work,
Sebastian Haase
On Wed, 02 Aug 2006, Sebastian Haase apparently wrote:
> Recently someone (Torgil Svensson) here suggested to unify the default argument between a method and a function - I think the discussion was about numpy.var and its "axis" argument. I would be a clear +1 on unifying these and have a clean design of numpy. Consequently the old way of different defaults should be absorbed by the oldnumeric sub module.

+1

I think this consistency is *really* important for the easy acceptance of numpy by new users. (From a user's perspective, I also think it is just good design.) I expect many new users to be "burned" by this inconsistency.

However, as an intermediate-run (say 1 year) transition measure on the way to consistent use, I would be comfortable with the numpy functions requiring an axis argument.

One user's view,
Alan Isaac
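The transition measure suggested here, functions that refuse to guess an axis, could look roughly like the following. This is a hedged sketch: var_explicit is a hypothetical wrapper, not a NumPy function, and it relies on Python 3 keyword-only arguments, which did not exist when this thread was written.

```python
import numpy as np


def var_explicit(a, *, axis, **kwargs):
    """Hypothetical transition wrapper: the caller must always pass axis.

    Use axis=None to reduce over all elements, or an integer for one axis.
    Omitting axis raises TypeError because it is a required keyword.
    """
    return np.var(a, axis=axis, **kwargs)


row = np.array([[1, 2, 3, 4, 5]])

print(var_explicit(row, axis=None))   # variance over all elements
print(var_explicit(row, axis=0))      # per-column variances

try:
    var_explicit(row)                 # no axis given
except TypeError as exc:
    print("rejected:", exc)
```

Because the failure is loud, code written against either default would have to state its intent before it runs.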
participants (9)
- Alan G Isaac
- David Grant
- David L Goldsmith
- Hanno Klemm
- Sasha
- Sebastian Haase
- Stefan van der Walt
- Torgil Svensson
- Travis Oliphant