Bug in numpy std, etc. with other data structures?
Just ran into this. Any objections for having numpy.std and other functions in core/fromnumeric.py call asanyarray before trying to use the array's method? Other data structures like pandas and larry define their own std method, for instance, and this doesn't allow them to pass through. I'm inclined to say that the issue is with numpy, though maybe the data structures shouldn't shadow numpy array methods while altering the signature. I dunno.

df = pandas.DataFrame(np.random.random((10,5)))

np.std(df,axis=0)
<snip>
TypeError: std() got an unexpected keyword argument 'dtype'

np.std(np.asanyarray(df),axis=0)
array([ 0.30883352, 0.3133324 , 0.26517361, 0.26389029, 0.20022444])

Though I don't think this would work with larry yet.

Pull request: https://github.com/numpy/numpy/pull/160

Skipper
On Sat, Sep 17, 2011 at 4:48 PM, Skipper Seabold wrote:
<snip>
Note I've no real intention of making DataFrame fully ndarray-like-- but it's nice to be able to type:

df.std(axis=0)
df.std(axis=1)
np.sqrt(df)

etc., which works the same as ndarray. I suppose the __array__/__array_wrap__ interface is there largely as a convenience.
On Sat, Sep 17, 2011 at 5:12 PM, Wes McKinney wrote:
<snip>
I'm a bit worried about the different ddof defaults in cases like this. Essentially we will not be able to rely on ddof=0 anymore. Different defaults on axis are easy to catch, but having the same function call return sometimes ddof=0 and sometimes ddof=1 might make for some fun debugging.

Josef
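To make the ddof concern concrete, here is a minimal sketch in plain Python. The helper `std` below is a hypothetical stand-in written for illustration, not NumPy's implementation; it only shows how the divisor N - ddof changes the result for the same data. NumPy's np.std defaults to ddof=0, while pandas defaults to ddof=1.

```python
import math
import statistics


def std(values, ddof=0):
    """Hypothetical helper: standard deviation with divisor N - ddof."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = sum((x - mean) ** 2 for x in values)
    return math.sqrt(squared_deviations / (n - ddof))


data = [2, 4, 4, 4, 5, 5, 7, 9]

population = std(data, ddof=0)  # the convention np.std uses by default
sample = std(data, ddof=1)      # the convention pandas uses by default

# Cross-check against the standard library's two conventions.
assert math.isclose(population, statistics.pstdev(data))
assert math.isclose(sample, statistics.stdev(data))

print(population, sample)
```

The two results differ for the same input, so a call that silently switches between the conventions depending on the argument's type would be hard to spot from the output alone.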
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org http://mail.scipy.org/mailman/listinfo/numpy-discussion
On Sat, Sep 17, 2011 at 8:36 PM,
<snip>
Can we lobby for default ddof=1 in NumPy 2.0? Breaking with a convention like this doesn't make much sense to me.
On Sat, Sep 17, 2011 at 4:12 PM, Wes McKinney wrote:
<snip>
numpy.std() does accept array-like input, which obviously means that np.std([1,2,3,5]) works, making the asanyarray call a total waste of CPU time. Clearly pandas is not array-like input (as Wes points out below), so an error is correct. Doing this type of 'fix' will have unintended consequences when other non-numpy objects are incorrectly passed to numpy functions. Rather, you should determine why 'array-like' failed here IF you think a pandas object is either array-like or a numpy object.
Note I've no real intention of making DataFrame fully ndarray-like-- but it's nice to be able to type:

df.std(axis=0)
df.std(axis=1)
np.sqrt(df)

etc., which works the same as ndarray. I suppose the __array__/__array_wrap__ interface is there largely as a convenience.
I consider that the only way for pandas or any other numpy derivative to overcome this is to get into numpy/scipy. After all, Travis opened the discussion for Numpy 3, which you could still address.

Bruce

PS Good luck on the ddof thing given the past discussions on it!
On Sat, Sep 17, 2011 at 21:50, Bruce Southey wrote:
numpy.std() does accept array-like input, which obviously means that np.std([1,2,3,5]) works, making the asanyarray call a total waste of CPU time. Clearly pandas is not array-like input (as Wes points out below), so an error is correct.
No. Even lists are "array-like" in the terminology of the docstring standard. Anything that np.asarray() or np.asanyarray() can accept is "array-like". Please stop making things up and being sanctimonious about it.

--
Robert Kern

"I have come to believe that the whole world is an enigma, a harmless enigma that is made terrible by our own mad attempt to interpret it as though it had an underlying truth."
  -- Umberto Eco
On Sat, Sep 17, 2011 at 10:50 PM, Bruce Southey wrote:
<snip>
No, the reason it is failing is because np.std takes the EAFP/duck-typing approach:

    try:
        std = a.std
    except AttributeError:
        return _wrapit(a, 'std', axis, dtype, out, ddof)
    return std(axis, dtype, out, ddof)

Indeed DataFrame has an std method but it doesn't have the same function signature as ndarray.std.
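The dispatch order quoted above can be sketched in plain Python to show where the mismatch surfaces. Everything here is an illustrative assumption: `Frame` is a hypothetical container, `_wrapit` is a simplified stand-in for NumPy's private helper, and `std_convert_first` is only a rough picture of the "convert before dispatching" idea from the start of the thread, not the code in the pull request.

```python
import statistics


def _wrapit(obj, ddof):
    # Simplified stand-in: convert to a plain list, then compute on it.
    values = list(obj)
    return statistics.pstdev(values) if ddof == 0 else statistics.stdev(values)


def std_eafp(a, axis=None, dtype=None, out=None, ddof=0):
    """Try the object's own std method first (the EAFP order)."""
    try:
        method = a.std
    except AttributeError:
        return _wrapit(a, ddof)  # lists and tuples land here
    # Assumes the method accepts ndarray.std's full positional signature.
    return method(axis, dtype, out, ddof)


def std_convert_first(a, ddof=0):
    """Always convert first; the object's own std method is never called."""
    return _wrapit(a, ddof)


class Frame:
    """Hypothetical container whose std() only understands `axis`."""

    def __init__(self, values):
        self.values = list(values)

    def __iter__(self):
        return iter(self.values)

    def std(self, axis=0):
        return statistics.stdev(self.values)  # its own ddof=1 convention


frame = Frame([1.0, 2.0, 3.0, 5.0])

try:
    std_eafp(frame)
    eafp_outcome = "ok"
except TypeError:
    eafp_outcome = "TypeError"

print(eafp_outcome)                    # the signature mismatch surfaces here
print(std_eafp([1.0, 2.0, 3.0, 5.0]))  # no .std attribute, so it is wrapped
print(std_convert_first(frame))        # conversion sidesteps Frame.std
```

Note the trade-off in this sketch: converting first avoids the TypeError, but it also replaces the container's own ddof convention with the function's default, which is the debugging hazard raised earlier in the thread.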
On Sat, Sep 17, 2011 at 10:00 PM, Wes McKinney wrote:
<snip>
Thanks for the clarification - see Robert I am not making things up!

Bruce
On Sat, Sep 17, 2011 at 22:11, Bruce Southey wrote:
<snip>
I have no doubt that np.std() fails to work as desired. But the fault is with np.std() not living up to the semantics implied by the documentation (or the documentation documenting the wrong semantics), not that DataFrame does not live up to a meaning of "array-like" that you invented.

--
Robert Kern
participants (5)
- Bruce Southey
- josef.pktd@gmail.com
- Robert Kern
- Skipper Seabold
- Wes McKinney