Hi All,
Due to a recent commit, Numpy master now raises an error when applying the sign function to an object array containing NaN. Other options may be preferable, returning NaN for instance, so I would like to open the topic for discussion on the list.
Thoughts?
Chuck
I wouldn't know of any valid output when applying the sign function to NaN. Therefore, I think it is correct to raise a ValueError. Furthermore, I would prefer such an error over just returning NaN, since it helps you locate where the NaN is generated.
On Tue, Sep 29, 2015 at 5:13 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
<snip>
NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays. Why should it be different for object arrays?
Anne
P.S. If you want exceptions when NaNs appear, that's what np.seterr is for. -A
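For reference, the float behavior Anne describes is easy to check directly (a quick sketch; exact array printing varies by numpy version):

```python
import numpy as np

# np.sign on a float array follows IEEE 754: sign(NaN) is NaN,
# and no error is raised by default.
x = np.array([-2.0, 0.0, 3.0, np.nan])
s = np.sign(x)
# s is array([-1., 0., 1., nan])
assert np.isnan(s[3])
```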
On Tue, Sep 29, 2015 at 5:18 PM Freddy Rietdijk freddyrietdijk@fridh.nl wrote:
<snip>
On Tue, Sep 29, 2015 at 11:25 AM, Anne Archibald archibald@astron.nl wrote:
<snip>
I also think NaN should be treated the same way as for floating point numbers (whatever that is). Otherwise it is difficult to remember when nan essentially has a float dtype and when it has another dtype (given that float is the smallest dtype that can hold a nan).
Josef
On 09/29/2015 11:39 AM, josef.pktd@gmail.com wrote:
<snip>
I also think NaN should be treated the same way as floating point numbers (whatever that is). Otherwise it is difficult to remember when nan is essentially a float dtype or another dtype. (given that float is the smallest dtype that can hold a nan)
Note that I've reimplemented np.sign for object arrays along these lines in this open PR: https://github.com/numpy/numpy/pull/6320
That PR recursively uses the np.sign ufunc to evaluate object arrays containing float and complex numbers. This way the behavior on object arrays is identical to float/complex arrays.
Here is what the np.sign ufunc does (for arbitrary x):
np.sign(np.nan)             -> nan
np.sign(complex(np.nan, x)) -> complex(nan, 0)
np.sign(complex(x, np.nan)) -> complex(nan, 0)
Allan
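A minimal check of the behavior described above (the NaN-in-the-real-part property holds across numpy versions, though the exact complex sign convention has varied):

```python
import numpy as np

# Float case: sign(NaN) propagates NaN rather than raising.
assert np.isnan(np.sign(np.nan))

# Complex case: a NaN component yields a NaN real part in the result.
z = np.sign(complex(np.nan, 1.0))
assert np.isnan(z.real)
```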
On Tue, Sep 29, 2015 at 9:25 AM, Anne Archibald archibald@astron.nl wrote:
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays. Why should it be different for object arrays?
What about non-numeric objects in general?
<snip>
Chuck
On Sep 29, 2015 8:25 AM, "Anne Archibald" archibald@astron.nl wrote:
IEEE 754 has signum(NaN)->NaN. So does np.sign on floating-point arrays.
Why should it be different for object arrays?
The argument for doing it this way would be that arbitrary python objects don't have a sign, and the natural way to implement something like np.sign's semantics using only the "object" interface is
if obj < 0:
    return -1
elif obj > 0:
    return 1
elif obj == 0:
    return 0
else:
    raise
In general I'm not a big fan of trying to do all kinds of guessing about how to handle random objects in object arrays, the kind that ends up with a big chain of type checks and fallback behaviors. Pretty soon we find ourselves trying to extend the language with our own generic dispatch system for arbitrary python types, just for object arrays. (The current hack where for object arrays np.log will try calling obj.log() is particularly horrible. There is no rule in python that "log" is a reserved method name for "logarithm" on arbitrary objects. Ditto for the other ufuncs that implement this hack.)
Plus we hope that many use cases for object arrays will soon be supplanted by better dtype support, so now may not be the best time to invest heavily in making object arrays complicated and powerful.
OTOH sometimes practicality beats purity, and at least object arrays are already kinda cordoned off from the rest of the system, so I don't feel as strongly as if we were talking about core functionality.
...is there a compelling reason to even support np.sign on object arrays? This seems pretty far into the weeds, and that tends to lead to poor intuition and decision making.
-n
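Nathaniel's comparison-only pseudocode, written out as a runnable sketch (the exception type is a choice made here for illustration; current master raises ValueError in this situation):

```python
def object_sign(obj):
    # Generic sign using only the comparison interface that an
    # arbitrary Python object provides.
    if obj < 0:
        return -1
    elif obj > 0:
        return 1
    elif obj == 0:
        return 0
    else:
        # NaN ends up here: every comparison against NaN is False.
        raise ValueError("cannot determine sign of %r" % (obj,))

assert object_sign(-5) == -1
assert object_sign(0.0) == 0
```

With `float('nan')` all three comparisons are False, so the fall-through branch raises, which is exactly the master-branch behavior under discussion.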
On Tue, Sep 29, 2015 at 12:16 PM, Nathaniel Smith njs@pobox.com wrote:
<snip>
That is what current master does, using PyObject_RichCompareBool for the comparisons.
Chuck
On Tue, Sep 29, 2015 at 2:16 PM, Nathaniel Smith njs@pobox.com wrote:
<snip>
...is there a compelling reason to even support np.sign on object arrays? This seems pretty far into the weeds, and that tends to lead to poor intuition and decision making.
One of the use cases that has sneaked in over the last few numpy versions is object arrays containing numerical arrays whose shapes don't add up to a rectangular array. In those cases being able to apply numerical operations might be useful.
But I'm +0 since I don't work with object arrays.
Josef
One of the usecases that has sneaked in during the last few numpy versions is that object arrays contain numerical arrays where the shapes don't add up to a rectangular array.
I think that's the wrong way to solve that problem -- we really should have a "proper" ragged array implementation. But it is the easiest way at this point.
For this, and other use-cases, special casing Numpy arrays stored in object arrays does make sense:
"If this is a Numpy array, pass the operation through."
-CHB
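The pass-through rule can be sketched in a few lines of Python (`sign_passthrough` is a hypothetical helper written for illustration, not an existing numpy function):

```python
import numpy as np

# A ragged object array: nested float arrays of different lengths.
b = np.empty(2, dtype=object)
b[0] = np.ones(3)
b[1] = -np.ones(4)

def sign_passthrough(arr):
    # Apply np.sign to each element; since np.sign works on both
    # ndarrays and scalars, nested arrays get the operation
    # "passed through" elementwise.
    out = np.empty(arr.shape, dtype=object)
    for idx in np.ndindex(arr.shape):
        out[idx] = np.sign(arr[idx])
    return out

s = sign_passthrough(b)
assert (s[0] == 1).all() and (s[1] == -1).all()
```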
On Tue, Sep 29, 2015 at 6:58 PM, Chris Barker - NOAA Federal <chris.barker@noaa.gov> wrote:
<snip>
Because we now (development) use rich compare, the result looks like
In [1]: a = ones(3)
In [2]: b = array([a, -a], object)
In [3]: b
Out[3]:
array([[1.0, 1.0, 1.0],
       [-1.0, -1.0, -1.0]], dtype=object)

In [4]: sign(b)
Out[4]:
array([[1L, 1L, 1L],
       [-1L, -1L, -1L]], dtype=object)
The function returns long integers in order to not special case Python 3. Hmm, wonder if we might want to change that.
Chuck
On Tue, Sep 29, 2015 at 7:31 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
<snip>
Oops, not what was intended. In fact it raises an error
In [7]: b
Out[7]: array([array([ 1.,  1.,  1.]), array([-1., -1., -1.])], dtype=object)

In [8]: sign(b)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-8-3b1a81271d2e> in <module>()
----> 1 sign(b)

ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Chuck
On Tue, Sep 29, 2015 at 6:35 PM, Charles R Harris <charlesr.harris@gmail.com> wrote:
<snip>
exactly -- it seems to me that a special case for numpy arrays as objects in object arrays makes sense, so you'd get:
In [6]: oa
Out[6]:
array([[1.0, 1.0, 1.0],
       [-1.0, -1.0, -1.0]], dtype=object)

In [7]: np.sign(oa)
Out[7]:
array([[1, 1, 1],
       [-1, -1, -1]], dtype=object)
(which you do now in the version I'm running).
Though rather than the special case, maybe we really need dtype=ndarray arrays?
oa = np.array([a1, a2], dtype=np.ndarray)
Then we could count on everything in the array being an array.....
-CHB
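As far as I know there is no real dtype=ndarray today (np.dtype(np.ndarray) just degrades to object), so a sketch of getting the same guarantee now is to fill an object array explicitly, which also sidesteps np.array trying to build a 2-D array from same-length inputs:

```python
import numpy as np

a1 = np.ones(3)
a2 = np.zeros(3)

# np.array([a1, a2], dtype=object) with same-length arrays would
# produce a (2, 3) object array of scalars, so assign explicitly:
oa = np.empty(2, dtype=object)
oa[0], oa[1] = a1, a2

# Every element is now guaranteed to be an ndarray.
assert all(isinstance(elem, np.ndarray) for elem in oa)
```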
On Mi, 2015-09-30 at 09:11 -0700, Chris Barker wrote:
<snip>
Though rather than the special case, maybe we really need dtype=ndarray arrays?
I think this (as a dtype) is an obvious solution. The other solution, I am not sure about in general to be honest. We may have to be more careful about creating a monster with new dtypes, rather than being careful to implement all possible features ;). It is not that I think we would not have consistent rules, etc. it is just that we *want* to force code to be obvious. If someone has arrays inside arrays, maybe he should be expected to specify that.
It actually breaks some logic (or cannot be implemented for everything), because we have signatures such as `O->?`, which does not support array output.
- Sebastian
--
Christopher Barker, Ph.D. Oceanographer
Emergency Response Division NOAA/NOS/OR&R (206) 526-6959 voice 7600 Sand Point Way NE (206) 526-6329 fax Seattle, WA 98115 (206) 526-6317 main reception
Chris.Barker@noaa.gov
On Di, 2015-09-29 at 11:16 -0700, Nathaniel Smith wrote:
<snip>
Plus we hope that many use cases for object arrays will soon be supplanted by better dtype support, so now may not be the best time to invest heavily in making object arrays complicated and powerful.
I have the little dream here that what could happen is that we create a PyFloatDtype kind of thing (it is a bit different from our float because it would always convert back to a python float and maybe raises more errors), which "registers" with the dtype system in that it says "I know how to handle python floats and store them in an array and provide ufunc implementations for it".
Then, the "object" dtype ufuncs would try to call the ufunc on each element, including "conversion". They would find a "float", since it is not an array-like container, they interpret it as a PyFloatDtype scalar and call the scalars ufunc (the PyFloatDtype scalar would be a python float).
Of course likely I am thinking down the wrong road, but if you want e.g. an array of Decimals, you need some way to give numpy a PyDecimalDtype. Now "object" would possibly be just a fallback to mean "figure out what to use for each element". It would be a bit slower, but it would work very generally, because numpy would not impose limits as such.
- Sebastian
On Tue, Sep 29, 2015 at 2:07 PM, Sebastian Berg sebastian@sipsolutions.net wrote:
On Di, 2015-09-29 at 11:16 -0700, Nathaniel Smith wrote:
[...]
<snip>
I have the little dream here that what could happen is that we create a PyFloatDtype kind of thing (it is a bit different from our float because it would always convert back to a python float and maybe raises more errors), which "registers" with the dtype system in that it says "I know how to handle python floats and store them in an array and provide ufunc implementations for it".
Then, the "object" dtype ufuncs would try to call the ufunc on each element, including "conversion". They would find a "float", since it is not an array-like container, they interpret it as a PyFloatDtype scalar and call the scalars ufunc (the PyFloatDtype scalar would be a python float).
I'm not sure I understand this, but it did make me think of one possible approach --
in my notebook sketches for what the New and Improved ufunc API might look like, I was already pondering whether the inner loop should receive a pointer to the ufunc object itself. Not for any reason in particular, but just because hey they're sorta vaguely like methods and methods get pointers to the object. But now I know what this is useful for :-).
If ufunc loops get a pointer to the ufunc object itself, then we can define a single inner loop function that looks like (sorta-Cython code):
cdef generic_object_inner_loop(ufunc, args, strides, n, ...):
    for i in range(n):
        arg_objs = []
        for j in range(ufunc.narg):
            arg_objs.append(<object> (args[j] + strides[j] * i))
        ufunc(*arg_objs[:ufunc.nin], out=arg_objs[ufunc.nin:])
and register it by default in every ufunc with signature "{}->{}".format("O" * ufunc.nin, "O" * ufunc.nout). And this would in just a few lines of code provide a pretty sensible generic behavior for *all* object array ufuncs -- they recursively call the ufunc on their contents.
As a prerequisite of course we would need to remove the auto-coercion of unknown objects to object arrays, otherwise this becomes an infinite recursion. But we already decided to do that.
And for this to be really useful for arbitrary objects, not just the ones that asarray recognizes, then we need __numpy_ufunc__. But again, we already decided to do that :-).
-n
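Minus the pointer arithmetic, the generic loop above amounts to calling the ufunc once per element; a pure-Python model (hypothetical helper, single-input single-output case only):

```python
import numpy as np

def generic_object_loop(ufunc, arr):
    # Model of the proposed generic O->O inner loop: apply the
    # ufunc to each element of the object array in turn.
    out = np.empty(arr.shape, dtype=object)
    for idx in np.ndindex(arr.shape):
        out[idx] = ufunc(arr[idx])
    return out

data = np.empty(3, dtype=object)
data[0], data[1], data[2] = -2.0, 0.0, 5.0
result = generic_object_loop(np.sign, data)
assert list(result) == [-1.0, 0.0, 1.0]
```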
On Mi, 2015-09-30 at 00:01 -0700, Nathaniel Smith wrote:
<snip>
Well, what I mean is: a `Decimal` will probably never know about numpy itself. So I was wondering if you should teach numpy the other way around about it. I.e. you would create an object which has all the information about ufuncs and casting for Decimal and register it with numpy. Then when numpy sees a Decimal (also in `asarray`) it would know what to do with it, how to store it in an array, etc. The `Decimal` object would be the scalar version of an array of Decimals. By the way, in some way an array is a "scalar" as well: it can be put into another array, and if you apply the ufunc to it, it applies the ufunc to all its elements.
This all is likely too complicated though, maybe it is better to just force the user to subclass the Decimal to achieve this. I am sure there are quite a few roads we could go and we just need to think about it some more about what we want and what we can do. :)
- Sebastian
On Wed, Sep 30, 2015 at 12:32 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
Well, what I mean is: a `Decimal` will probably never know about numpy itself. So I was wondering if you should teach numpy the other way around about it.
indeed -- but the way to do that is to create a Decimal dtype -- if we have the "better dtype support", then that shouldn't be hard to do.
-CHB
On 09/29/2015 02:16 PM, Nathaniel Smith wrote:
On Sep 29, 2015 8:25 AM, "Anne Archibald" <archibald@astron.nl> wrote:
<snip>
Plus we hope that many use cases for object arrays will soon be supplanted by better dtype support, so now may not be the best time to invest heavily in making object arrays complicated and powerful.
Even though I submitted the PR to make object arrays more powerful, this makes a lot of sense to me.
Let's say we finish a new dtype system, in which (I imagine) each dtype specifies how to calculate each ufunc elementwise for that type. What are the remaining use cases for generic object arrays? The only one I see is having an array with elements of different types, which seems like a dubious idea anyway. (Nested ndarrays of varying length could be implemented as a dtype, as could the PyFloatDtype Sebastian mentioned, without need for a generic 'object' dtype which has to figure out how to call ufuncs on individual objects of different type).
Allan
On Tue, 29 Sep 2015 09:13:15 -0600 Charles R Harris charlesr.harris@gmail.com wrote:
Due to a recent commit, Numpy master now raises an error when applying the sign function to an object array containing NaN. Other options may be preferable, returning NaN for instance, so I would like to open the topic for discussion on the list.
None for example? float('nan') may be a bit weird amongst e.g. an array of Decimals.
Regards
Antoine.
On Tue, Sep 29, 2015 at 11:14 AM, Antoine Pitrou solipsis@pitrou.net wrote:
None for example? float('nan') may be a bit weird amongst e.g. an array of Decimals
The downside to `None` is that it's one more thing to check for and makes object arrays an even weirder edge case. (Incidentally, Decimal does have its own non-float NaN which throws a whole different wrench in the works. ` np.sign(Decimal('NaN'))` is going to raise an error no matter what.)
A float (or numpy) NaN makes more sense to return for mixed datatypes than None does, in my opinion. At least then one can use `isfinite`, etc to check while `np.isfinite(None)` will raise an error. Furthermore, if there's at least one floating point NaN in the object array, getting a float NaN out makes sense.
Just my $0.02, anyway.
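A quick illustration of the checking asymmetry (standard math/numpy/decimal behavior):

```python
import math
import numpy as np
from decimal import Decimal

# A float NaN works with the usual predicates.
assert math.isnan(float('nan'))
assert not np.isfinite(np.nan)

# None does not: np.isfinite(None) raises TypeError.
try:
    np.isfinite(None)
    raised = False
except TypeError:
    raised = True
assert raised

# Decimal has its own NaN, with its own checking method.
assert Decimal('NaN').is_nan()
```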
On Tue, Sep 29, 2015 at 8:13 AM, Charles R Harris <charlesr.harris@gmail.com> wrote:
Due to a recent commit, Numpy master now raises an error when applying the sign function to an object array containing NaN. Other options may be preferable, returning NaN for instance, so I would like to open the topic for discussion on the list.
We discussed this last month on the list and on GitHub: https://mail.scipy.org/pipermail/numpy-discussion/2015-August/073503.html https://github.com/numpy/numpy/issues/6265 https://github.com/numpy/numpy/pull/6269/files
The discussion was focused on what to do in the generic fallback case. Now that I think about this more, I think it makes sense to explicitly check for NaN in the unorderable case, and return NaN if the input is NaN. I would not return NaN in general from unorderable objects, though -- in general we should raise an error.
It sounds like Allan has already fixed this in his PR, but it also would not be hard to add that logic to the existing code. Is this code in NumPy 1.10?
Stephan
On Tue, Sep 29, 2015 at 11:59 AM, Stephan Hoyer shoyer@gmail.com wrote:
<snip>
It sounds like Allan has already fixed this in his PR, but it also would not be hard to add that logic to the existing code. Is this code in the NumPy 1.10?
No. NumPy 1.10 also has differing behavior between python 2 and python 3. The reason I raise the question now is that current master has replaced the use of PyObject_Compare by PyObject_RichCompare for both python 2 and 3. It would be easy to extend it. I'm less sure of Allan's work; on a quick look it seems more complicated.
charris@fc [~]$ python3
Python 3.4.2 (default, Jul 9 2015, 17:24:30)
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.sign(np.array([float('nan')]*3, np.object))
array([None, None, None], dtype=object)

charris@fc [~]$ python2
Python 2.7.10 (default, Jul 5 2015, 14:15:43)
[GCC 5.1.1 20150618 (Red Hat 5.1.1-4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>> np.sign(np.array([float('nan')]*3, np.object))
array([-1, -1, -1], dtype=object)
Chuck