np.sign and object comparisons
There's been some work going on recently on Py2 vs Py3 object comparisons. If you want all the background, see gh-6265 <https://github.com/numpy/numpy/issues/6265> and follow the links there. There is a half-baked PR in the works, gh-6269 <https://github.com/numpy/numpy/pull/6269>, that tries to unify behavior and fix some bugs along the way by replacing all 2.x uses of PyObject_Compare with several calls to PyObject_RichCompareBool, which is available on 2.6, the oldest Python version we support. The poster child for this issue is computing np.sign on an object array that has an np.nan entry. 2.x will just make up an answer for us:

    >>> cmp(np.nan, 0)
    -1

even though none of the relevant comparisons succeeds:
The current 3.x is buggy, so the fact that it produces the same made-up result as in 2.x is accidental:

    >>> np.sign(np.array([np.nan], 'O'))
    array([-1], dtype=object)

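The claim that none of the relevant comparisons succeeds can be checked in plain Python 3. The sketch below uses `math.nan` in place of `np.nan` (both are an ordinary Python float holding NaN, so NumPy is not needed) and collects the three rich comparisons a sign computation relies on:

```python
import math

# math.nan is the same kind of object as np.nan: a Python float holding NaN.
x = math.nan

# The three comparisons a sign() built on rich comparisons would try.
results = {
    "x > 0": x > 0,
    "x < 0": x < 0,
    "x == 0": x == 0,
}

for expr, value in results.items():
    print(f"{expr}: {value}")

# Every comparison involving NaN is False, so there is no "correct" sign to report.
assert not any(results.values())
```

Python 2's `cmp()` hid this by returning -1; Python 3 has no `cmp()` at all, so each comparison has to be made explicitly.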
Looking at the code, it seems that the original intention was for the answer to be `0`, which is equally made up but perhaps makes a little more sense. There are three ways of fixing this that I see:

1. Arbitrarily choose a value to set the return to. This is equivalent to choosing a default return for `cmp` for comparisons. This preserves behavior, but feels wrong.
2. Similarly to how np.sign of a floating point array with nans returns nan for those values, return e.g. None for these cases. This is my preferred option.
3. Raise an error, along the lines of the "TypeError: unorderable types" that 3.x produces for some comparisons.

Thoughts anyone?

Jaime -- (\__/) ( O.o) ( > <) This is Conejo. Copy Conejo into your signature and help him in his plans for world domination.
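The three options can be compared side by side with a small pure-Python sketch. `object_sign` and its `policy` argument are hypothetical names used only for illustration; the function builds a sign from up to three rich comparisons against 0, loosely in the spirit of the PyObject_RichCompareBool approach in gh-6269, and the options differ only in what happens when all three comparisons fail:

```python
import math


def object_sign(x, policy="none"):
    """Hypothetical element-wise sign for an arbitrary Python object.

    Uses only rich comparisons against 0. `policy` picks what happens when
    none of them succeeds (e.g. for NaN):
      "default" -> option 1: return a fixed, arbitrary answer (0 here)
      "none"    -> option 2: return None
      "raise"   -> option 3: raise TypeError
    """
    if x > 0:
        return 1
    if x < 0:
        return -1
    if x == 0:
        return 0
    # All three comparisons were False: the sign is undefined.
    if policy == "default":
        return 0
    if policy == "none":
        return None
    if policy == "raise":
        raise TypeError(f"sign is undefined for {x!r}")
    raise ValueError(f"unknown policy: {policy!r}")


values = [-2, 0, 3.5, math.nan]
print([object_sign(v) for v in values])             # [-1, 0, 1, None]
print([object_sign(v, "default") for v in values])  # [-1, 0, 1, 0]
```

Objects that cannot be compared with 0 at all (for example `None` in Python 3) raise TypeError from the first comparison, whatever the policy; the policy only matters for objects whose comparisons all quietly return False.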
On Sun, 2015-08-30 at 21:09 -0700, Jaime Fernández del Río wrote:
That would be my gut feeling as well. Returning `NaN` could also make sense, but I guess we run into problems since we do not know the input type. So `None` seems like the only option here I can think of right now. - Sebastian
On Mon, Aug 31, 2015 at 1:23 AM, Sebastian Berg <sebastian@sipsolutions.net> wrote:
My inclination is that returning NaN would be the appropriate choice. It's certainly consistent with the behavior for float dtypes -- my expectation for object dtype behavior is that it works exactly like applying the np.sign ufunc to each element of the array individually. On the other hand, there are other ways in which an object can fail all those comparisons (e.g., NaT?), so I suppose we could return None. But it would still be a weird outcome for the most common case. Ideally, np.sign would return an array with an int-NA dtype, but that's a whole different can of worms... Stephan
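Both the appeal of the "return NaN" option and Sebastian's concern about not knowing the input type show up in a minimal sketch. `sign_or_nan` and the `Unordered` class are hypothetical: `Unordered` stands in for any non-float object (NaT-like) whose comparisons with 0 all return False:

```python
import math


def sign_or_nan(x):
    """Hypothetical 'return NaN' policy: mimic np.sign on float dtypes,
    where a NaN input produces a NaN output."""
    if x > 0:
        return 1
    if x < 0:
        return -1
    if x == 0:
        return 0
    return math.nan


class Unordered:
    """Stand-in for a non-float object that fails every comparison with 0."""

    def __gt__(self, other):
        return False

    def __lt__(self, other):
        return False

    def __eq__(self, other):
        return False


print(sign_or_nan(-5))           # -1
print(sign_or_nan(math.nan))     # nan
print(sign_or_nan(Unordered()))  # nan
```

For a float NaN the result matches what the float loop does; for `Unordered` the sketch still hands back a float NaN, an answer of a different type from the input, which is the mismatch that makes None look attractive as the type-agnostic choice.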
On Mon, Aug 31, 2015 at 10:31 AM, Antoine Pitrou <solipsis@pitrou.net> wrote:
I think this is going through the np.sign timedelta64 loop, and thus is an unrelated issue? It does look like a bug though. -n -- Nathaniel J. Smith -- http://vorpus.org
On Sun, Aug 30, 2015 at 9:09 PM, Jaime Fernández del Río <jaime.frio@gmail.com> wrote:
any clear intuition or use cases, I guess I find option 3 somewhat tempting... it keeps our options open until someone who actually cares comes along with a use case to hone our intuition on, and it is very safe in the meantime. (This was noticed in the course of routine code cleanups, right, not an external bug report? For all we know right now, no actual user has ever even tried to apply np.sign to an object array?) -n
On Mon, Aug 31, 2015 at 11:49 PM, Nathaniel Smith <njs@pobox.com> wrote:
We do have a user that tried np.sign on an object array, and discovered that our Py3K object comparison was crap: https://github.com/numpy/numpy/issues/6229

No report of anyone trying np.sign on anything other than numbers that we know of, though. I'm starting to think that, given the lack of agreement, I am going to agree with you that raising an error may be the better option, because it's the least likely to break people's code if we later find we need to change it. Jaime
On 08/31/2015 12:09 AM, Jaime Fernández del Río wrote:
I think np.sign on nan object arrays should raise the error

    AttributeError: 'float' object has no attribute 'sign'

If I've understood correctly, object arrays currently work like this: if a ufunc has an equivalent pure-Python function (e.g., PyNumber_Add for np.add, PyNumber_Absolute for np.abs, `>` for np.greater), then numpy calls that for objects. Otherwise, if the object defines a method with the same name as the ufunc, numpy calls that method. For example, arccos is a ufunc that has no pure-Python equivalent, so you get the following behavior:

    >>> a = np.array([-1], dtype='O')
    >>> np.abs(a)
    array([1], dtype=object)
    >>> np.arccos(a)
    AttributeError: 'int' object has no attribute 'arccos'
    >>> class MyClass:
    ...     def arccos(self):
    ...         return 1
    ...
    >>> b = np.array([MyClass()], dtype='O')
    >>> np.arccos(b)
    array([1], dtype=object)

Now, most comparison operators (e.g., greater) are treated a little specially in loops.c. For some reason, sign is treated just like the other comparison operators, even though technically there is no pure-Python equivalent to sign. I think that because there is no pure-Python 'sign', numpy should attempt to call obj.sign, and in most cases this should fail with the error above. See also http://stackoverflow.com/questions/1986152/why-doesnt-python-have-a-sign-fun...

I think the fix for sign is that the 'sign' ufunc in generate_umath.py should look more like the arccos one, and we should get rid of OBJECT_sign in loops.c. I'm not 100% sure about this since I haven't yet followed all of how generate_umath.py works.

-------

By the way, based on some comments I saw somewhere (apologies, I forget who by!) I wrote up a vision for how ufuncs could work for objects, here: https://gist.github.com/ahaldane/c3f9bcf1f62d898be7c7

I'm a little unsure whether the ideas there are good, since they might be made obsolete by the big dtype subclassing improvements being discussed in the numpy roadmap thread.

Allan
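The dispatch rule described above can be sketched in pure Python, with lists standing in for object arrays. This is not NumPy's implementation: `apply_object_ufunc` and the small `PYTHON_EQUIVALENTS` table are hypothetical, and the real mapping lives in NumPy's generated C loops. Under this rule, `sign` has no table entry, so a float NaN ends in the AttributeError proposed above:

```python
import math
import operator

# Hypothetical, deliberately tiny table of ufuncs that have a pure-Python
# equivalent. Everything else falls back to calling a same-named method.
PYTHON_EQUIVALENTS = {
    "add": operator.add,
    "absolute": operator.abs,
    "greater": operator.gt,
}


def apply_object_ufunc(name, *operands):
    """Apply the ufunc `name` element-wise to lists of Python objects."""
    func = PYTHON_EQUIVALENTS.get(name)
    results = []
    for args in zip(*operands):
        if func is not None:
            # A pure-Python equivalent exists: use it directly.
            results.append(func(*args))
        else:
            # Otherwise call obj.<name>(*other_args); a missing method
            # raises AttributeError, as in the session above.
            first, *rest = args
            results.append(getattr(first, name)(*rest))
    return results


class MyClass:
    def arccos(self):
        return 1


print(apply_object_ufunc("absolute", [-1]))       # [1]
print(apply_object_ufunc("arccos", [MyClass()]))  # [1]
try:
    apply_object_ufunc("sign", [math.nan])
except AttributeError as exc:
    print(exc)  # 'float' object has no attribute 'sign'
```

Removing a special-cased loop and letting `sign` fall through to the method lookup is the behavior change Allan's proposal amounts to in this model.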
participants (6)

- Allan Haldane
- Antoine Pitrou
- Jaime Fernández del Río
- Nathaniel Smith
- Sebastian Berg
- Stephan Hoyer