str == int puzzlement

Hi,
Please forgive me if this is obvious, but this surprised me:
In [15]: x = np.array(['a', 'b'])
In [16]: x == 'a' # this was what I expected
Out[16]: array([ True, False], dtype=bool)
In [17]: x == 1 # this was strange to me
Out[17]: False
Is it easy to explain why this is?
Thanks a lot,
Matthew

I think this is just Python behavior; comparing python ints and strs also gives False:
In [45]: 8 == 'L'
Out[45]: False
On Wed, Jul 28, 2010 at 6:42 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Please forgive me if this is obvious, but this surprised me:
In [15]: x = np.array(['a', 'b'])
In [16]: x == 'a' # this was what I expected
Out[16]: array([ True, False], dtype=bool)
In [17]: x == 1 # this was strange to me
Out[17]: False
Is it easy to explain why this is?
Thanks a lot,
Matthew
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
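The plain-Python comparison mentioned above can be unpacked a little. The following is a minimal sketch, assuming Python 3 (the thread itself predates it); the variable names are only illustrative. It shows that both built-in types decline the comparison before `==` settles on False:

```python
# Each built-in type declines to compare itself with the other type.
int_says = (8).__eq__('L')   # int does not know how to compare with str
str_says = 'L'.__eq__(8)     # str does not know how to compare with int

# With both operands declining, == falls back to an identity check,
# and two distinct objects are never identical.
left, right = 8, 'L'
result = (left == right)

print(int_says, str_says, result)
```

Neither operand raises an error; the False comes from the fallback, not from either type actually comparing the values.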

Hi,
On Wed, Jul 28, 2010 at 6:49 PM, John Salvatier <jsalvati@u.washington.edu> wrote:
I think this is just Python behavior; comparing python ints and strs also gives False:
In [45]: 8 == 'L'
Out[45]: False
Just to be clear, from:
a = np.array(['a','b'])
a == 1
I was expecting:
array([ False, False], dtype=bool)
For:
In [22]: a = np.array(['a','b'])
In [23]: a + 'c'
etc - it makes sense to me that I can't add to numpy strings.
Best,
Matthew

In [15]: x = np.array(['a', 'b'])
In [16]: x == 'a' # this was what I expected
Out[16]: array([ True, False], dtype=bool)
In [17]: x == 1 # this was strange to me
Out[17]: False
Is it easy to explain why this is?
I'll call this a bug in NumPy's broadcasting. "x == 1" should have returned:
array([ False, False], dtype=bool)
Sturla

I'll call this a bug in NumPy's broadcasting. "x == 1" should have returned:
This is probably related:
In [22]: a = np.array(['a','b'])
In [23]: a + 'c'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Windows\system32\<string> in <module>()
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'
In [24]: a + 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Windows\system32\<string> in <module>()
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'int'
Operator + not supported for types numpy.ndarray and int???
I think a ticket on the bug tracker is due.
Sturla

I'll call this a bug in NumPy's broadcasting. "x == 1" should have returned:
This is probably related:
In [22]: a = np.array(['a','b'])
In [23]: a + 'c'
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Windows\system32\<string> in <module>()
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'str'
In [24]: a + 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Windows\system32\<string> in <module>()
TypeError: unsupported operand type(s) for +: 'numpy.ndarray' and 'int'
And setting the dtype to object, we get the correct behavior:
In [30]: a = np.array(['a','b'],dtype=object)
In [31]: a == 1
Out[31]: array([False, False], dtype=bool)
In [32]: a + 'c'
Out[32]: array([ac, bc], dtype=object)
In [33]: a + 1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
C:\Windows\system32\<string> in <module>()
TypeError: cannot concatenate 'str' and 'int' objects
The bug also seems to affect arrays with unicode strings, btw.
Sturla

On Wed, Jul 28, 2010 at 6:42 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Please forgive me if this is obvious, but this surprised me:
In [15]: x = np.array(['a', 'b'])
In [16]: x == 'a' # this was what I expected
Out[16]: array([ True, False], dtype=bool)
In [17]: x == 1 # this was strange to me
Out[17]: False
Here's a related case:
np.array(['a', 'b']) == np.array([1, 2])
False

2010/7/29 Keith Goodman <kwgoodman@gmail.com>:
On Wed, Jul 28, 2010 at 6:42 PM, Matthew Brett <matthew.brett@gmail.com> wrote:
Hi,
Please forgive me if this is obvious, but this surprised me:
In [15]: x = np.array(['a', 'b'])
In [16]: x == 'a' # this was what I expected
Out[16]: array([ True, False], dtype=bool)
In [17]: x == 1 # this was strange to me
Out[17]: False
Here's a related case:
np.array(['a', 'b']) == np.array([1, 2])
False
Yeah, it's just that numpy knows that it cannot compare pears with apples:
a = numpy.asarray(['a', 'b'])
a.__eq__(1)
NotImplemented
so Python falls back to the Python string's or int's __eq__, which does not know of the structure of numpy, and returns simply False. In case of the numpy.object array, it's of course forwarded to the constituents, resulting in the "correct wrong" result (correct because it's Python-correct, wrong because it's comparing strings to ints).
About Keith's case:
b = numpy.asarray([1, 2])
a.__eq__(b)
NotImplemented
b.__eq__(a)
NotImplemented
So Python falls back to comparing the IDs id(a) == id(b), which also results in False.
Maybe it would be better to raise a ValueError, which is not caught by the evaluation mechanism, to prevent such stuff.
Friedrich
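The fallback described in this message can be reproduced without NumPy. The following is a minimal sketch, assuming Python 3; `StrArray` is a hypothetical stand-in class written for illustration, not NumPy's implementation. It declines comparisons it does not understand, and the last line mimics how an object array forwards the comparison to its elements:

```python
class StrArray:
    """Hypothetical stand-in for a string array (not NumPy code)."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if isinstance(other, str):
            # Comparable case: answer element by element.
            return [item == other for item in self.items]
        # Anything else: decline, so Python asks the other operand.
        return NotImplemented


x = StrArray(['a', 'b'])

same_kind = (x == 'a')    # element-wise answer from StrArray.__eq__
declined = x.__eq__(1)    # the method itself declines
mixed = (x == 1)          # int declines too, so the identity check decides

# Forwarding the comparison to each element gives one answer per item.
per_element = [item == 1 for item in x.items]

print(same_kind, declined, mixed, per_element)
```

In this sketch the single False for `x == 1` comes from Python's final identity check, while the per-element list is what the original poster expected to see.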

Hi,
Yeah, it's just that numpy knows that it cannot compare pears with apples:
a = numpy.asarray(['a', 'b'])
a.__eq__(1)
NotImplemented
Thank you - that's very helpful and clear.
Maybe it would be better to raise a ValueError, which is not caught by the evaluation mechanism, to prevent such stuff.
Sorry that this is not yet clear to me, but, is it true then that:
The only situation where array.__eq__ sensibly falls back to python __eq__ is for the individual elements of object arrays?
Thanks again,
Matthew

2010/8/1 Matthew Brett <matthew.brett@gmail.com>:
Maybe it would be better to raise a ValueError, which is not caught by the evaluation mechanism, to prevent such stuff.
Sorry that this is not yet clear to me, but, is it true then that:
The only situation where array.__eq__ sensibly falls back to python __eq__ is for the individual elements of object arrays?
Btw, I changed my opinion on that. The rationale was to inform the user that numpy cannot do the comparison, on the assumption that Python cannot do any better. But that assumption isn't true, since the other operand may know quite well how to compare the pears to the apples, for whatever reason. So better stick to the documentation:
"A rich comparison method may return the singleton NotImplemented if it does not implement the operation for a given pair of arguments."
And since the functionality is in fact "NotImplemented" at the moment, this is the way to go for now.
Of course, there is room for change, e.g., by breaking it down to the scalar level if the r.h.s. is not a numpy.ndarray, and returning an array of Falses otherwise. But even this is error-prone, because the r.h.s. may be an instance of a numpy.ndarray subclass that knows better what to do but is no longer asked.
So I come to the conclusion that all such "implementations" of __eq__ for non-comparable dtypes aren't really implementations. They are based on too many assumptions, and would only be a substitute for real functionality.
To come to your question, Matthew: I would tend to give the general recommendation not to mix dtypes that do not really compare to each other. It's just error-prone. When you want to compare using Python's implementation of __eq__, which should in fact be the builtin cmp(), then object arrays are the way to go. But then you cannot use a single non-object ndarray.
To summarise, numpy simply does not take part in __eq__ calculations where it does not know what to do. Python does the rest by asking the other operands what they think about the operation. But those others should be ndarray-aware.
Friedrich
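The point about letting the other operand answer can be illustrated in plain Python. The following is a minimal sketch, assuming Python 3; `Pears`, `Apples` and `EagerPears` are hypothetical classes invented for this illustration and do not model NumPy's actual code. Returning NotImplemented hands the question to the right-hand operand, whereas eagerly returning a list of Falses means that operand is never asked:

```python
class Pears:
    """Hypothetical container that declines every comparison."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        return NotImplemented


class Apples:
    """Hypothetical container that knows how to compare itself with Pears."""

    def __init__(self, items):
        self.items = list(items)

    def __eq__(self, other):
        if isinstance(other, Pears):
            return [a == p for a, p in zip(self.items, other.items)]
        return NotImplemented


class EagerPears(Pears):
    """Variant that always answers with all-False instead of declining."""

    def __eq__(self, other):
        return [False] * len(self.items)


apples = Apples(['a', 'c'])

# Pears declines, so Python calls Apples.__eq__ with the operands swapped.
asked_second = (Pears(['a', 'b']) == apples)

# EagerPears answers immediately, so Apples.__eq__ is never called.
never_asked = (EagerPears(['a', 'b']) == apples)

print(asked_second, never_asked)
```

The eager variant hides the one matching element that `Apples` would have reported, which is the kind of lost information the message warns about.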
participants (5)
- Friedrich Romstedt
- John Salvatier
- Keith Goodman
- Matthew Brett
- Sturla Molden