unittest of sequence equality
The following test fails because `seq1 == seq2` returns a (boolean) NumPy array whenever either seq is a NumPy array.

    import unittest
    import numpy as np
    unittest.TestCase().assertSequenceEqual([1.,2.,3.], np.array([1.,2.,3.]))

I expected `unittest` to rely only on features of a `collections.abc.Sequence`, which, based on https://docs.python.org/3/glossary.html#term-sequence, I believe are satisfied by a NumPy array. Specifically, I see no requirement that a sequence implement `__eq__` at all, much less in any particular way.

In short: a test named `assertSequenceEqual` should, I would think, work for any sequence and therefore (based on the available documentation) should not depend on the class-specific implementation of `__eq__`.

Is that wrong?

Thank you,
Alan Isaac
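To reproduce the failure without installing NumPy, here is a minimal sketch built on two hypothetical stand-in classes, `ElementwiseSeq` and `ElementwiseResult`. They only imitate a 1-D NumPy array (they are not NumPy's API): the sequence is a `collections.abc.Sequence`, its `==` returns an element-wise result, and asking for the truth value of a multi-element result raises `ValueError` with NumPy's message.

```python
import unittest
from collections.abc import Sequence


class ElementwiseResult:
    """Hypothetical stand-in for the boolean array NumPy's == returns."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        # Only a single-element result has an unambiguous truth value here.
        if len(self.flags) == 1:
            return bool(self.flags[0])
        raise ValueError(
            "The truth value of an array with more than one element is "
            "ambiguous. Use a.any() or a.all()"
        )


class ElementwiseSeq(Sequence):
    """Hypothetical stand-in for a 1-D NumPy array: a Sequence with element-wise ==."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __eq__(self, other):
        return ElementwiseResult(a == b for a, b in zip(self, other))


def exception_from_original_example():
    """Run the example from the post against the stand-in; return the exception type."""
    try:
        unittest.TestCase().assertSequenceEqual([1., 2., 3.], ElementwiseSeq([1., 2., 3.]))
    except Exception as exc:
        return type(exc)
    return None


print(isinstance(ElementwiseSeq([]), Sequence))  # True
print(exception_from_original_example())  # <class 'ValueError'>, not AssertionError
```

The error comes from the `if seq1 == seq2:` check inside `assertSequenceEqual`, which needs a single truth value from the element-wise result.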
On Tue, Dec 22, 2020 at 10:54 AM Alan G. Isaac wrote:
The following test fails because `seq1 == seq2` returns a (boolean) NumPy array whenever either seq is a NumPy array.

    import unittest
    import numpy as np
    unittest.TestCase().assertSequenceEqual([1.,2.,3.], np.array([1.,2.,3.]))

I expected `unittest` to rely only on features of a `collections.abc.Sequence`, which, based on https://docs.python.org/3/glossary.html#term-sequence, I believe are satisfied by a NumPy array. Specifically, I see no requirement that a sequence implement `__eq__` at all, much less in any particular way.
In short: a test named `assertSequenceEqual` should, I would think, work for any sequence and therefore (based on the available documentation) should not depend on the class-specific implementation of __eq__.
Is that wrong?
Yes and no. :) I don't agree that `seq1 == seq2` should not be tried when the sequences support it, but the function does work as you would expect on sequences that lack a definition of `__eq__` (e.g. user-defined sequences where you just didn't want to bother). I think the fact that numpy chooses to implement `__eq__` in such a way that its result would be surprising if used in an `if` guard is more a design choice/issue of numpy than a suggestion that you can't trust `==` in testing just because it _can_ return something other than True/False.
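The claim that the method already copes with sequences that never define `__eq__` can be checked with a minimal sketch. `PlainSeq` and `check` are hypothetical helpers. Because `PlainSeq` inherits the default `object.__eq__`, `seq1 == seq2` evaluates to False, and `assertSequenceEqual` moves on to its per-element comparison.

```python
import unittest
from collections.abc import Sequence


class PlainSeq(Sequence):
    """Hypothetical user-defined sequence that does not define __eq__."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)


def check(expected, actual):
    """Return True if assertSequenceEqual passes, False if it fails."""
    try:
        unittest.TestCase().assertSequenceEqual(expected, actual)
    except AssertionError:
        return False
    return True


print(check([1, 2, 3], PlainSeq([1, 2, 3])))  # True: equal elements, different types
print(check([1, 2, 3], PlainSeq([1, 2, 4])))  # False: element 2 differs
```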
On 22/12/2020 19:08, Brett Cannon wrote:
... The fact that numpy chooses to implement __eq__ in such a way that its result would be surprising if used in an `if` guard I think is more a design choice/issue of numpy than a suggestion that you can't trust `==` in testing because it _can_ be something other than True/False.
+1 In addition to NumPy's regularly surprising interpretation of
operators, it is evident from Ivan Pozdeev's investigation (other
branch) that part of the problem lies with bool(np.array) being an
error. I can see why that might be sensible. You can have one or the
other, but not both.
I wondered if Python had become stricter here after NumPy made its
choices, but a little mining turns up:
"New in version 2.1. These are the so-called ``rich comparison''
methods, and are called for comparison operators in preference to
__cmp__() below. The correspondence between operator symbols and
method names is as follows: |x
Interesting. Did you look at the code? It is here (that's the `==` operator
you're complaining about):
https://github.com/python/cpython/blob/6afb730e2a8bf0b472b4c3157bcf5b44aa7e6...
The code already analyzes the length of the sequences.
You are right that collections.abc.Sequence (or its ancestors other than
object) does not implement `__eq__`, so it would seem that the `==`
operator would have to be replaced with a simple loop:
```
for x, y in zip(seq1, seq2):
    if x is not y and x != y:
        break
else:
    return  # They are all equal
```
Making that change would probably slow things down. (Note that the odd
check "x is not y and x != y" is needed to keep the previous behavior
regarding NaN and other objects that aren't equal to themselves.)
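The loop above can be wrapped in a small standalone function to check the claims around it: that `collections.abc.Sequence` and its ancestors other than `object` define no `__eq__`, and that the `x is not y and x != y` test keeps the list behavior for a NaN compared with itself. `loop_equal` is a hypothetical helper; unlike the quoted fragment it compares lengths itself, because outside unittest nothing else has done so.

```python
from collections.abc import Sequence


def loop_equal(seq1, seq2):
    """Element-by-element equality using only len() and iteration.

    Never evaluates seq1 == seq2 on the sequences themselves.
    """
    if len(seq1) != len(seq2):
        return False
    for x, y in zip(seq1, seq2):
        # Identity first, so the very same NaN object still counts as equal,
        # matching how list == list treats it.
        if x is not y and x != y:
            return False
    return True


# The Sequence ABC and its ancestors (other than object) define no __eq__.
print(any("__eq__" in vars(k) for k in Sequence.__mro__ if k is not object))  # False

nan = float("nan")
print([nan] == [nan], loop_equal([nan], [nan]))  # True True
print(loop_equal([nan], [float("nan")]))  # False: two distinct NaN objects
```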
One could also argue that the docstring warns about this issue:
```
For the purposes of this function, a valid ordered sequence type is one
which can be indexed, has a length, and has an equality operator.
```
IOW, I think this ship has actually sailed.
-- --Guido van Rossum (python.org/~guido) *Pronouns: he/him **(why is my pronoun here?)* http://feministing.com/2015/02/03/how-using-they-as-a-singular-pronoun-can-c...
On 22.12.2020 21:52, Alan G. Isaac wrote:
The following test fails because `seq1 == seq2` returns a (boolean) NumPy array whenever either seq is a NumPy array.
You sure about that? For me, bool(np.array) raises an exception:

    In [12]: np.__version__
    Out[12]: '1.19.4'

    In [11]: if [False, False]==np.array([False, False]): print("foo")
    <...>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
-- Regards, Ivan
Here, `seq1 == seq2` produces a boolean array (i.e., an array of boolean values).

hth,
Alan Isaac

On 12/22/2020 2:28 PM, Ivan Pozdeev via Python-Dev wrote:
You sure about that? For me, bool(np.array) raises an exception:
    In [12]: np.__version__
    Out[12]: '1.19.4'

    In [11]: if [False, False]==np.array([False, False]): print("foo")
    <...>
    ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
Okay, I see that by "fails" you probably meant "raises this exception" rather than failing in the usual way (i.e., raising an `AssertionError`).

On 22.12.2020 22:38, Alan G. Isaac wrote:
Here, `seq1 == seq2` produces a boolean array (i.e., an array of boolean values). hth, Alan Isaac
-- Regards, Ivan
In the light of this and https://github.com/python/cpython/blob/6afb730e2a8bf0b472b4c3157bcf5b44aa7e6... (linked to from https://mail.python.org/archives/list/python-dev@python.org/message/AQRLRVY7... ), I reckon that *like the other code before it, `seq1 == seq2` should check for (TypeError, NotImplementedError) and fall back to by-element comparison in such a case.*

On 22.12.2020 22:50, Ivan Pozdeev via Python-Dev wrote:
Okay, I see that by "fails", you probably meant "raises this exception" rather than fails the usual way (i.e. raises anAssertionError).
-- Regards, Ivan
On 22.12.2020 22:59, Ivan Pozdeev via Python-Dev wrote:
In the light of this and https://github.com/python/cpython/blob/6afb730e2a8bf0b472b4c3157bcf5b44aa7e6... (linked to from https://mail.python.org/archives/list/python-dev@python.org/message/AQRLRVY7... )
I reckon that
*like the other code before it, `seq1 == seq2` should check for (TypeError, NotImplementedError)
and fall back to by-element comparison in such a case.*
Or just bail out ("resist the temptation to guess") and tell the user to compare their weird types themselves.
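The "bail out" alternative could look like the following minimal sketch. `strict_sequence_equal`, `outcome` and `ElementwiseList` are hypothetical names chosen for illustration; `ElementwiseList` only imitates NumPy's element-wise `==` and is not NumPy.

```python
class ElementwiseList(list):
    """Hypothetical list whose == is element-wise, loosely imitating NumPy."""

    def __eq__(self, other):
        return [a == b for a, b in zip(self, other)]


def strict_sequence_equal(seq1, seq2):
    """'Bail out' policy: trust == only when it returns a real bool."""
    result = seq1 == seq2
    if isinstance(result, bool):
        return result
    # Refuse to guess what a non-bool result means.
    raise TypeError(
        f"{type(seq1).__name__} == {type(seq2).__name__} returned "
        f"{type(result).__name__}, not bool; compare these objects "
        "with a type-specific assertion instead"
    )


def outcome(seq1, seq2):
    """Return True/False, or 'bail out' when the policy refuses to decide."""
    try:
        return strict_sequence_equal(seq1, seq2)
    except TypeError:
        return "bail out"


print(outcome([1, 2], [1, 2]))                   # True
print(outcome([1, 2], [1, 3]))                   # False
print(outcome(ElementwiseList([1, 2]), [1, 2]))  # bail out
```

The `isinstance(result, bool)` check is deliberately strict: it rejects any non-bool result from `==` rather than trying to interpret it.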
-- Regards, Ivan
This comment completely misses the point. This "weird type" qualifies as a Sequence. (See collections.abc.)

Alan Isaac

On 12/22/2020 3:09 PM, Ivan Pozdeev via Python-Dev wrote:
Or just bail out ("resist the temptation to guess") and tell the user to compare their weird types themselves.
On Tue, Dec 22, 2020 at 06:33:41PM -0500, Alan G. Isaac wrote:
This comment completely misses the point. This "weird type" qualifies as a Sequence. (See collections.abc.)
It's not weird because of the Sequence ABC; it's weird because of its treatment of equality, using the `==` operator as an element-wise operator instead of an object-equality boolean operator. Numpy is entitled to do this, but we're not obligated to take heroic measures to integrate numpy arrays with unittest methods. If we can do so easily, sure, let's fix it.

I think Ivan's suggestion that the assertSequenceEqual method fall back on element-by-element comparisons has some merit.

-- 
Steve
On Wed, Dec 23, 2020 at 1:06 AM Steven D'Aprano wrote:
We're not obligated to take heroic measures to integrate numpy arrays with unittest methods. If we can do so easily, sure, let's fix it.
I think Ivan's suggestion that the assertSequenceEqual method fall back on element-by-element comparisons has some merit.
If there are other common types this helps with, sure. But for numpy, as pointed out elsewhere in this thread, it would still fail for numpy arrays with more than one dimension.

Personally I think this is really an issue with the structure of unittest -- having a custom assertion for every possibility is intractable. If you want to test numpy arrays, use the utilities provided by numpy.

- CHB

-- 
Christopher Barker, Ph.D.
Oceanographer
Emergency Response Division
NOAA/NOS/OR&R
On 1/8/2021 2:50 PM, Chris Barker via Python-Dev wrote:
If there are other common types this helps with, sure. But for numpy, as pointed out elsewhere in this thread, it would still fail for numpy arrays of > 1 dimension.
Personally I think this is really an issue with the structure of unittest -- having a custom assertion for every possibility is intractable.
If you want to test numpy arrays, use the utilities provided by numpy.
This comment misses the key point, which is: `assertSequenceEqual` should not rely on behavior that is not ensured for typing.Sequence, but it currently does. The failure on a numpy array simply exposes this problem.

The array-dimension consideration is also a red herring. For example, `unittest.TestCase().assertSequenceEqual([1,2,3],(1,2,3))` passes but `unittest.TestCase().assertSequenceEqual([[1,2,3]],[(1,2,3)])` raises an `AssertionError`. This behavior remains unchallenged.

Alan Isaac
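The two one-line examples can be checked directly with the standard library. `passes` is a hypothetical wrapper that turns the `AssertionError` into a boolean. The nested pair fails because the method compares the elements with `!=`, and the list `[1, 2, 3]` never equals the tuple `(1, 2, 3)`.

```python
import unittest


def passes(seq1, seq2):
    """Return True if assertSequenceEqual accepts the pair, False if it fails."""
    try:
        unittest.TestCase().assertSequenceEqual(seq1, seq2)
    except AssertionError:
        return False
    return True


print(passes([1, 2, 3], (1, 2, 3)))      # True: equal elements, different types
print(passes([[1, 2, 3]], [(1, 2, 3)]))  # False: [1, 2, 3] != (1, 2, 3)
```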
On Sat, Jan 09, 2021 at 07:56:24AM -0500, Alan G. Isaac wrote:
This comment misses the key point, which is: `assertSequenceEqual` should not rely on behavior that is not ensured for typing.Sequence, but it currently does. The failure on a numpy array simply exposes this problem.
You are making that as a definitive statement of fact, but it's not clear to me that this is actually true. There are at least two problems with your position:

(1) The Sequence ABC requires only the *presence* of certain methods, not their semantics. We're entitled to assume the obvious, implicit, sequence-like semantics. If a class implements the methods, but provides unexpected semantics, anything could happen.

(2) Equality is a fundamental operation that we are entitled to assume that *all* objects support. See above: we're entitled to assume the standard semantics for equality too. Objects which have unusual semantics for equality, such as float NANs, may behave in unexpected ways.

So I don't think that we are *required* to support unusual sequences like numpy. On the other hand, I think that we can extend assertSequenceEqual to support numpy arrays quite easily. A quick glance at the source code:

https://github.com/python/cpython/blob/3.9/Lib/unittest/case.py

suggests that all we need do is catch a potential ValueError around the sequence equality test, and fall back on the element-by-element processing:

    try:
        if seq1 == seq2:
            return
    except ValueError:
        # Possibly a numpy array?
        pass

I don't think that this is a breaking change, and I think it should do what you expect.

I don't believe that we need to accept your reasoning regarding the Sequence ABC to accept this enhancement. One need only accept that although numpy's array equality semantics are non-standard and unhelpful, numpy is an important third-party library, and the cost of supporting sequences like numpy arrays is negligible.

-- 
Steve
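The proposed fallback can be sketched outside unittest as a standalone helper; this is an illustration of the try/except-ValueError idea, not a patch to `case.py`. `sequences_equal`, `ElementwiseSeq` and `ElementwiseResult` are hypothetical names, and the two classes only imitate NumPy (element-wise `==` and `!=`, plus a `ValueError` when a multi-element result is used as a bool).

```python
from collections.abc import Sequence


class ElementwiseResult:
    """Hypothetical stand-in for the boolean array NumPy's ==/!= return."""

    def __init__(self, flags):
        self.flags = list(flags)

    def __bool__(self):
        # Only a single-element result has an unambiguous truth value here.
        if len(self.flags) == 1:
            return bool(self.flags[0])
        raise ValueError("The truth value of an array with more than one "
                         "element is ambiguous. Use a.any() or a.all()")


class ElementwiseSeq(Sequence):
    """Hypothetical stand-in for a NumPy array; items may themselves be rows."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, index):
        return self._items[index]

    def __len__(self):
        return len(self._items)

    def __eq__(self, other):
        return ElementwiseResult(a == b for a, b in zip(self, other))

    def __ne__(self, other):
        return ElementwiseResult(a != b for a, b in zip(self, other))


def sequences_equal(seq1, seq2):
    """Try == first; if its truth value is ambiguous, compare element by element."""
    try:
        if seq1 == seq2:
            return True
    except ValueError:
        pass  # Possibly a NumPy-style array: fall through to the loop below.
    if len(seq1) != len(seq2):
        return False
    for x, y in zip(seq1, seq2):
        if x != y:  # items are compared with !=, as unittest does
            return False
    return True


one_d = ElementwiseSeq([1.0, 2.0, 3.0])
print(sequences_equal([1.0, 2.0, 3.0], one_d))  # True
print(sequences_equal([1.0, 2.0, 4.0], one_d))  # False

two_d = ElementwiseSeq([ElementwiseSeq([1, 2]), ElementwiseSeq([3, 4])])
try:
    sequences_equal([[1, 2], [3, 4]], two_d)
except ValueError:
    print("2-D data still raises ValueError")
```

The last call illustrates the objection raised elsewhere in the thread about arrays with more than one dimension: the fallback compares whole rows with `!=`, which again yields an element-wise result, so the same ValueError escapes.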
Numpy chose to violate the principle of equality by having __eq__ not
return a bool. So a numpy type can't be used reliably outside of the numpy
DSL.
-gps
On Tue, Dec 22, 2020, 11:51 AM Alan G. Isaac wrote:
Here, `seq1 == seq2` produces a boolean array (i.e., an array of boolean values). hth, Alan Isaac
On Tue, Dec 22, 2020 at 6:57 PM Alan G. Isaac wrote:
The following test fails because `seq1 == seq2` returns a (boolean) NumPy array whenever either seq is a NumPy array.

    import unittest
    import numpy as np
    unittest.TestCase().assertSequenceEqual([1.,2.,3.], np.array([1.,2.,3.]))
I expected `unittest` to rely only on features of a `collections.abc.Sequence`, which based on https://docs.python.org/3/glossary.html#term-sequence, I believe are satisfied by a NumPy array.
If you know you might be dealing with NumPy arrays (as the import suggests), I think it's simply right to spell it as:

    unittest.TestCase().assertTrue(np.array_equal([1., 2., 3.], np.array([1., 2., 3.])))

Or for pytest etc., simply:

    assert np.array_equal([1., 2., 3.], np.array([1., 2., 3.]))
On Tue, 22 Dec 2020 19:32:15 +0000, David Mertz wrote:
If you know you might be dealing with NumPy arrays (as the import suggests), I think it's simply right to spell it as:
unittest.TestCase().assertTrue(np.array_equal([1., 2., 3.], np.array([1., 2., 3.])))
Please don't suggest this; it will produce unhelpful error messages (do you like "False is not true" errors in CI builds?). The better solution is to use the dedicated assertions in the `numpy.testing` package: https://numpy.org/doc/stable/reference/routines.testing.html

Regards

Antoine.
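The objection to `assertTrue(...)` is about the failure message. The sketch below is stdlib-only so that it runs without NumPy: it uses plain lists as stand-ins and contrasts the message from a bare `assertTrue` with the one from `assertSequenceEqual`. `failure_message` is a hypothetical helper. With real arrays, the dedicated assertion would be something like `numpy.testing.assert_array_equal(actual, desired)`, which likewise reports the mismatching elements.

```python
import unittest

tc = unittest.TestCase()
expected, actual = [1.0, 2.0, 3.0], [1.0, 2.0, 4.0]


def failure_message(check):
    """Run a zero-argument assertion callable and return its failure message."""
    try:
        check()
    except AssertionError as exc:
        return str(exc)
    return None


bare = failure_message(lambda: tc.assertTrue(expected == actual))
rich = failure_message(lambda: tc.assertSequenceEqual(expected, actual))

print(bare)  # False is not true
print("First differing element 2" in rich)  # True
```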
participants (10)
- Alan G. Isaac
- Antoine Pitrou
- Brett Cannon
- Chris Barker
- David Mertz
- Gregory P. Smith
- Guido van Rossum
- Ivan Pozdeev
- Jeff Allen
- Steven D'Aprano