Using more specific methods in Python unit tests
Many Python tests were written long before unittest existed, using simple asserts. When they were ported to unittest, the asserts were replaced with the assert_ method and later with assertTrue. unittest has a number of other methods to check for and report failure, from assertEqual to the more specific assertIs, assertIn, assertIsInstance, etc., added in 2.7. The newer methods provide better reporting in case of failure.

I wrote a large patch which modifies the tests to use more specific methods [1]. Because it is too large, it was divided into many smaller patches, and separate issues were opened for them. At the moment the major part of the original patch has already been committed. Many thanks to Ezio for reviewing the majority of the issues. Some changes have been made by other people in unrelated issues.

Although Raymond approved a patch for test_bigmem [2], he expressed an insistent recommendation not to do this. So I have stopped committing new reviewed patches. Terry recommended discussing this on python-dev. What are your thoughts?

[1] http://bugs.python.org/issue16510
[2] http://bugs.python.org/issue20547
On Sat, Feb 15, 2014, at 10:12 AM, Serhiy Storchaka wrote:
> Although Raymond approved a patch for test_bigmem [2], he expressed an
> insistent recommendation not to do this. So I have stopped committing
> new reviewed patches. Terry recommended discussing this on python-dev.
> What are your thoughts?
I tend to agree with Raymond. I think such changes are very welcome when the module or tests are otherwise being changed, but on their own constitute unnecessary churn.
In article <1392492250.26338.83831085.39A5ED08@webmail.messagingengine.com>, Benjamin Peterson wrote:
> On Sat, Feb 15, 2014, at 10:12 AM, Serhiy Storchaka wrote:
> > Although Raymond approved a patch for test_bigmem [2], he expressed an
> > insistent recommendation not to do this. So I have stopped committing
> > new reviewed patches. Terry recommended discussing this on python-dev.
> > What are your thoughts?
>
> I tend to agree with Raymond. I think such changes are very welcome
> when the module or tests are otherwise being changed, but on their own
> constitute unnecessary churn.
+1

Integrity of the test suite and minimizing code churn top any benefits of more specific messages on failures. The expectation is that most tests will never fail, so their changed messages will never be seen. For the rare cases when a test does fail, quite often the test was written in a way that will require examination of the code to understand exactly what the test case was intending to test and why it failed. Having a more specific exception message wouldn't help for many tests without further modifications; the key point is to know that the test failed.

--
Ned Deily, nad@acm.org
On 16 February 2014 09:20, Ned Deily wrote:
> +1
>
> Integrity of the test suite and minimizing code churn top any benefits
> of more specific messages on failures. The expectation is that most
> tests will never fail, so their changed messages will never be seen.
> For the rare cases when a test does fail, quite often the test was
> written in a way that will require examination of the code to
> understand exactly what the test case was intending to test and why it
> failed. Having a more specific exception message wouldn't help for many
> tests without further modifications; the key point is to know that the
> test failed.
Right, there are a few key problems with large scale style changes to the test suite:

1. The worst case scenario is where we subtly change a test so that it is no longer testing what it is supposed to be testing, allowing the future introduction of an undetected regression. This isn't particularly *likely*, but it is a serious problem if it happens.

2. If there are pending patches for that module that include new tests, then the style change may cause the patches to no longer apply cleanly, requiring rework of bug fix and new feature patches to accommodate the style change.

3. Merges between branches may become more complicated (for reasons similar to 2), unless the style change is also applied to the maintenance branches (which is problematic due to 1).

The practical benefits of this kind of change in the test suite are also highly dubious, because they *only help if the test fails at some point in the future*. At that point, whoever caused the test to fail will switch into debugging mode, and a couple of relevant points apply:

* the cause of the problem may be immediately obvious anyway, since it's likely in whatever code they just changed
* if they need more information, then they can refactor the failing test to use richer (and perhaps additional) assertions as part of the debugging exercise

Now, debugging a failing test isn't a skill most programming courses teach, so it may be worth our while to actually provide some pointers on doing so effectively in the devguide (including a note about checking that the failing test is using rich assertions, and potentially updating it if it isn't), but the general policy against large scale changes to take advantage of new features still applies to the test suite just as much as it does to the rest of the standard library.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia
On 16/02/2014 00:22, Nick Coghlan wrote:
> Right, there are a few key problems with large scale style changes to
> the test suite:
>
> 1. The worst case scenario is where we subtly change a test so that it
> is no longer testing what it is supposed to be testing, allowing the
> future introduction of an undetected regression. This isn't
> particularly *likely*, but it is a serious problem if it happens.
>
> 2. If there are pending patches for that module that include new tests,
> then the style change may cause the patches to no longer apply cleanly,
> requiring rework of bug fix and new feature patches to accommodate the
> style change.
>
> 3. Merges between branches may become more complicated (for reasons
> similar to 2), unless the style change is also applied to the
> maintenance branches (which is problematic due to 1).

I spend a *lot* of time working with the Python test suite on behalf of Jython, so I appreciate the care CPython puts into its testing. To a large extent, tests written for CPython drive Jython development: I suspect I work with a lot more failing tests than anyone here. Where we have a custom test, I often update it from the latest CPython tests.

Often a test failure is not caused by code I just wrote, but by adding a CPython test or removing a "skip if Jython", and not having written anything yet. While the irrefutable "False is not true" always raises a smile, I'd welcome something more informative. It's more than a "style" issue.

What Nick says above is also not false, as general guidance, but taken as an iron rule it seems to argue against concurrent development at all. Don't we manage this kind of change pretty well already? I see little risk of problems 1-3 in the actual proposal, as the changes themselves are 99% of the "drop-in replacement" type:

- self.assertTrue(isinstance(x, int))
+ self.assertIsInstance(x, int)

I found few places, on a quick scan, that risked changing the meaning: they introduce an if-statement, or refactor the expression -- I don't mean they're actually wrong. The point about breaking Serhiy's patch into independent parts will help manage the merging and this risk.

The tests are not library code, but their other use is as an example of good practice in unit testing. I pretty much get my idea of Python's test facilities from this work. It was a while before I realised more expressive methods were available.

Jeff Allen
Jeff Allen, 16.02.2014 11:23:
> I spend a *lot* of time working with the Python test suite on behalf of
> Jython, so I appreciate the care CPython puts into its testing. To a
> large extent, tests written for CPython drive Jython development: I
> suspect I work with a lot more failing tests than anyone here.
Careful with such a bold statement. ;)
> What Nick says above is also not false, as general guidance, but taken
> as an iron rule it seems to argue against concurrent development at
> all. Don't we manage this kind of change pretty well already? I see
> little risk of problems 1-3 in the actual proposal, as the changes
> themselves are 99% of the "drop-in replacement" type:
>
> - self.assertTrue(isinstance(x, int))
> + self.assertIsInstance(x, int)
While I generally second Nick's objections to this, I also agree that the kind of change above is such an obvious and straightforward improvement (since unittest doesn't have py.test's assert analysis) that I'm +1 on applying them.

I've been annoyed more than once by a test failure in CPython's test suite (when compiled with Cython) that required me to look up and read the test source and rerun it locally in order to actually understand what was happening. Seeing a more informative error message right on our CI server would certainly help, if only to get a quicker idea of whether this failure is worth taking a closer look at.

Stefan
On Sun, Feb 16, 2014 at 12:22 AM, Nick Coghlan wrote:
> The practical benefits of this kind of change in the test suite are
> also highly dubious, because they *only help if the test fails at some
> point in the future*. At that point, whoever caused the test to fail
> will switch into debugging mode, and a couple of relevant points apply:
One place where those points don't apply so cleanly is when the test failure is coming from continuous integration and can't easily be reproduced locally (e.g., because there's a problem on a platform you don't have access to, or because it's some kind of threading-related intermittent failure that's exacerbated by the timing conditions on a particular machine). In those situations, an informative error message can easily save significant debugging time.

Count me as +1 on the test updates, provided they're done carefully. (And all those I've looked at from Serhiy do indeed look careful.)

--
Mark
On 2/15/2014 1:12 PM, Serhiy Storchaka wrote:
> Many Python tests were written long before unittest existed, using
> simple asserts. When they were ported to unittest, the asserts were
> replaced with the assert_ method and later with assertTrue. unittest
> has a number of other methods to check for and report failure, from
> assertEqual to the more specific assertIs, assertIn, assertIsInstance,
> etc., added in 2.7. The newer methods provide better reporting in case
> of failure.
Failure of assertTrue reports 'False is not true'.
> I wrote a large patch which modifies the tests to use more specific
> methods [1]. Because it is too large, it was divided into many smaller
> patches, and separate issues were opened for them. At the moment the
> major part of the original patch has already been committed. Many
> thanks to Ezio for reviewing the majority of the issues. Some changes
> have been made by other people in unrelated issues.
>
> Although Raymond approved a patch for test_bigmem [2], he expressed an
> insistent recommendation not to do this. So I have stopped committing
> new reviewed patches. Terry recommended discussing this on python-dev.
> What are your thoughts?
>
> [1] http://bugs.python.org/issue16510
> [2] http://bugs.python.org/issue20547
After thinking about Raymond's objections, checking http://docs.python.org/3/library/unittest.html#test-cases, and noting Ezio's explicit approval and others' tacit approval (by lack of objection), I think you should continue and finish.

The reasons for not making style changes in public stdlib modules are that we want people to import and use them, we have promised stability absent notice otherwise, and it is reasonably possible to make unintended semantic changes that might pass unnoticed by incomplete tests. The case is different on all counts for test.test_xyz modules. They are private application modules, not to be imported (except to run them), and subject to change without notice.

Nearly all the changes proposed are from assertTrue to various specialized assertXs with documented equivalences to assertTrue. I presume you made the 1-to-1 replacements with a script using capturing regexes, with a check for false replacements such as might occasionally happen due to an operator being inside a string. If we are ever to make the replacements, doing so mechanically and in bulk is easier and safer than doing them one at a time by hand. This includes reviewing.

--
Terry Jan Reedy
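A mechanical rewrite of the kind Terry presumes might look something like the sketch below. The patterns and the upgrade_line helper are invented here for illustration; the actual script used for issue 16510 is not shown in this thread, and a real version would need many more patterns plus the manual check for operators inside strings that Terry mentions.

```python
import re

# Illustrative capturing-regex rewrites from assertTrue to more
# specific assert methods (hypothetical, not the script from the issue).
REPLACEMENTS = [
    (re.compile(r"assertTrue\(isinstance\((.+), (.+?)\)\)"),
     r"assertIsInstance(\1, \2)"),
    (re.compile(r"assertTrue\((.+) is None\)"), r"assertIsNone(\1)"),
    (re.compile(r"assertTrue\((.+) in (.+)\)"), r"assertIn(\1, \2)"),
    (re.compile(r"assertTrue\((.+) == (.+)\)"), r"assertEqual(\1, \2)"),
]

def upgrade_line(line):
    """Apply the first matching one-to-one replacement to a test line."""
    for pattern, replacement in REPLACEMENTS:
        new_line, count = pattern.subn(replacement, line)
        if count:
            return new_line
    return line

print(upgrade_line("self.assertTrue(isinstance(x, int))"))
# -> self.assertIsInstance(x, int)
print(upgrade_line("self.assertTrue(result is None)"))
# -> self.assertIsNone(result)
```

Because each rewrite is a documented one-to-one equivalence, the resulting diff is easy to review in bulk, which is Terry's point about mechanical changes being safer than hand edits.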
On Sat, 15 Feb 2014 20:12:33 +0200, Serhiy Storchaka wrote:
> I wrote a large patch which modifies the tests to use more specific
> methods [1]. Because it is too large, it was divided into many smaller
> patches, and separate issues were opened for them. At the moment the
> major part of the original patch has already been committed. Many
> thanks to Ezio for reviewing the majority of the issues. Some changes
> have been made by other people in unrelated issues.
>
> Although Raymond approved a patch for test_bigmem [2], he expressed an
> insistent recommendation not to do this. So I have stopped committing
> new reviewed patches. Terry recommended discussing this on python-dev.
> What are your thoughts?
When it comes specifically to test_bigmem, it is important for error messages to be informative, because the failures may be hard (if there is not enough RAM) or very long to diagnose on a developer's machine. So +1 to changing test_bigmem.

As for the rest of the test suite, I find the "assertSpecific" form more readable than "assertTrue(... with some operator)". But I may be in a minority here :-)

As for the "code churn" argument, I find that a much less important concern for the test suite than for the rest of the code.

Regards

Antoine.
participants (9)

- Antoine Pitrou
- Benjamin Peterson
- Jeff Allen
- Mark Dickinson
- Ned Deily
- Nick Coghlan
- Serhiy Storchaka
- Stefan Behnel
- Terry Reedy