Using more specific methods in Python unit tests

Many Python tests were written long before unittest existed, using plain assert statements. When they were ported to unittest, those asserts were replaced with the assert_ method and later with assertTrue. unittest has a number of other methods for checking and reporting failures, from assertEqual to the more specific assertIs, assertIn, assertIsInstance, etc., added in 2.7. The newer methods provide better reporting in case of failure.

I wrote a large patch which modifies the tests to use more specific methods [1]. Because it was too large, it was divided into many smaller patches, and separate issues were opened for them. At the moment the major part of the original patch has already been committed. Many thanks to Ezio for reviewing the majority of the issues. Some changes have been made by other people in unrelated issues.

Although Raymond approved a patch for test_bigmem [2], he expressed an insistent recommendation not to do this, so I have stopped committing new reviewed patches. Terry recommended discussing this on Python-Dev. What are your thoughts?

[1] http://bugs.python.org/issue16510
[2] http://bugs.python.org/issue20547
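
To make the difference concrete, here is a minimal hypothetical sketch (illustrative only, not taken from the patch) of the failure messages produced by the generic and the specific forms:

    import unittest

    class Example(unittest.TestCase):
        def test_generic(self):
            x = "1"
            # Fails with the opaque message:
            #     AssertionError: False is not true
            self.assertTrue(isinstance(x, int))

        def test_specific(self):
            x = "1"
            # Fails with the informative message:
            #     AssertionError: '1' is not an instance of <class 'int'>
            self.assertIsInstance(x, int)

    if __name__ == "__main__":
        unittest.main()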

In article <1392492250.26338.83831085.39A5ED08@webmail.messagingengine.com>, Benjamin Peterson <benjamin@python.org> wrote:
+1

Integrity of the test suite and minimizing code churn top any benefits of more specific messages on failures. The expectation is that most tests will never fail, so their changed messages will never be seen. For the rare cases when a test does fail, quite often the test was written in a way that will require examination of the code to understand exactly what the test case was intending to test and why it failed. Having a more specific exception message wouldn't help for many tests without further modifications; the key point is to know that the test failed.

-- Ned Deily, nad@acm.org

On 16 February 2014 09:20, Ned Deily <nad@acm.org> wrote:
Right, there are a few key problems with large scale style changes to the test suite:

1. The worst case scenario is where we subtly change a test so that it is no longer testing what it is supposed to be testing, allowing the future introduction of an undetected regression. This isn't particularly *likely*, but it is a serious problem if it happens.
2. If there are pending patches for that module that include new tests, then the style change may cause those patches to no longer apply cleanly, requiring rework of bug fix and new feature patches to accommodate the style change.
3. Merges between branches may become more complicated (for reasons similar to 2), unless the style change is also applied to the maintenance branches (which is problematic due to 1).

The practical benefits of this kind of change in the test suite are also highly dubious, because they *only help if the test fails at some point in the future*. At that point, whoever caused the test to fail will switch into debugging mode, and a couple of relevant points apply:

* the cause of the problem may be immediately obvious anyway, since it's likely in whatever code they just changed
* if they need more information, they can refactor the failing test to use richer (and perhaps additional) assertions as part of the debugging exercise

Now, debugging a failing test isn't a skill most programming courses teach, so it may be worth our while to provide some pointers on doing so effectively in the devguide (including a note about checking that the failing test is using rich assertions, and potentially updating it if it isn't), but the general policy against large scale changes to take advantage of new features still applies to the test suite just as much as it does to the rest of the standard library.

Cheers,
Nick.

--
Nick Coghlan | ncoghlan@gmail.com | Brisbane, Australia

On 16/02/2014 00:22, Nick Coghlan wrote:
Often a test failure is not caused by code I just wrote, but by adding a CPython test or removing a "skip if Jython", without my having written anything yet. While the irrefutable "False is not true" always raises a smile, I'd welcome something more informative. It's more than a "style" issue.

What Nick says above is also not false, as general guidance, but taken as an iron rule it seems to argue against concurrent development at all. Don't we manage this kind of change pretty well already? I see little risk of problems 1-3 in the actual proposal, as the changes themselves are 99% of the "drop-in replacement" type:

    - self.assertTrue(isinstance(x, int))
    + self.assertIsInstance(x, int)

I found few places, on a quick scan, that risked changing the meaning: they introduce an if-statement, or refactor the expression -- I don't mean they're actually wrong. Breaking Serhiy's patch into independent parts will help manage merging and this risk.

The tests are not library code, but their other use is as an example of good practice in unit testing. I pretty much get my idea of Python's test facilities from this work. It was a while before I realised more expressive methods were available.

Jeff Allen
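
A hypothetical illustration (not taken from the patch) of the rarer non-drop-in case described above, where a compound condition has no single specific assertion and the rewrite genuinely restructures the test:

    import unittest

    class NonDropIn(unittest.TestCase):
        def test_compound(self):
            x = 5
            # One compound assertion: there is no single assertX
            # equivalent, and a failure reports only "False is not true".
            self.assertTrue(isinstance(x, int) and x >= 0)

        def test_split(self):
            x = 5
            # A faithful rewrite needs two assertions -- a semantic
            # restructuring rather than a mechanical substitution,
            # so it has to be reviewed by hand.
            self.assertIsInstance(x, int)
            self.assertGreaterEqual(x, 0)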

Jeff Allen, 16.02.2014 11:23:
Careful with such a bold statement. ;)
While I generally second Nick's objections to this, I also agree that the kind of change above is such an obvious and straightforward improvement (since unittest doesn't have py.test's assert analysis) that I'm +1 on applying it.

I've been annoyed more than once by a test failure in CPython's test suite (when compiled with Cython) that required me to look up and read the test source and rerun it locally in order to actually understand what was happening. Seeing a more informative error message right on our CI server would certainly help, if only to get a quicker idea of whether the failure is worth taking a closer look at.

Stefan
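
For contrast, a sketch of what py.test's assert analysis does for a plain assert (the exact output shape varies by pytest version; this is an approximation, not part of the original message):

    # test_sketch.py -- run with: pytest test_sketch.py
    def test_plain():
        x = "1"
        assert isinstance(x, int)

    # pytest rewrites the assert and reports something like:
    #     assert False
    #      +  where False = isinstance('1', int)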

On Sun, Feb 16, 2014 at 12:22 AM, Nick Coghlan <ncoghlan@gmail.com> wrote:
One place where those points don't apply so cleanly is when the test failure is coming from continuous integration and can't easily be reproduced locally (e.g., because there's a problem on a platform you don't have access to, or because it's some kind of threading-related intermittent failure that's exacerbated by the timing conditions on a particular machine). In those situations, an informative error message can easily save significant debugging time.

Count me as +1 on the test updates, provided they're done carefully. (And all those I've looked at from Serhiy do indeed look careful.)

-- Mark

On 2/15/2014 1:12 PM, Serhiy Storchaka wrote:
Failure of assertTrue reports 'False is not true'.
After thinking about Raymond's objections, checking http://docs.python.org/3/library/unittest.html#test-cases, and noting Ezio's explicit approval and others' tacit approval (by lack of objection), I think you should continue and finish.

The reasons for not making style changes in public stdlib modules are that we want people to import and use them, that we have promised stability absent notice otherwise, and that it is reasonably possible to make unintended semantic changes that might pass unnoticed by incomplete tests. The case is different on all counts for test.test_xyz modules. They are private application modules, not to be imported (except to run them), and subject to change without notice.

Nearly all the changes proposed are from assertTrue to various specialized assertXs with documented equivalences to assertTrue. I presume you made the 1-to-1 replacements with a script using capturing regexes, with a check for false replacements such as might occasionally happen due to an operator being inside a string. If we are ever to make the replacements, doing so mechanically and in bulk is easier and safer than doing them one at a time by hand. This includes reviewing.

-- Terry Jan Reedy
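
A minimal sketch of the kind of capturing-regex rewrite presumed above (hypothetical; not Serhiy's actual script). A real tool would also have to guard against matches inside string literals and nested parentheses, which is why every replacement still needs review:

    import re

    # Hypothetical one-to-one rules; each right-hand side is a documented
    # equivalent of the assertTrue form it replaces. Rules apply in order,
    # and the non-greedy groups can misfire on nested commas or parens.
    RULES = [
        (re.compile(r'assertTrue\(isinstance\((.+?), (.+?)\)\)'),
         r'assertIsInstance(\1, \2)'),
        (re.compile(r'assertTrue\((.+?) in (.+?)\)'),
         r'assertIn(\1, \2)'),
        (re.compile(r'assertTrue\((.+?) == (.+?)\)'),
         r'assertEqual(\1, \2)'),
    ]

    def rewrite(line):
        for pattern, replacement in RULES:
            line = pattern.sub(replacement, line)
        return line

    print(rewrite("self.assertTrue(isinstance(x, int))"))
    # -> self.assertIsInstance(x, int)
    print(rewrite("self.assertTrue(x in seq)"))
    # -> self.assertIn(x, seq)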

On Sat, 15 Feb 2014 20:12:33 +0200 Serhiy Storchaka <storchaka@gmail.com> wrote:
When it comes specifically to test_bigmem, it is important for error messages to be informative, because the failures may be hard (without enough RAM) or very slow to diagnose on a developer's machine. So +1 to changing test_bigmem.

As for the rest of the test suite, I find the "assertSpecific" form more readable than "assertTrue(... with some operator)". But I may be in a minority here :-)

As for the "code churn" argument, I find that a much less important concern for the test suite than for the rest of the code.

Regards,
Antoine.

participants (9):
- Antoine Pitrou
- Benjamin Peterson
- Jeff Allen
- Mark Dickinson
- Ned Deily
- Nick Coghlan
- Serhiy Storchaka
- Stefan Behnel
- Terry Reedy