Should we stop "yield"ing tests?
On 10/08/2015 11:11 AM, Nathan Goldbaum wrote:
Hi all,
Sorry for the novel of an e-mail! I wanted to be as detailed as possible; I hope it isn't too much.
Right now our test suite does a lot of test yielding. Under this paradigm, every yielded test counts as a "dot" in the nose output and contributes to the total number of tests printed at the end of the run. Yielding also means that if one yielded test fails, the rest of the tests yielded by that function still run.
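For anyone who hasn't worked with this pattern, here is a minimal sketch of the kind of thing I mean. The check_field_exists helper is invented for illustration; fake_random_ds is the fake-data helper from yt.testing:

    from numpy.testing import assert_equal
    from yt.testing import fake_random_ds

    def check_field_exists(ds, field):
        # invented helper: verify that a field is defined on the dataset
        assert_equal(field in ds.derived_field_list, True)

    def test_fields():
        ds = fake_random_ds(16)
        # nose treats each yielded (callable, *args) tuple as a separate
        # test, so this one function shows up as three "dots"
        for field in ["density", "velocity_x", "velocity_y"]:
            yield check_field_exists, ds, ("gas", field)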
It's definitely nice to see that we're running thousands of tests, and I'd be happy to continue doing it, except I've learned that this approach adds some technical hurdles for people who actually need to deal with the test suite.
Personally, I've noticed that it makes debugging failing tests much more annoying than it needs to be. Since a test is yielded, the traceback generated by a failing or erroring test ends up somewhere in nose rather than in the yt test suite function that actually yielded the test. This can make it difficult to determine where a failing test is coming from in the test suite, particularly if the test that gets yielded is just an assert.
In addition, Kacper tells me that our practice of yielding tests makes it difficult to simplify our Jenkins setup, since yielded tests are not parallel-safe.
I'd like to propose a modification to the yt codebase and developer guide. Rather than encouraging developers to yield every assert and test class, we should instead *not* yield them, and just call the asserts and instantiate the test classes directly. To make that concrete, the following set of tests from the NMSU ART answer tests would go from looking like this:
http://paste.yt-project.org/show/5944/
to looking like this:
http://paste.yt-project.org/show/5946/
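For those reading along without clicking through, the change has roughly this shape. This is a schematic sketch rather than the literal paste contents; the dataset path, field list, and expected string are placeholders:

    from numpy.testing import assert_equal
    from yt.utilities.answer_testing.framework import \
        data_dir_load, FieldValuesTest

    d9p = "path/to/d9p_dataset"          # placeholder
    fields = ("density", "temperature")  # placeholder

    # before: every assert and test class is yielded back to nose
    def test_d9p():
        ds = data_dir_load(d9p)
        yield assert_equal, str(ds), "d9p_dataset"
        for field in fields:
            yield FieldValuesTest(ds, field)

    # after: asserts become plain calls, and the test classes (which
    # are callable in the answer testing framework) are instantiated
    # and invoked directly
    def test_d9p():
        ds = data_dir_load(d9p)
        assert_equal(str(ds), "d9p_dataset")
        for field in fields:
            FieldValuesTest(ds, field)()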
Each individual assert and test class instantiation would no longer count as an individual test in nose's statistics. Instead, referring to the example above, the test_d9p function would be the only test reported by nose, even though it performs many asserts and creates many test class instances.
For me, the main win is that it would be easier to determine exactly which test failed, because the traceback nose reports would include the line in the test file that produced the failing assert or raised the unhandled exception.
To make that concrete, I've just made the following modification to the `test_dimensionless` function to force a test to fail:
http://paste.yt-project.org/show/5947/
Running `nosetests units` in the root of the repository, I get the following traceback:
http://paste.yt-project.org/show/5948/
Note how the test traceback does *not* include the `test_dimensionless` function. If I instead make it so the failing test is not yielded:
http://paste.yt-project.org/show/5949/
I get a much nicer traceback:
http://paste.yt-project.org/show/5950/
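Schematically, the difference between the two versions is just this (a sketch, not the literal paste contents):

    from nose.tools import assert_true

    # yielded: nose itself executes the assert later, so the failure's
    # traceback points into nose's internals rather than at this function
    def test_dimensionless():
        yield assert_true, False  # deliberately failing

    # called directly: the traceback names test_dimensionless and points
    # at the exact failing line below
    def test_dimensionless():
        assert_true(False)  # deliberately failing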
I can see a few reasons why we might not want to do this.
1. This is an invasive, large-scale change. It might be automatable (I think I could write an emacs macro to do this, for example), but it would in the end be difficult to review.
2. Since test functions would terminate on the first failure, it might lead to annoying debug cycles where one assert gets fixed but then the next assert fails, forcing people to wait for the full test suite to be rerun just to get to the next failing test.
For the second point, I think we can remedy this by improving our testing docs: encourage people to run tests locally as much as possible, and better explain how to run a single test function from the command line.
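For reference, nose can already select a single module, file, or function from the command line, so the docs mainly need to surface syntax like this:

    # run one subpackage's tests
    nosetests yt.units

    # run a single test function, by module path
    nosetests yt.units.tests.test_units:test_dimensionless

    # or by file path
    nosetests yt/units/tests/test_units.py:test_dimensionless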
If people are interested in making this modification globally, I think I could take it on as part of my general efforts to clean up the codebase.
-Nathan
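Kacper Kowalik replied:

> In addition, Kacper tells me that our practice of yielding tests makes
> it difficult to simplify our Jenkins setup, since yielded tests are not
> parallel-safe.

Just a small clarification: test generators and parallelism in nose are currently completely broken [1], and AFAIK this issue was never fixed. Thread safety is a slightly different thing; some tests we have that aren't generators would also fail with parallel nose.

[1] https://groups.google.com/forum/#!msg/nose-users/PHcGXlGQZMg/XKgUsDcyf7cJ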
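To make that concrete, the failing combination is any test generator run under nose's multiprocess plugin. A minimal sketch (the file name in the comment is a placeholder):

    from nose.tools import assert_true

    def test_generator():
        # a minimal test generator; per the thread linked above, the
        # tests yielded here are not dispatched correctly when the
        # multiprocess plugin is enabled, e.g.:
        #
        #     nosetests --processes=4 test_parallel_sketch.py
        #
        for value in (1, 2, 3):
            yield assert_true, value > 0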
Britton Smith replied:

Hi Nathan,

*Spoiler alert*: I think you make a pretty strong case for ending the practice of yielding tests. In particular, if this makes it easier for developers to debug failing or erroring tests, I'm definitely on board.

I agree that we should be encouraging people to run tests locally. I know I should be doing more of that.

As for your points against doing this, I personally think they do not outweigh the benefits. The PR may be large, but it should be confined to testing files, and the changes should all be very straightforward. Your second point is probably the most compelling, but if you find yourself with a failing test, the best thing is to run that single test locally, over and over, until it passes.

To conclude my own novella of a response to your Ulysses, I am +1 on this.

Britton
participants (3)
- Britton Smith
- Kacper Kowalik
- Nathan Goldbaum