[Twisted-Python] Some comments regarding #5190 - “RFC 6125 ("Service Identity") implementation”
Hello,

I've just noticed that the changeset for #5190 included some untested code. Specifically, there are no tests for the code which detects missing dependencies and emits warnings about them.

I'd previously noticed that this code was broken but hadn't realized this was because it was untested.

I don't think there's any disagreement whatsoever over Twisted's testing requirements. All code must have full line and branch coverage (as reported by the coverage.py tool). Developers, please write tests for all of your code (and please learn to do test-driven development - it will make this task easier, I promise). Reviewers, please don't accept proposed changes that include untested code.

Thanks,
Jean-Paul
On Apr 30, 2014, at 5:21 AM, exarkun@twistedmatrix.com wrote:
I've just noticed that the changeset for #5190 included some untested code. Specifically, there are no tests for the code which detects missing dependencies and emits warnings about them.
My bad. Well, technically hawkowl's bad; hawkowl is a committer and did the review and therefore has all the criminal liability in this case, but as the author who wrote the code I bear some responsibility, at least in some abstract, hypothetical sense ;-). Thanks for working on the fix; it looks like the relevant ticket is <https://twistedmatrix.com/trac/ticket/7097>. I'll try to review that as soon as it's ready; let me know.
I'd previously noticed that this code was broken but hadn't realized this was because it was untested.
Neither the author nor the reviewer realized this either, apparently. Certainly it wasn't an intentional omission.
I don't think there's any disagreement whatsoever over Twisted's testing requirements. All code must have full line and branch coverage (as reported by the coverage.py tool). Developers, please write tests for all of your code (and please learn to do test-driven development - it will make this task easier, I promise). Reviewers, please don't accept proposed changes that include untested code.
The problem with code like this is that, in some configurations, it is in fact reported as covered by coverage.py. It requires manual examination to get the intersection of a diff and a coverage report, and even when you do, we still have too many places where it's "okay" to skip coverage.

As the author, I looked at coverage periodically and it looked sort of like what I expected. Since I was testing multiple installed-library configurations I had used "coverage combine", which misleadingly told me that it was all covered (although this particular code should have been tested independently, without requiring a combined run). And I'm sure the reviewer thought about it a little bit, but even if they'd looked at a coverage report, it might have looked like it was OK to skip these particular lines. And I was in fact doing test-driven development; I didn't add the warning code there until I was looking at a failing test because one of the buildbots didn't have one of my expected dependencies installed, and I made my tests pass locally in an environment that also didn't have those dependencies installed.

Yes, I understand how this isn't really 100% TDD, and that a failure on a buildbot should have resulted in me writing a new test; mistakes were made etc. But all TDD necessarily involves the occasional error/error/pass where there really ought to have been a pass/fail/pass - if we understood what was going on with all of our code all the time we probably wouldn't need tests in the first place :-). It's a bit disingenuous to say that I need to "learn to do test-driven development" to avoid mistakes like this, though.

On the other side of the equation, I imagine that a reviewer looking at this, even carefully considering coverage, might see a missed line on some buildbot or in their local run and then think "oh, of course, but that line will be run if I had/didn't have that library installed". And there are some bits of code which are acceptable to cover in this manner (except that they should have direct test coverage from actual tests, rather than just being executed when the test module is imported - a distinction coverage.py won't show you). It's quite a subtle point to understand that this particular kind of code should actually be fully covered in all configurations, especially because these tests are smack in the middle of a file which will be validly missing coverage in some supported configurations (no pyOpenSSL installed) and surrounded with thickets of conditionals and test skips to optionally import more dependencies than just this one.

We should remain vigilant, but I think that if we want to really reduce errors like this in the future we need to make them easier to spot. Failing that, we need more specific suggestions. In this case, I happen to know that I do TDD and that hawkowl is aware of the standard on coverage issues (and is at least aware of coverage.py, whether or not it was run as part of this review), so those two suggestions aren't going to help, as we're already doing them. Any time the solution to a problem is "everybody should just try harder", that seems like a bet against human fallibility.
So until someone has a month to spend on an all-singing all-dancing combined ratcheting coverage report for all the builders and a fantastic visualization for its output which highlights every possible coverage issue, here are some specific suggestions which might avoid some parts of this class of error:

For authors (what I could have done better):

1. I know I said they're inevitable, but whenever you get an error/pass, always consider whether you could make it a clean fail/pass instead. You (and by "you" I obviously mean "me") think you understand why an error happened, but the only way to really demonstrate that you understand it is to convert it into an assertion that fails with a useful error message.

2. Be intensely suspicious of any code that needs to run at import time. I did stuff the warning into a function, which at least doesn't leak local variables, but I probably could have moved this warning somewhere easier to manage, and I would have noticed warnings coming out of tests as opposed to just being printed at the beginning.

3. Declarative tools like deprecatedModuleAttribute automate some of the magic of making code-level artifacts emit warnings when they are bound to and used, rather than as accidents of their initial import, so make use of those. (I'm still thinking about how I could have applied that in this specific case; I probably could have. There's a rough sketch of the general pattern in the postscript below.)

4. Configure your development environment to be more aggressive about warnings (at least for now; eventually trial should fix this for you, see <https://twistedmatrix.com/trac/ticket/6348>). I don't think it would have helped in this particular case because the warning itself is emitted at import time (see point 2), but this sort of mistake crops up unfortunately frequently with deprecation warnings, which are a bit more common, and could often be caught by a better setup. I recently changed my PYTHONWARNINGS environment variable to 'all::DeprecationWarning,all::UserWarning', and that seems to catch most things. (Unfortunately, setting it to simply 'all' produces too much noise from the stdlib and dependencies, so it's better to be slightly more restrictive.)

For reviewers (what hawkowl could have done better):

1. Run coverage. Particularly, run coverage just on the relevant and changed test modules, and make sure the system under test gets run directly and not just accidentally executed by running the code.

2. I know I've been reminding reviewers lately to give clear feedback about which elements of reviews are suggestions and which are required fixes for violations of policy, and that may produce the subjective impression that I've been asking for faster or less careful reviews. If so, I should correct that impression: I would like there to be less bikeshedding, but it's still pretty important that the £10 million reactor actually work. Any lack of test coverage is at least a potential policy violation. Even if you think you understand why it's missing, even if it looks like a platform variance that doesn't make sense to test on the machine you're running on, always ask the author to explain or justify why coverage isn't there and whether it could be added to a cross-platform test with a reasonable (or, in many cases, even an existing) fake; if there's no relevant fake and it would be too much work, maybe we need to file a ticket for implementing some test support.

3. Especially if you're dealing with a new feature or a significant behavior change, always try to actually run and interact with the code and look at its output.
In this case, noticing the whitespace / formatting errors in the warning messages might have led us to spot the coverage error earlier. (Jean-Paul made some comments to me when he noticed it, but it was an off-the-cuff thing after the branch had already been landed and not part of a code review; context is important here, as evidenced by the fact that it took him some time to realize that it was indicative of a test coverage issue!)

If anyone else has any ideas, please share them.

-glyph
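P.S.: Since I mentioned deprecatedModuleAttribute above, here is a rough sketch of the general pattern; the attribute name, message, and version below are invented for illustration and are not the code from this branch:

    from twisted.python.versions import Version
    from twisted.python.deprecate import deprecatedModuleAttribute

    # A hypothetical module-level attribute we want people to stop using.
    oldHelper = object()

    # Rather than emitting a warning unconditionally at import time,
    # register the attribute so that a DeprecationWarning is emitted
    # only when somebody actually binds to and uses it.
    deprecatedModuleAttribute(
        Version("Twisted", 14, 0, 0),
        "Use the newer replacement instead.",
        __name__,     # the module that owns the attribute
        "oldHelper",  # the name of the attribute to wrap
    )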
On 1 May, 07:23 pm, glyph@twistedmatrix.com wrote:
On Apr 30, 2014, at 5:21 AM, exarkun@twistedmatrix.com wrote:
I've just noticed that the changeset for #5190 included some untested code. Specifically, there are no tests for the code which detects missing dependencies and emits warnings about them.
My bad. Well, technically hawkowl's bad; hawkowl is a committer and did the review and therefore has all the criminal liability in this case, but as the author who wrote the code I bear some responsibility, at least in some abstract, hypothetical sense ;-).
My hope is that drawing attention to examples of this kind of mistake will help us avoid making the mistake in the future. Considering what my email prompted you to write, I think it may work. :)
Thanks for working on the fix; it looks like the relevant ticket is <https://twistedmatrix.com/trac/ticket/7097>. I'll try to review that as soon as it's ready; let me know.
No problem. I probably should have started my previous email with thanks to you and hawkowl for working on that feature. It is *really* good to have service identity checking support in Twisted.
I'd previously noticed that this code was broken but hadn't realized this was because it was untested.
Neither the author nor the reviewer realized this either, apparently. Certainly it wasn't an intentional omission.
I don't think there's any disagreement whatsoever over Twisted's testing requirements. All code must have full line and branch coverage (as reported by the coverage.py tool). Developers, please write tests for all of your code (and please learn to do test-driven development - it will make this task easier, I promise). Reviewers, please don't accept proposed changes that include untested code.
The problem with code like this is that, in some configurations, it is in fact reported as covered by coverage.py. It requires manual examination to get the intersection of a diff and a coverage report, and even when you do, we still have too many places where it's "okay" to skip coverage.
This is true - but I'm not sure the code in this case is particularly special. It's nearly always possible to write code and tests such that coverage.py says your code is covered but without actually having any meaningful test coverage of the implementation. After all, coverage.py only knows that a line ran or didn't.

The problem of platform- or environment-specific code requiring multiple branches which can never all run in any single configuration is an extra challenge, but I think a widely applicable solution is to not do that. To add to your comments below: if there is platform- or environment-specific code, then parameterize it on the environment and write tests for all of the cases.
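As a rough illustration of what I mean (the function name and warning message here are invented, not the actual code from the branch): make the import itself an input to the code that detects the missing dependency, so that both branches can be exercised by tests in any environment.

    import warnings

    def checkDependencies(importer=__import__):
        """
        Return True if the optional dependency is importable, and emit a
        warning otherwise.  The importer is a parameter so that tests can
        exercise both branches on any machine, regardless of what happens
        to be installed.
        """
        try:
            importer("service_identity")
        except ImportError:
            warnings.warn(
                "service_identity is not installed; some functionality "
                "will be unavailable.",
                UserWarning,
            )
            return False
        return True

    # In the tests, both branches run in every configuration:
    #
    #   def missing(name, *args, **kwargs):
    #       raise ImportError(name)
    #
    #   checkDependencies(importer=missing)                    # warning branch
    #   checkDependencies(importer=lambda *a, **kw: object())  # success branch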
As the author, I looked at coverage periodically and it looked sort of like what I expected. Since I was testing multiple installed-library configurations I had used "coverage combine", which misleadingly told me that it was all covered (although this particular code should have been tested independently, without requiring a combined run). And I'm sure the reviewer thought about it a little bit, but even if they'd looked at a coverage report, it might have looked like it was OK to skip these particular lines. And I was in fact doing test-driven development; I didn't add the warning code there until I was looking at a failing test because one of the buildbots didn't have one of my expected dependencies installed, and I made my tests pass locally in an environment that also didn't have those dependencies installed.
Yes, I understand how this isn't really 100% TDD, and that a failure on a buildbot should have resulted in me writing a new test; mistakes were made etc. But all TDD necessarily involves the occasional error/error/pass where there really ought to have been a pass/fail/pass - if we understood what was going on with all of our code all the time we probably wouldn't need tests in the first place :-). It's a bit disingenuous to say that I need to "learn to do test-driven development" to avoid mistakes like this, though.
On the other side of the equation, I imagine that a reviewer looking at this, even carefully considering coverage, might see a missed line on some buildbot or in their local run and then think "oh, of course, but that line will be run if I had/didn't have that library installed". And there are some bits of code which are acceptable to cover in this manner (except that they should have direct test coverage from actual tests, rather than just being executed when the test module is imported - a distinction coverage.py won't show you). It's quite a subtle point to understand that this particular kind of code should actually be fully covered in all configurations, especially because these tests are smack in the middle of a file which will be validly missing coverage in some supported configurations (no pyOpenSSL installed) and surrounded with thickets of conditionals and test skips to optionally import more dependencies than just this one.
We should remain vigilant, but I think that if we want to really reduce errors like this in the future we need to make them easier to spot. Failing that, we need more specific suggestions. In this case, I happen to know that I do TDD and that hawkowl is aware of the standard on coverage issues (and is at least aware of coverage.py, whether or not it was run as part of this review), so those two suggestions aren't going to help, as we're already doing them. Any time the solution to a problem is "everybody should just try harder", that seems like a bet against human fallibility.
So until someone has a month to spend on an all-singing all-dancing combined ratcheting coverage report for all the builders and a fantastic visualization for its output which highlights every possible coverage issue, here are some specific suggestions which might avoid some parts of this class of error:
I don't think we even have a plan for a tool that will report whether a change introduces code that isn't *really* tested (contrast "tested" with "executed").

I think this may be an area where we do actually need to rely on people doing a good job. Perhaps to counterbalance that, we need to eliminate more of the other crap involved in the development process? For example, if reviewers never had to spend any time thinking about whether the whitespace in a change was correct, they would have that much more brain power to apply to assessing the quality of the test suite.
For authors (what I could have done better):
1. I know I said they're inevitable, but whenever you get an error/pass, always consider whether you could make it a clean fail/pass instead. You (and by "you" I obviously mean "me") think you understand why an error happened, but the only way to really demonstrate that you understand it is to convert it into an assertion that fails with a useful error message.

2. Be intensely suspicious of any code that needs to run at import time. I did stuff the warning into a function, which at least doesn't leak local variables, but I probably could have moved this warning somewhere easier to manage, and I would have noticed warnings coming out of tests as opposed to just being printed at the beginning.

3. Declarative tools like deprecatedModuleAttribute automate some of the magic of making code-level artifacts emit warnings when they are bound to and used, rather than as accidents of their initial import, so make use of those. (I'm still thinking about how I could have applied that in this specific case; I probably could have.)

4. Configure your development environment to be more aggressive about warnings (at least for now; eventually trial should fix this for you, see <https://twistedmatrix.com/trac/ticket/6348>). I don't think it would have helped in this particular case because the warning itself is emitted at import time (see point 2), but this sort of mistake crops up unfortunately frequently with deprecation warnings, which are a bit more common, and could often be caught by a better setup. I recently changed my PYTHONWARNINGS environment variable to 'all::DeprecationWarning,all::UserWarning', and that seems to catch most things. (Unfortunately, setting it to simply 'all' produces too much noise from the stdlib and dependencies, so it's better to be slightly more restrictive.)
For reviewers (what hawkowl could have done better):
1. Run coverage. Particularly, run coverage just on the relevant and changed test modules, and make sure the system under test gets run directly and not just accidentally executed by running the code.

2. I know I've been reminding reviewers lately to give clear feedback about which elements of reviews are suggestions and which are required fixes for violations of policy, and that may produce the subjective impression that I've been asking for faster or less careful reviews. If so, I should correct that impression: I would like there to be less bikeshedding, but it's still pretty important that the £10 million reactor actually work. Any lack of test coverage is at least a potential policy violation. Even if you think you understand why it's missing, even if it looks like a platform variance that doesn't make sense to test on the machine you're running on, always ask the author to explain or justify why coverage isn't there and whether it could be added to a cross-platform test with a reasonable (or, in many cases, even an existing) fake; if there's no relevant fake and it would be too much work, maybe we need to file a ticket for implementing some test support.

3. Especially if you're dealing with a new feature or a significant behavior change, always try to actually run and interact with the code and look at its output.

In this case, noticing the whitespace / formatting errors in the warning messages might have led us to spot the coverage error earlier. (Jean-Paul made some comments to me when he noticed it, but it was an off-the-cuff thing after the branch had already been landed and not part of a code review; context is important here, as evidenced by the fact that it took him some time to realize that it was indicative of a test coverage issue!)
Thanks! These are great suggestions. Can we record them in a way that lets all Twisted contributors learn from this case (instead of only the people reading this thread), but without adding to the already unreasonably large quantity of text new contributors are ostensibly responsible for reading?

How's the unified Contributing-to-Twisted documentation effort coming?

Jean-Paul
On May 3, 2014, at 5:20 AM, exarkun@twistedmatrix.com wrote:
On 1 May, 07:23 pm, glyph@twistedmatrix.com wrote:
On Apr 30, 2014, at 5:21 AM, exarkun@twistedmatrix.com wrote:
I've just noticed that the changeset for #5190 included some untested code. Specifically, there are no tests for the code which detects missing dependencies and emits warnings about them.
My bad. Well, technically hawkowl's bad; hawkowl is a committer and did the review and therefore has all the criminal liability in this case, but as the author who wrote the code I bear some responsibility, at least in some abstract, hypothetical sense ;-).
My hope is that drawing attention to examples of this kind of mistake will help us avoid making the mistake in the future. Considering what my email prompted you to write, I think it may work. :)
Mission accomplished, I guess :-). Please continue doing so. I wish that every commit to trunk would prompt a thread on this list, really. Despite the epically ridiculous amount of email I get, this list is still a bit too low-traffic for my taste.
Thanks for working on the fix; it looks like the relevant ticket is <https://twistedmatrix.com/trac/ticket/7097>. I'll try to review that as soon as it's ready; let me know.
No problem. I probably should have started my previous email with thanks to you and hawkowl for working on that feature. It is *really* good to have service identity checking support in Twisted.
Thanks for saying so.
The problem with code like this is that, in some configurations, it is in fact reported as covered by coverage.py. It requires manual examination to get the intersection of a diff and a coverage report, and even when you do, we still have too many places where it's "okay" to skip coverage.
This is true - but I'm not sure the code in this case is particularly special. It's nearly always possible to write code and tests such that coverage.py says your code is covered but without actually having any meaningful test coverage of the implementation. After all, coverage.py only knows that a line ran or didn't.
The problem of platform- or environment-specific code requiring multiple branches which can never all run in any single configuration is an extra challenge, but I think a widely applicable solution is to not do that. To add to your comments below: if there is platform- or environment-specific code, then parameterize it on the environment and write tests for all of the cases.
I think what really makes this an extra challenge is that we (well, all Python programmers, really) have a smattering of different idioms for cases like this, and we don't have a succinct way of expressing the optional dependency in the import, the implementation, and the tests.
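For instance, one common shape for this - purely illustrative, with invented names - spreads the optional dependency across three places: a sentinel import at module level, a guard in the implementation, and a skip attribute in the trial tests.

    try:
        import OpenSSL
    except ImportError:
        OpenSSL = None

    from twisted.trial.unittest import TestCase

    def makeContext():
        # Hypothetical implementation, guarded by the sentinel.
        if OpenSSL is None:
            raise NotImplementedError("pyOpenSSL is required")
        return OpenSSL.SSL.Context(OpenSSL.SSL.TLSv1_METHOD)

    class ContextTests(TestCase):
        # trial skips the whole class when the dependency is absent,
        # which is exactly why the missing-dependency branch never gets
        # exercised unless someone tests it deliberately.
        if OpenSSL is None:
            skip = "pyOpenSSL is not installed"

        def test_makeContext(self):
            self.assertTrue(makeContext() is not None)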
So until someone has a month to spend on an all-singing all-dancing combined ratcheting coverage report for all the builders and a fantastic visualization for its output which highlights every possible coverage issue, here are some specific suggestions which might avoid some parts of this class of error:
I don't think we even have a plan for a tool that will report whether a change introduces code that isn't *really* tested (contrast "tested" with "executed").
Indeed not. So we need some ideas in the meanwhile :-).
I think this may be an area where we do actually need to rely on people doing a good job. Perhaps to counterbalance that, we need to eliminate more of the other crap involved in the development process? For example, if reviewers never had to spend any time thinking about whether the whitespace in a change was correct, they would have that much more brain power to apply to assessing the quality of the test suite.
Speaking of thanking people for things, we should also thank Richard Wall, David Reid, you, and hawkowl for maintenance of twistedchecker. It's a lot better than it was :).
Thanks! These are great suggestions. Can we record them in a way that lets all Twisted contributors learn from this case (instead of only the people reading this thread) - but without adding to the already unreasonably large quantity of text new contributors are ostensibly already responsible for reading?
Those sound like diametrically opposed ideas :-).
How's the unified Contributing-to-Twisted documentation effort coming?
Yeah, how is that coming?