[Twisted-Python] Test coverage requirements
![](https://secure.gravatar.com/avatar/607cfd4a5b41fe6c886c978128b9c03e.jpg?s=120&d=mm&r=g)
Hi, I'm looking at some <https://codecov.io/gh/twisted/twisted/pull/509> recent <https://codecov.io/gh/twisted/twisted/pull/432> trunk <https://codecov.io/gh/twisted/twisted/pull/652> commits <https://codecov.io/gh/twisted/twisted/pull/695> (also, others) that seem to have non-trivial untested code at at ReviewProcess <https://twistedmatrix.com/trac/wiki/ReviewProcess#Authors:Thingsyourbranchor...>. I can't tell if the codecov reports are wrong or if the development process documentation is wrong or if the commits just violate policy or (I guess) some mix of the three. Can anyone shed any light on this? Jean-Paul
![](https://secure.gravatar.com/avatar/3d38cd80c2089b2b28ce28ae48ad48f8.jpg?s=120&d=mm&r=g)
On 2017-02-26, at 23:51, Jean-Paul Calderone <exarkun@twistedmatrix.com> wrote:
I'm looking at some recent trunk commits (also, others) that seem to have non-trivial untested code at at ReviewProcess. I can't tell if the codecov reports are wrong or if the development process documentation is wrong or if the commits just violate policy or (I guess) some mix of the three.
Can anyone shed any light on this?
Jean-Paul, I'm the author of PR 652, one of the ones you pointed out. Starting with https://codecov.io/gh/twisted/twisted/pull/652 and navigating to https://codecov.io/gh/twisted/twisted/pulls/652/src/src/twisted/internet/uni... I'm not sure I understand what it is representing -- lines of code that tests did not cover, I guess. :) However, the diff for that file in the PR, at https://github.com/twisted/twisted/pull/652/files#diff-32f19fc001798d7ea3368... does not touch any of the lines highlighted by the those codecov links. Does this make any sense? (plain honest question, I'm not familiar with codecov other that using its output to guide missing test scenarios) At some point, like I commented in https://github.com/twisted/twisted/pull/652#issuecomment-276334447, codecov tests were failing, complaining that a single non-changed line within the -3/+3 line diff boundaries wasn't being covered. I found that strange then. One other thing I find strange is that trying to access any previous codecov test results from within that PR -- say the first run on Dec 30th -- ends up showing the same (or very similar?) report as the one you pointed out initially; in other words, complaining about lines that were not changed in any commit within that PR. Again, I'm not aware of the details of codecov's operation, but it certainly is no longer reporting the same things it did back when the PR was being worked on. Whatever it is reporting now, at least with regards to PR 652, does not seem to match the associated diff. Regards, -- exvito
![](https://secure.gravatar.com/avatar/c194a4d2f2f8269aa052942e87985198.jpg?s=120&d=mm&r=g)
On 26 February 2017 at 23:51, Jean-Paul Calderone <exarkun@twistedmatrix.com> wrote:
Besise https://codecov.io/gh/twisted/twisted/pulls/509/diff, the other PRs looks OK and with good coverage. In terms of codecov.io coverage I am using the "diff" view as I find it less confusing. The strong red lines are the ones which codecov considered that were part of the patch and were not covered. The yellow lines are for missing branch coverage in this patch... and they will decrease the coverage percentage. In summary: For PR 509 - The diff coverage is 74.39%. ... this looks bad For PR 652 - The diff coverage is 98.7% ... not perfect, but pretty good as the missing lines are in the test code area :) -------- At some point we had a commit status check for 100% diff coverage, but I remember that it was not popular. If we are not aiming for the ultimate quality dev process we can still consider having a lower barer of something like 75% or 95% diff coverage... For PR less of this value, you will see a big warning... but people from the Twisted admin team can still commit them. Simple (non-admin) collaborators can only merge them of the diff coverage is above that limit. I am for a check for 100% coverage as many times it helped me get out of comfort zone and "forced" to write better tests and better code ... I know that libertarians might not like being forced to do things so I understand why the hard check was not popular. ---------- For the good PRs which were mentioned here For https://codecov.io/gh/twisted/twisted/pulls/695/diff the only line without a coverage is the one in which the exception is raised if the reactor is installed twice. For https://codecov.io/gh/twisted/twisted/pulls/652/diff the uncovered lines are the case in which a tests is is not skipped on OSX (this is strange ... but maybe osx coverage are no longer published) ... and the other one is the self.fail which is expected not be be called. I think there was a discussion about covering self.fail() calls by extracting them into assertion helper and making sure we have tests for those assertion helpers.... so that we can ask all changes to also have 100% test code coverage :) For https://codecov.io/gh/twisted/twisted/pulls/432/diff there were some changes in formatting (some parenthesis) which were indented and that code was not previously covered ... so now it is considered code on which someone made a change without having the tests to cover the changes. -- Adi Roiban
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
Thanks, JP. Your habit of post-hoc commit review really helps keep the process honest - I wish more folks (myself included) had more time for it! I wasn't party to most of these changes, except for 432, where I can say based on local testing I was very clear that coverage was not in fact an issue, and codecov was misreporting it. So I can't comment on most of them specifically. Generally though, I can say that Codecov, while more reliable than coveralls, is unfortunately really flaky. (Case in point, right now, clicking on all the URLs you posted here, I just see "AccessDeniedAccess Denied" followed by a long 64-bit identifier which I'm not pasting here for fear that it's some kind of github credential.) Its general unreliability is the main reason we removed the hard coverage requirement that blocked merging. That said, it has been improving and if it keeps improving at the rate it has been, I expect that we'd be able to put that coverage blocker back in in another 3-4 months. Perhaps something to talk about at PyCon. -glyph
![](https://secure.gravatar.com/avatar/869963bdf99b541c9f0bbfb04b0320f1.jpg?s=120&d=mm&r=g)
On Mon, 27 Feb 2017 at 21:54 Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
I think at least one problem that we're suffering from here is our fault, rather than Codecov's: the coverage of the test suite is not stable due to non-determinism in the test suite. That is, the lines executed during a test run are not the same every time due to things like ordering / timing races / etc. This means that "changes" to coverage may show up for a particular PReven though nothing in that PR is actually responsible.
![](https://secure.gravatar.com/avatar/607cfd4a5b41fe6c886c978128b9c03e.jpg?s=120&d=mm&r=g)
On Mon, Feb 27, 2017 at 6:00 PM, Tristan Seligmann <mithrandi@mithrandi.net> wrote:
Changes to Twisted code which are only sometimes covered by the test suite sound like they would violate a 100% coverage rule. But I guess the experience of looking at a codecov report is so bad/confusing that it's not surprising authors/reviewers might fail to see what's going on and fix the non-deterministic. Particularly for code that requires coverage measurements on multiple platforms (ie, you basically *can't* do it locally), it seems like it would be easier (though, to be clear, *bad*) to just forget about it and hope everything is covered... A tool that pointed out coverage differences between multiple runs of the same version of the code would be a useful thing to start pointing out where these flaws in the Twisted test suite lie, right? And then each area could be given deterministic test coverage instead...
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
While this is certainly an issue, I don't think it's the issue we're discussing here. Unreliability of coverage is largely mitigated by the fact that the main thing we pay attention to is "patch coverage", which can be seen to fluctuate from commit to commit on a branch if the new test coverage is non-deterministic (and rarely is a PR an individual commit). This is opposed to "coverage delta", which only looks at coverage before / coverage after and is indeed somewhat unpredictable due to old / bad tests. So I can say when I've had to overrule codecov, it's almost never been because of flapping coverage lines outside of the patch under consideration (and the patches in consideration either have deterministic tests, or I ask the author to add them). General improvements to build reliability often reduce coverage unreliability as well, so as we've been using Github more, which surfaces status visibility / mergeability to reviewers more, we've been fixing lots of little build-reliability issues and this problem continues to get smaller. -glyph
![](https://secure.gravatar.com/avatar/3d38cd80c2089b2b28ce28ae48ad48f8.jpg?s=120&d=mm&r=g)
On 2017-02-26, at 23:51, Jean-Paul Calderone <exarkun@twistedmatrix.com> wrote:
I'm looking at some recent trunk commits (also, others) that seem to have non-trivial untested code at at ReviewProcess. I can't tell if the codecov reports are wrong or if the development process documentation is wrong or if the commits just violate policy or (I guess) some mix of the three.
Can anyone shed any light on this?
Jean-Paul, I'm the author of PR 652, one of the ones you pointed out. Starting with https://codecov.io/gh/twisted/twisted/pull/652 and navigating to https://codecov.io/gh/twisted/twisted/pulls/652/src/src/twisted/internet/uni... I'm not sure I understand what it is representing -- lines of code that tests did not cover, I guess. :) However, the diff for that file in the PR, at https://github.com/twisted/twisted/pull/652/files#diff-32f19fc001798d7ea3368... does not touch any of the lines highlighted by the those codecov links. Does this make any sense? (plain honest question, I'm not familiar with codecov other that using its output to guide missing test scenarios) At some point, like I commented in https://github.com/twisted/twisted/pull/652#issuecomment-276334447, codecov tests were failing, complaining that a single non-changed line within the -3/+3 line diff boundaries wasn't being covered. I found that strange then. One other thing I find strange is that trying to access any previous codecov test results from within that PR -- say the first run on Dec 30th -- ends up showing the same (or very similar?) report as the one you pointed out initially; in other words, complaining about lines that were not changed in any commit within that PR. Again, I'm not aware of the details of codecov's operation, but it certainly is no longer reporting the same things it did back when the PR was being worked on. Whatever it is reporting now, at least with regards to PR 652, does not seem to match the associated diff. Regards, -- exvito
![](https://secure.gravatar.com/avatar/c194a4d2f2f8269aa052942e87985198.jpg?s=120&d=mm&r=g)
On 26 February 2017 at 23:51, Jean-Paul Calderone <exarkun@twistedmatrix.com> wrote:
Besise https://codecov.io/gh/twisted/twisted/pulls/509/diff, the other PRs looks OK and with good coverage. In terms of codecov.io coverage I am using the "diff" view as I find it less confusing. The strong red lines are the ones which codecov considered that were part of the patch and were not covered. The yellow lines are for missing branch coverage in this patch... and they will decrease the coverage percentage. In summary: For PR 509 - The diff coverage is 74.39%. ... this looks bad For PR 652 - The diff coverage is 98.7% ... not perfect, but pretty good as the missing lines are in the test code area :) -------- At some point we had a commit status check for 100% diff coverage, but I remember that it was not popular. If we are not aiming for the ultimate quality dev process we can still consider having a lower barer of something like 75% or 95% diff coverage... For PR less of this value, you will see a big warning... but people from the Twisted admin team can still commit them. Simple (non-admin) collaborators can only merge them of the diff coverage is above that limit. I am for a check for 100% coverage as many times it helped me get out of comfort zone and "forced" to write better tests and better code ... I know that libertarians might not like being forced to do things so I understand why the hard check was not popular. ---------- For the good PRs which were mentioned here For https://codecov.io/gh/twisted/twisted/pulls/695/diff the only line without a coverage is the one in which the exception is raised if the reactor is installed twice. For https://codecov.io/gh/twisted/twisted/pulls/652/diff the uncovered lines are the case in which a tests is is not skipped on OSX (this is strange ... but maybe osx coverage are no longer published) ... and the other one is the self.fail which is expected not be be called. I think there was a discussion about covering self.fail() calls by extracting them into assertion helper and making sure we have tests for those assertion helpers.... so that we can ask all changes to also have 100% test code coverage :) For https://codecov.io/gh/twisted/twisted/pulls/432/diff there were some changes in formatting (some parenthesis) which were indented and that code was not previously covered ... so now it is considered code on which someone made a change without having the tests to cover the changes. -- Adi Roiban
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
Thanks, JP. Your habit of post-hoc commit review really helps keep the process honest - I wish more folks (myself included) had more time for it! I wasn't party to most of these changes, except for 432, where I can say based on local testing I was very clear that coverage was not in fact an issue, and codecov was misreporting it. So I can't comment on most of them specifically. Generally though, I can say that Codecov, while more reliable than coveralls, is unfortunately really flaky. (Case in point, right now, clicking on all the URLs you posted here, I just see "AccessDeniedAccess Denied" followed by a long 64-bit identifier which I'm not pasting here for fear that it's some kind of github credential.) Its general unreliability is the main reason we removed the hard coverage requirement that blocked merging. That said, it has been improving and if it keeps improving at the rate it has been, I expect that we'd be able to put that coverage blocker back in in another 3-4 months. Perhaps something to talk about at PyCon. -glyph
![](https://secure.gravatar.com/avatar/869963bdf99b541c9f0bbfb04b0320f1.jpg?s=120&d=mm&r=g)
On Mon, 27 Feb 2017 at 21:54 Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
I think at least one problem that we're suffering from here is our fault, rather than Codecov's: the coverage of the test suite is not stable due to non-determinism in the test suite. That is, the lines executed during a test run are not the same every time due to things like ordering / timing races / etc. This means that "changes" to coverage may show up for a particular PReven though nothing in that PR is actually responsible.
![](https://secure.gravatar.com/avatar/607cfd4a5b41fe6c886c978128b9c03e.jpg?s=120&d=mm&r=g)
On Mon, Feb 27, 2017 at 6:00 PM, Tristan Seligmann <mithrandi@mithrandi.net> wrote:
Changes to Twisted code which are only sometimes covered by the test suite sound like they would violate a 100% coverage rule. But I guess the experience of looking at a codecov report is so bad/confusing that it's not surprising authors/reviewers might fail to see what's going on and fix the non-deterministic. Particularly for code that requires coverage measurements on multiple platforms (ie, you basically *can't* do it locally), it seems like it would be easier (though, to be clear, *bad*) to just forget about it and hope everything is covered... A tool that pointed out coverage differences between multiple runs of the same version of the code would be a useful thing to start pointing out where these flaws in the Twisted test suite lie, right? And then each area could be given deterministic test coverage instead...
![](https://secure.gravatar.com/avatar/e1554622707bedd9202884900430b838.jpg?s=120&d=mm&r=g)
While this is certainly an issue, I don't think it's the issue we're discussing here. Unreliability of coverage is largely mitigated by the fact that the main thing we pay attention to is "patch coverage", which can be seen to fluctuate from commit to commit on a branch if the new test coverage is non-deterministic (and rarely is a PR an individual commit). This is opposed to "coverage delta", which only looks at coverage before / coverage after and is indeed somewhat unpredictable due to old / bad tests. So I can say when I've had to overrule codecov, it's almost never been because of flapping coverage lines outside of the patch under consideration (and the patches in consideration either have deterministic tests, or I ask the author to add them). General improvements to build reliability often reduce coverage unreliability as well, so as we've been using Github more, which surfaces status visibility / mergeability to reviewers more, we've been fixing lots of little build-reliability issues and this problem continues to get smaller. -glyph
participants (5)
-
Adi Roiban
-
ex vito
-
Glyph Lefkowitz
-
Jean-Paul Calderone
-
Tristan Seligmann