[Twisted-Python] Coverage exceptions
Hi,

Recently we introduced a hard check of 100% coverage for all changes. This is done via coverage + codecov + GitHub protected branches. Now, if your patch is not 100% covered, GitHub will not let you merge it.

See for example this change: https://github.com/twisted/twisted/pull/261/files#diff-0fea8a8ca713deb7ea6a1...

The errback is there to help with test failures... but the test should never fail, so that errback is never called... and that line is not covered.

How should we proceed with these changes? Maybe this is not the best example and that code could be refactored... but I think the topic of ignoring missing coverage is still valid.

I suggest introducing `# pragma: no cover` and updating the coverage config with:

    [report]
    exclude_lines =
        pragma: no cover

What do you think?

Thanks!
-- Adi Roiban
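For illustration, a minimal sketch (not the actual code from the PR; the names are made up) of the kind of failure-only errback Adi describes, with the proposed pragma applied:

```python
from twisted.trial import unittest
from twisted.internet import defer


class ExampleTests(unittest.TestCase):
    def test_lockAcquired(self):
        d = defer.succeed("acquired")

        def reportError(failure):  # pragma: no cover
            # Only runs if the test fails, so coverage never sees this line.
            self.fail("unexpected failure: %s" % (failure,))

        d.addCallback(self.assertEqual, "acquired")
        d.addErrback(reportError)
        return d
```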
On Thu, Jun 30, 2016 at 6:43 AM, Adi Roiban <adi@roiban.ro> wrote:
It doesn't always make sense to require 100% execution of all test code. It's not at all uncommon to only have code in a test suite that runs when a test fails. Historically, Twisted has never had a requirement of 100% execution of test code. The only test suite coverage requirements that have commonly been requested or enforced is for coverage of implementation code. I'd suggest removing the coverage enforcement for test suite code.
This seems like the wrong solution to me. It forces contributors to do extra work to mark their test code as an exception *and* provides a mechanism for incorrectly bypassing the check by using a no-cover pragma in implementation code. Jean-Paul
I am inclined to disagree, albeit mildly. When one is writing a deliberately un-covered path in test code, presumably, one is writing either a test helper - a mock, fake, or utility for setting up a real implementation - or an assertion method. Historically, I believe that when we've invested more heavily in making these utilities "real" themselves, and not just throwaway stuff inline in a test method or module, the benefits have far outweighed the costs. In fact, the availability of proto_helpers is one of the selling points of Twisted as opposed to other competing engines.

Therefore, I think that asking folks to add independent test coverage to verify their fakes, and to ensure that the failure messages of their assertions are helpful in the event a test fails, is a generally good idea, and we should keep the requirement for 100% coverage on both test and implementation code. However, if there is contention around this, I'd much rather get a ratchet in place for implementation code that's reliable and everyone is happy with, so I'm OK with disabling coverage reporting for our *.test.* packages as a step towards that.
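As a rough sketch of what independently covering a shared assertion helper could look like (the helper and names here are hypothetical, not from Twisted's test suite):

```python
from twisted.trial import unittest


class EndsWithMixin(object):
    """A hypothetical shared assertion helper, kept 'real' rather than
    inlined in a single test module."""

    def assertEndsWith(self, text, suffix):
        if not text.endswith(suffix):
            self.fail("%r does not end with %r" % (text, suffix))


class EndsWithMixinTests(EndsWithMixin, unittest.TestCase):
    """Exercise both branches of the helper, including the message it
    would report if a test using it failed."""

    def test_success(self):
        self.assertEndsWith("proto_helpers", "helpers")

    def test_failureMessage(self):
        error = self.assertRaises(
            self.failureException, self.assertEndsWith, "twisted", "trial")
        self.assertIn("does not end with", str(error))
```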
In any case I totally agree with this. If we have a categorical difference in types of code (test vs. non-test) then let's make that distinction, but we should not be adding one-off exceptions as an exercise of non-uniform reviewer judgement on every review. -glyph
On Thu, Jun 30, 2016 at 6:25 PM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
I mostly agree with this. However, I was thinking of a slightly different pattern when I wrote my earlier email. Here are a couple of (fictional) examples of that pattern one might find in unit tests for application code (and there's nothing Twisted-specific here):

    if foo:
        self.fail("Foo!")

    try:
        foo()
    except:
        bar
    else:
        self.fail("Foo :(")

It's not exactly that this *can't* be code that's executed in a passing run of the test suite. It's more a question of what the right balance point is. If someone wants to generalize logic like this (and, fortunately, someone did generalize these *particular* examples - they're assertFalse and assertRaises, respectively) then that's great, and the result is a higher level of confidence resulting from a successful run of the test suite. I'd suggest that if tests like these exercise all of the implementation code (on a successful run), though, then you've still achieved a pretty high level of test coverage, and maybe further efforts are more productively directed elsewhere (increasing the coverage level of other implementation code in Twisted, for example :). If folks want a higher bar than this, I'm not going to argue (at least not much, at least not now). The bar *hasn't* been this high in the past, though (and there are many many such cases to be found in Twisted's test suite right now, and I don't have the impression this has ever been *much* of a source of problems).
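For reference, a runnable restatement of those two fictional patterns using the helpers they were eventually generalized into (the names here are invented):

```python
import unittest


class BalancePointTests(unittest.TestCase):
    def test_assertFalse(self):
        foo = False
        # Generalization of: if foo: self.fail("Foo!")
        self.assertFalse(foo, "Foo!")

    def test_assertRaises(self):
        def foo():
            raise ValueError("Foo")
        # Generalization of the try/except/else ending in self.fail("Foo :(")
        self.assertRaises(ValueError, foo)


if __name__ == "__main__":
    unittest.main()
```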
I completely agree that fakes should be verified. So much so that I'm not even sure I believe in fakes in *general* anymore. Instead, you should just have easy-to-use interfaces and ship inexpensive implementations alongside whatever other implementations you also need. And all those implementations should have great test coverage. I also completely agree that when tests fail, they should do so in a meaningful way. I am slightly suspicious, though, of the implication that automated test coverage for the failure case demonstrates the failure is reported meaningfully. :) I think we're still stuck with relying on humans (authors, reviewers) to verify that property. Jean-Paul
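A tiny sketch of the "ship inexpensive implementations alongside the real one" idea, using zope.interface as Twisted does (all names here are invented for illustration):

```python
from zope.interface import Interface, implementer


class IGreeter(Interface):
    """An easy-to-use interface with more than one shipped implementation."""

    def greet(name):
        """Return a greeting for the given name."""


@implementer(IGreeter)
class NetworkGreeter(object):
    """The 'real' implementation (details elided)."""

    def greet(self, name):
        return "hello over the network, %s" % (name,)


@implementer(IGreeter)
class MemoryGreeter(object):
    """An inexpensive in-memory implementation, shipped and tested
    alongside the real one rather than thrown away as a one-off fake."""

    def __init__(self):
        self.greeted = []

    def greet(self, name):
        self.greeted.append(name)
        return "hello, %s" % (name,)
```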
Hi,

The conversation about coverage exceptions came up due to this: https://github.com/twisted/twisted/pull/261

I have gotten that PR up to 97.87% patch coverage. I cannot merge it because the codecov integration with GitHub is currently configured to not accept patches which don't have 100% patch coverage. Can someone please take a look at that, and make a decision as to how to move that PR forward?

Thanks.
-- Craig
The problematic lines in question on this PR are here: https://codecov.io/gh/twisted/twisted/compare/f43b16c097309d9747a0c1e708bd4e... In this case, the fix to ensure they're covered is pretty easy. The problem here is essentially a defect in the test: it's returning a Deferred from the test method to spin the reactor, even though the test has already done the work of isolating the system under test from the real reactor (self.lock is pointed at self._clock and a glance at the implementation shows it doesn't access the global reactor). If you just fix the test to call self.successResultOf(d) after advancing the Clock to assert that it didn't fail, you can stop returning the Deferred, and remove the problematic un-covered callback. More generally - although the ability to return a Deferred from a test is a very useful bit of integration testing functionality, tests within Twisted itself returning Deferreds are usually just insufficiently isolated. We have a different way of doing integration tests with the reactors themselves (ReactorBuilder tests) which doesn't rely on returning Deferreds, and we don't have any tightly-coupled external systems like a database which it would make sense to rely upon in tests, so pretty much everything should be synchronous in our own test suite. -glyph
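A rough sketch of the shape Glyph is describing, with a stand-in for the real system under test (this is not the code from the PR):

```python
from twisted.trial import unittest
from twisted.internet import task


class ClockDrivenTests(unittest.TestCase):
    def test_firesAfterDelay(self):
        clock = task.Clock()
        # Stand-in for the system under test: something scheduled on the
        # fake reactor rather than the global one.
        d = task.deferLater(clock, 5.0, lambda: "acquired")

        clock.advance(5.0)

        # Assert synchronously instead of returning the Deferred to trial;
        # no failure-only errback is needed, so nothing is left uncovered.
        self.assertEqual(self.successResultOf(d), "acquired")
```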
Hi,

What decision should be made based on the feedback sent so far? Should we disable the codecov enforcement of 100% patch coverage, since it also blocks patches whose only missing coverage is in test code? If we want to enforce coverage only for implementation code, then we need to update the tests to send separate reports for implementation and test code... and this is not done yet. Disabling/enabling codecov.io merge protection is done here: https://github.com/twisted-infra/braid/issues/213

I would argue that test code should be held to the same quality standards as the implementation code and hence should also aim for 100% coverage. It will help detect code which is never executed and which later might get out of sync or might break. This includes mocked code which is out of sync, or tests which are skipped on all builders and which will get out of sync and fail (e.g. our apidoc builder tests). It will also simplify the reporting infrastructure... and we are already short-handed, so a simple infrastructure should help move things forward much faster.

-- Adi
The test directories can simply be excluded in coverage.py (or codecov); I don't think there's any need to do anything more complicated than that. While I agree that 100% test coverage is an ideal worth aspiring to, I think getting there from the current state is going to be a large amount of work that yields very little benefit at this point in time; I would say that there are more important things to spend that effort on. On Sun, 3 Jul 2016 at 10:09 Adi Roiban <adi@roiban.ro> wrote:
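As a concrete illustration of Tristan's suggestion (this configuration is a guess, not something from the thread, and the exact paths would depend on Twisted's layout), excluding the test packages with coverage.py could look roughly like:

```
[run]
omit =
    */twisted/test/*
    */twisted/*/test/*
```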
On 3 July 2016 at 09:53, Tristan Seligmann <mithrandi@mithrandi.net> wrote:
The tests directories can simply be excluded in coverage.py (or codecov), I don't think there's any need to do something more complicated than that.
That is, don't report coverage at all for the test code? I would prefer to see the coverage reports for tests, even if we don't enforce 100% coverage. It is a quick way to check that a test is executed on at least one builder. While I agree that 100% test coverage is an ideal worth aspiring to, I [...]
It might yield (arguably) little benefit for existing contributors, but I think that for new contributors it sends a bad message.

For me, as a reviewer, it helps a lot if contributors can get quick feedback about the quality of their code and I don't have to repeat in each review that a topfile is required or that the code needs tests... the topfile checker or coverage checker is red, and this means that a contributor needs to do more work before requesting a review.

We are preaching the Ultimate Quality Development System (UQDS) (tm), but when new contributors are onboarded they find out that they should ignore the tons of errors produced by twistedchecker, that the 100% coverage requirement applies only to implementation code and that test code is a 2nd class citizen... that the API docs will produce errors even if you have created a correct cross-reference to zope (similar to what existing code does)... etc. But maybe it is just my fault for taking the UQDS too seriously, and instead I should read it as the Almost Ultimate Quality Development System.

The pyflakes run is not clean on trunk... and I hope that pydoctor will soon run without errors on trunk... after that we can hit twistedchecker :)

-- Adi
On Sun, Jul 3, 2016 at 8:13 AM, Adi Roiban <adi@roiban.ro> wrote:
I think this is definitely desirable.
We are preaching the Ultimate Quality Development System (UQDS) (tm) but [...]
Reading through https://twistedmatrix.com/trac/wiki/UltimateQualityDevelopmentSystem it doesn't actually say anything about linters or code coverage or warnings. That isn't to say that the things you suggest are not valuable, just that calling a process that doesn't enforce all of those things absolutely all the time the "Almost Ultimate Quality Development System" is doing a disservice to the idea. I'll admit that I haven't read that page as closely as I should, but re-reading it now, I see: "A reviewer reviews the completed work, and provides feedback: at least one good thing about the work, at least one area that needs improvement, and a judgement as to whether the good qualities ultimately outweigh the bad, i.e. whether the branch should be merged." That explicitly allows for imperfections in the code, as long as there is an overall improvement.
[...] that test code is a 2nd class citizen [...]
It isn't necessarily that test code is a second class citizen, but the purpose of test code is different from that of implementation code, so the trade-offs that make sense in each context might not be the same. -- Tom
On Jul 3, 2016, at 01:53, Tristan Seligmann <mithrandi@mithrandi.net> wrote:
While I agree that 100% test coverage is an ideal worth aspiring to, I think getting there from the current state is going to be a large amount of work that yields very little benefit at this point in time; I would say that there are more important things to spend that effort on.
What we're talking about is requiring new patches to cover the test code that they change, not an instant bar for 100% coverage of all test code. Since this is being couched in terms of "effort" - it seems to me that this discussion alone is already more effort than just covering a few errant lines of test code here and there. :). For now, let's just bite the bullet and require 100% patch coverage from here on out. If we hit a really nasty case where it really is a significant investment of effort, then maybe we can revisit this discussion and explore a better way to express this exception without losing information about test coverage completely. -glyph
On Sun, Jul 3, 2016 at 9:10 AM, Glyph Lefkowitz <glyph@twistedmatrix.com> wrote:
Requiring 100% patch coverage sounds reasonable. However, what if the infrastructure for running coverage and uploading reports to codecov.io isn't working? Running coverage under PyPy is apparently not working ( https://github.com/twisted/twisted/pull/223#issuecomment-228626722 ). This is blocking forward progress on patches to fix the PyPy tests. -- Craig
On 3 July 2016 at 20:32, Craig Rodrigues <rodrigc@crodrigues.org> wrote:
I have disabled codecov patch coverage for now, as I think that codecov.io reporting is buggy. For coverage merge protection, please see https://github.com/twisted-infra/braid/issues/213 -- Adi Roiban
On Tue, Jul 5, 2016 at 5:18 AM, Adi Roiban <adi@roiban.ro> wrote:
I have disabled codecov patch coverage for now as I think that codecov.io
reporting is buggy.
For coverage merge protection, please see
https://github.com/twisted-infra/braid/issues/213
I'm still seeing problems with codecov. In this build: https://buildbot.twistedmatrix.com/builders/windows7-64-py2.7-coverage/build... uploading coverage results is failing:

    ==> Uploading
    .url https://codecov.io
    .query package=py2.0.5&yaml=codecov.yml&token=<secret>&build=windows64py27&branch=8503-markrwilliams-pypy-indicator&commit=9176793afb9ca8fcafaf4f3d518f6a4166deddb4
    Pinging Codecov...
    Uploading to S3...
    Error: 403 Client Error: Forbidden for url: https://codecov.s3.amazonaws.com/v4/raw/2016-07-05/BB05435E28F8047E9B3D49B4B7BFA4B3/9176793afb9ca8fcafaf4f3d518f6a4166deddb4/10b1e59d-607a-4c5d-939d-6903844ba287.txt?AWSAccessKeyId=AKIAIHLZSCQCS4WIHD4A&Expires=1467755156&Signature=bmYejCo3NYpn7qPFLatb0RGydLk%3D

I am not sure, but I think this is blocking coverage results from being reported in this PR: https://github.com/twisted/twisted/pull/223

-- Craig
This PR was recently rejected due to lack of test coverage (with no other feedback): https://twistedmatrix.com/trac/ticket/5705#comment:15. If we believe codecov is buggy, are we sure the missing coverage there was real, and not just a codecov bug? -glyph
Hm. This pattern is exactly what I was thinking of though - as you point out, these examples did get generalized :-). In principle, I agree that a one-off example like this does not benefit from extensive refactoring to facilitate general use. But... in practice, every example of this that I've seen in a long-lived codebase eventually metastasizes into a repeated copy/paste pattern, or a big gross mixin that all tests practically need. Forcing everyone to deal with the problem sooner rather than later seems to have been a win on the few projects where I've gotten to practice it.
It's not exactly that this can't be code that's executed in a passing run of the test suite. It's more a question of what the right balance point is. If someone wants to generalize logic like this (and, fortunately, someone did generalize these particular examples - they're assertFalse and assertRaises, respectively) then that's great and the result is a higher level of confidence resulting from a successful run of the test suite. I'd suggest that if tests like these exercise all of the implementation code (on a successful run), though, then you've still achieved a pretty high level of test coverage and maybe further efforts are more productively directed elsewhere (increasing the coverage level of other implementation code in Twisted, for example :).
Speaking only from my direct experience here, adding good test helpers is a net reduction in effort very quickly; they pay for themselves on only the third test you write with them :). So I don't feel like this is much of a burden, but I'd be interested in hearing others' feedback here.
If folks want a higher bar than this, I'm not going to argue (at least not much, at least not now). The bar hasn't been this high in the past though (and there are many many such cases to be found in Twisted's test suite right now and I don't have the impression this has ever been much of a source of problems).
I wish it were easier to search my own reviews, because I think I've prodded people to factor this kind of stuff out, but I can't find any handy examples. But, I can see that right after this, I'm going to have to answer Craig's email, so perhaps we'll get to see a specific example :).
💯
And all those implementations should have great test coverage. I also completely agree that when tests fail, they should do so in a meaningful way. I am slightly suspicious, though, of the implication that automated test coverage for the failure case demonstrates the failure is reported meaningfully. :) I think we're still stuck with relying on humans (authors, reviewers) to verify that property.
Sure, rich and meaningful semantic error reporting is a human's job. But "doesn't just raise an exception and lose all the information about what happened due to an attribute spelling error on the rare race-y test that fails 1/10000 times" is a good first approximation, and tests are good at that sort of thing ;-). -glyph
Participants (6): Adi Roiban, Craig Rodrigues, Glyph Lefkowitz, Jean-Paul Calderone, Tom Prince, Tristan Seligmann