Hi,
I tried to merge a pull request on my phone, but I got the error:
"Pull requests that have a failing status can’t be merged on a phone."
The GitHub PEP announced that it would be possible to merge a change from the beach. Well, it's doable, but only if you bring a laptop, not a phone :-)
All tests pass except Codecov, which is unstable. On a computer, I can merge such a PR.
What is the status of Codecov? Is someone actively working on fixing it to make it more reliable? I dislike code coverage in general, even more when it's run in a CI.
Can we change the merge policy to allow merge on a phone? Or can we fix Codecov?
Note: the PR is https://github.com/python/cpython/pull/1237#pullrequestreview-34044762
Codecov says "10% of diff hit (target: 100%)". The newly added code is tested on Windows on a release build. Maybe Codecov only tests on Windows?
I dislike code coverage because there is a temptation to write artificial tests whereas the code is tested indirectly or the code is not important enough to *require* tests.
Victor
Ah, I found a workaround: Firefox on Android has a "[x] See the computer version" option which allows the merge!?
Victor
On Fri, 21 Apr 2017 at 15:33 Victor Stinner <victor.stinner@gmail.com> wrote:
Ah, I found a workaround: Firefox on Android has a "[x] See the computer version" option which allows the merge!?
Victor
On 22 Apr 2017 at 12:29 AM, "Victor Stinner" <victor.stinner@gmail.com> wrote:
Hi,
I tried to merge a pull request on my phone, but I got the error:
"Pull requests that have a failing status can’t be merged on a phone."
Well that's annoying.
The GitHub PEP announced that it would be possible to merge a change from the beach. Well, it's doable, but only if you bring a laptop, not a phone :-)
All tests pass except Codecov, which is unstable. On a computer, I can merge such a PR.
What is the status of Codecov? Is someone actively working on fixing it to make it more reliable?
https://github.com/python/core-workflow/issues/38 is tracking trying to fix the inconsistency problem and https://github.com/python/core-workflow/issues/18 is tracking getting a more complete coverage report.
I would like to see this fixed and it's on my workflow todo list, but obviously that list is not short so I don't know when I will get to it (especially since at some point I will need to take a workflow break and actually write code and review PRs again ;) .
I dislike code coverage in general, even more when it's run in a CI.
Can we change the merge policy to allow merge on a phone? Or can we fix Codecov?
We might be able to turn off the status check. The config file is https://github.com/python/cpython/blob/master/.codecov.yml and the docs are at https://docs.codecov.io/docs/codecov-yaml (I'm on vacation so I don't have time to look up the exact config change since I really shouldn't even be replying to this email ;) .
Note: the PR is https://github.com/python/cpython/pull/1237#pullrequestreview-34044762
Codecov says "10% of diff hit (target: 100%)". The newly added code is tested on Windows on a release build. Maybe Codecov only tests on Windows?
Codecov actually only runs under Travis, so it's only testing on Linux.
I dislike code coverage because there is a temptation to write artificial tests whereas the code is tested indirectly or the code is not important enough to *require* tests.
I personally disagree, as code that isn't tested isn't checked to ensure that at least some semblance of backwards compatibility is being kept. Now I'm not advocating that we must have explicit tests for every line of code, but I think we should somehow exercise as much code as possible.
Oh, I forgot something about Codecov: it took me 2 minutes to understand why a PR got the red icon even though all tests passed and the merge button was waiting for my click. In fact, Codecov failed, but that check isn't blocking. I would expect the green icon in the overall list of PRs.
Well, it's not blocking, just surprising the first time.
It may be annoying when you really want to list PRs where tests fail, for example to reschedule Travis jobs after we fix a bug in master that caused a test failure.
Victor
Thank you, I will take a look and see if I can help.
Victor
On 04/21/2017 03:29 PM, Victor Stinner wrote:
I dislike code coverage because there is a temptation to write artificial tests whereas the code is tested indirectly or the code is not important enough to *require* tests.
If it's not important enough to require tests it's not important enough to be in Python. ;)
-- ~Ethan~
On Apr 24, 2017, at 09:32 PM, Ethan Furman wrote:
I dislike code coverage because there is a temptation to write artificial tests whereas the code is tested indirectly or the code is not important enough to *require* tests.
If it's not important enough to require tests it's not important enough to be in Python. ;)
"Untested code is broken code" :)
-Barry
On 4/25/2017 11:00 AM, Barry Warsaw wrote:
On Apr 24, 2017, at 09:32 PM, Ethan Furman wrote:
On 04/21/2017 03:29 PM, Victor Stinner wrote:
(In the context of having a patch blocked by the blind Codecov robot ...)
I dislike code coverage because there is a temptation to write artificial tests whereas the code is tested indirectly or the code is not important enough to *require* tests.
While I use code coverage to improve automated unittesting, I am opposed to turning a usable but limited and sometimes faulty tool into a blind robotic master that blocks improvements. The prospect of this being done has discouraged me from learning the new system. (More on 'faulty tool' later.)
The temptation to write artificial tests to satisfy an artificial goal is real. Doing so can eat valuable time better used for something else. For instance:
def meth(self, arg): mod.inst.meth(arg, True, ob=self, kw='cut')
Mocking mod.class.meth, calling meth, and checking that the mock is called will satisfy the robot, but does not contribute much to the goal of providing a language that people can use to solve problems.
Victor, can you explain 'tested indirectly' and perhaps give an example?
As used here, 'whereas' is incorrect English and a bit confusing. I believe Victor meant something more like 'even when'. For the last clause, I believe he meant "the code change is not complicated enough to *require* automated unit test coverage for the changed line(s)". If I change a comment or add missing spaces, I don't think I should be forced to write a missing test to make the improvement.
A less trivial example: on IDLE's menu, Options => Configure IDLE opens a dialog with a font size widget that when clicked opens a list of font sizes. I recently added 4 larger sizes to the tuple in idlelib.configdialog.ConfigDialog.LoadFontCfg, as requested, I think, by at least 2 people. I tested manually by clicking until the list was displayed. As I remember, I did not immediately worry about automated testing, let alone line coverage, and I do not think I should have had to in order to get the change into 3.6.0.
That line may or may not be covered by the current minimal test that simply creates a ConfigDialog instance. But this gets back to what I think is Victor's point. This minimal test 'covers' 46% of the file, but it only tests that 46% of the lines run without raising. This is useful, but does not test that the lines are really correct. (For GUI display code, human eyeballing is required.) This would remain true even if all the other code were moved to a new module, making the coverage of configdialog the magical ***100%***.
If it's not important enough to require tests it's not important enough to be in Python. ;)
Modules in the test package are mostly not tested. ;)
If 'test' means 'line coverage test for new or changed lines', then as a practical matter, I disagree, as explained above. So, in effect, did the people who committed untested lines.
In the wider sense of 'test', there is no real argument. Each statement written should be mentally tested both when written and when reviewed. Code should be manually tested, preferably by someone in addition to the author. Automated testing is more than nice, but not everything. Ditto for unit testing.
Some practical issues with coverage and CodeCov:
1. A Python module is comprised of statements, but the coverage module counts physical lines. This is good for development, but not for gating. The number of physical lines comprising a statement can change without changing, or with only trivially changing, the compiled run code. So if coverage is not 100%, it can vary without a real change in statement coverage.
2. Some statements are only intended to run on certain systems, making 100% coverage impossible unless one carefully puts all system-specific code in "if system == 'xyz'" statements and uses system-specific .coveragerc files to exclude code for 'other' systems.
3. Some tests require extended resources. Statements that are only covered by such tests will be seen as uncovered when coverage is run on a system lacking the resources. As far as I know, all non-Windows buildbots and CodeCov are run on systems lacking the 'gui' resource. So patches to gui code will be seen as uncovered.
4. As I explained in a post on the core-workflow list, IDLE needs the following added to the 'exclude_lines' item of .coveragerc:
    .*# htest #
    if not _utest:
The mechanism behind these would also be useful for testing any other modules, scripts, or demos that use tkinter GUIs.
There seem to be other issues too.
"Untested code is broken code" :)
Most of CPython, including IDLE, has been pretty thoroughly tested. And we have heaps of bug reports to show for it. What's more important is that even code that is tested, by whatever means, may still have bugs. Hence, obscure bugs are still found, and even correct code can be corrupted (regressed) by attempts to fix and improve it.
-- Terry Jan Reedy
While I use code coverage to improve automated unittesting, I am opposed to turning a usable but limited and sometimes faulty tool into a blind robotic master that blocks improvements. The prospect of this being done has discouraged me from learning the new system. (More on 'faulty tool' later.)
The temptation to write artificial tests to satisfy an artificial goal is real. Doing so can eat valuable time better used for something else. For instance:
def meth(self, arg): mod.inst.meth(arg, True, ob=self, kw='cut')
Mocking mod.class.meth, calling meth, and checking that the mock is called will satisfy the robot, but does not contribute much to the goal of providing a language that people can use to solve problems.
Victor, can you explain 'tested indirectly' and perhaps give an example?
This is one example where I merged an "untested line of code": https://github.com/python/cpython/pull/162/files#diff-0ad86c44e7866421ecaa5a...
file_object = builtins.open(f, 'wb')
try:
    self.initfp(file_object)
except:
    file_object.close()
    raise
self.initfp() is very unlikely to raise an exception, but MemoryError, KeyboardInterrupt, or other rare exceptions may happen.
Tests didn't cover this except clause, but I merged it because:
- this code is simple enough.
- I could write a test for it by mocking self.initfp() to raise an exception (a sketch of such a test follows below), but such test code has a significant maintenance cost. I don't think this except clause is important enough to maintain such code.
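A minimal sketch of what such a test might look like, assuming a hypothetical Writer class whose __init__ contains the open/initfp/except block shown above (the class and names are invented for illustration, not the actual module from the PR):

    import builtins
    import unittest
    from unittest import mock


    class Writer:
        """Hypothetical stand-in for the class under discussion."""
        def __init__(self, f):
            file_object = builtins.open(f, 'wb')
            try:
                self.initfp(file_object)
            except:
                file_object.close()
                raise

        def initfp(self, file_object):
            self._file = file_object


    class InitfpFailureTest(unittest.TestCase):
        def test_file_closed_when_initfp_raises(self):
            fake_file = mock.MagicMock()
            with mock.patch.object(builtins, 'open', return_value=fake_file), \
                 mock.patch.object(Writer, 'initfp', side_effect=MemoryError):
                with self.assertRaises(MemoryError):
                    Writer('dummy-path')
            # The except clause must have closed the file before re-raising.
            fake_file.close.assert_called_once_with()


    if __name__ == '__main__':
        unittest.main()

The test passes, but it keeps a mock, two patches, and an injected exception alive forever just to cover two lines, which is the maintenance cost being weighed here.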
If I remove the except clause, all lines are tested, but there is a (very unlikely) leak of an unclosed file. Which is "broken"?
Coverage is very nice information for finding code which should be tested but is not tested. But I don't think "all code should be tested". It may make it hard to improve Python.
So I agree with Victor and Terry.
Regards,
On 04/25/2017 10:15 PM, INADA Naoki wrote:
This is one example where I merged an "untested line of code": https://github.com/python/cpython/pull/162/files#diff-0ad86c44e7866421ecaa5a...
file_object = builtins.open(f, 'wb')
try:
    self.initfp(file_object)
except:
    file_object.close()
    raise
self.initfp() is very unlikely to raise an exception, but MemoryError, KeyboardInterrupt, or other rare exceptions may happen. Tests didn't cover this except clause, but I merged it because:
- this code is simple enough.
- I could write a test for it by mocking self.initfp() to raise an exception, but such test code has a significant maintenance cost. I don't think this except clause is important enough to maintain such code.
I'm curious as to the maintenance cost -- surely it's write once, forget until you have to change that method again? How often are we changing that method?
If I remove the except clause, all lines are tested, but there is a (very unlikely) leak of an unclosed file. Which is "broken"?
Definitely the (unlikely) leaking of a file, possibly the untested closing of an otherwise leaked file.
Coverage is very nice information for finding code which should be tested but is not tested. But I don't think "all code should be tested".
All code should be tested. Doesn't mean we can, but that should be the goal.
It may make it hard to improve Python.
Adding missing tests _is_ improving Python.
-- ~Ethan~
On Tue, 25 Apr 2017 at 17:00 Terry Reedy <tjreedy@udel.edu> wrote:
On 4/25/2017 11:00 AM, Barry Warsaw wrote:
On Apr 24, 2017, at 09:32 PM, Ethan Furman wrote:
On 04/21/2017 03:29 PM, Victor Stinner wrote:
(In the context of having a patch blocked by the blind Codecov robot ...)
I dislike code coverage because there is a temptation to write artificial tests whereas the code is tested indirectly or the code is not important enough to *require* tests.
While I use code coverage to improve automated unittesting, I am opposed to turning a usable but limited and sometimes faulty tool into a blind robotic master that blocks improvements. The prospect of this being done has discouraged me from learning the new system. (More on 'faulty tool' later.)
It should be stated that code coverage is not a blocking status check for merging from our perspective (the **only** required check is that Travis pass with its test run). The only reason Victor ran into it as a blocker is because he tried to do a merge over his phone, where GitHub apparently prevents merging if any failing check exists, required or not (I assume this is because reviewing code on a small screen raises the chances of overlooking a failing check that one actually cared about). I'm on vacation until tomorrow so I've been off of GitHub since last Thursday, but it's possible there's already a PR out there to turn off the status checks. If there is still no such PR I'm still fine with turning off the status checks for Codecov.
The temptation to write artificial tests to satisfy an artificial goal is real. Doing so can eat valuable time better used for something else. For instance:
def meth(self, arg): mod.inst.meth(arg, True, ob=self, kw='cut')
Mocking mod.class.meth, calling meth, and checking that the mock is called will satisfy the robot, but does not contribute much to the goal of providing a language that people can use to solve problems.
My assumption is that there will be a test that meth() does the right thing, mock or not. If we provide an API there should be some test for it; using a mock or something else to do the test is an implementation detail.
Victor, can you explain 'tested indirectly' and perhaps give an example?
As used here, 'whereas' is incorrect English and a bit confusing. I believe Victor meant something more like 'even when'.
For the last clause, I believe he meant "the code change is not complicated enough to *require* automated unit test coverage for the changed line(s)". If I change a comment or add missing spaces, I don't think I should be forced to write a missing test to make the improvement.
Agreed.
A less trivial example: on IDLE's menu, Options => Configure IDLE opens a dialog with a font size widget that when clicked opens a list of font sizes. I recently added 4 larger sizes to the tuple in idlelib.configdialog.ConfigDialog.LoadFontCfg, as requested, I think, by at least 2 people. I tested manually by clicking until the list was displayed. As I remember, I did not immediately worry about automated testing, let alone line coverage, and I do not think I should have had to in order to get the change into 3.6.0.
That line may or may not be covered by the current minimal test that simply creates a ConfigDialog instance. But this gets back to what I think is Victor's point. This minimal test 'covers' 46% of the file, but it only tests that 46% of the lines run without raising. This is useful, but does not test that the lines are really correct. (For GUI display code, human eyeballing is required.) This would remain true even if all the other code were moved to a new module, making the coverage of configdialog the magical ***100%***.
If it's not important enough to require tests it's not important enough to be in Python. ;)
Modules in the test package are mostly not tested. ;)
:) But they are at least executed, which is what we're really measuring here and, I think, all that Ethan and I are advocating for. E.g. I don't expect test_importlib to be directly responsible for exercising all code in importlib, just that Python's entire test suite exercise importlib as much as possible as a whole.
If 'test' means 'line coverage test for new or changed lines', then as a practical matter, I disagree, as explained above. So, in effect, did the people who committed untested lines.
In the wider sense of 'test', there is no real argument. Each statement written should be mentally tested both when written and when reviewed. Code should be manually tested, preferably by someone in addition to the author. Automated testing is more than nice, but not everything. Ditto for unit testing.
The problem I have with just doing manual testing for things that can easily be covered by a unit test -- directly or indirectly -- is there are technically 85 people who can change CPython today. That means there's potentially 85 different people who can screw up the code ;) . Making sure code is exercised somehow by tests at least minimizes how badly someone like me might mess something up thanks to not realizing I broke the code.
Some practical issues with coverage and CodeCov:
1. A Python module is comprised of statements, but the coverage module counts physical lines. This is good for development, but not for gating. The number of physical lines comprising a statement can change without changing, or with only trivially changing, the compiled run code. So if coverage is not 100%, it can vary without a real change in statement coverage.
As I said earlier, no one here is gating PRs on code coverage.
2. Some statements are only intended to run on certain systems, making 100% coverage impossible unless one carefully puts all system-specific code in "if system == 'xyz'" statements and uses system-specific .coveragerc files to exclude code for 'other' systems.
True. We could have a discussion as to whether we want to use Coverage.py's pragmas -- e.g. `# pragma: no cover` -- to flag code as known to be untested. Depending on how far we wanted to take this we could also set up pragmas that flag when a test is OS-specific (I found https://bitbucket.org/ned/coveragepy/issues/563/platform-specific-configurat... while looking for a solution and I'm sure we could discuss things with Ned Batchelder if we needed some functionality in coverage.py for our needs).
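As a purely illustrative sketch (the helper function and paths are invented), this is how Coverage.py's default `# pragma: no cover` marker could be applied to OS-specific branches; putting the pragma on the `if`/`elif` line excludes the whole clause beneath it:

    import sys


    def default_config_dir():
        """Hypothetical helper with platform-specific branches."""
        if sys.platform == 'win32':  # pragma: no cover
            # Only reachable on Windows, so it would never be hit on a Linux coverage run.
            return r'%APPDATA%\MyApp'
        elif sys.platform == 'darwin':  # pragma: no cover
            # Only reachable on macOS.
            return '~/Library/Application Support/MyApp'
        else:
            return '~/.config/myapp'

Whether to sprinkle pragmas like this through the code or to keep the exclusions centralized in .coveragerc is exactly the kind of policy question raised above.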
3. Some tests require extended resources. Statements that are only covered by such tests will be seen as uncovered when coverage is run on a system lacking the resources. As far as I know, all non-Windows buildbots and CodeCov are run on systems lacking the 'gui' resource. So patches to gui code will be seen as uncovered.
I view 100% coverage as aspirational, not attainable. But if we want an attainable goal, what should we aim for? We're at 83.44% now so we could say that 80% is something we never want to drop below and be done with it. We could up it to 85% or 90% in recognizing that there is more testing to do but that we will never cover all Python code (all of this is configurable in Codecov, hence why I'm asking).
4. As I explained in a post on the core-workflow list, IDLE needs the following added to the 'exclude_lines' item of .coveragerc:
    .*# htest #
    if not _utest:
The mechanism behind these would also be useful for testing any other modules, scripts, or demos that use tkinter GUIs.
There seem to be other issues too.
"Untested code is broken code" :)
Most of CPython, including IDLE, has been pretty thoroughly tested. And we have heaps of bug reports to show for it. What's more important is that even code that is tested, by whatever means, may still have bugs. Hence, obscure bugs are still found, and even correct code can be corrupted (regressed) by attempts to fix and improve it.
-- Terry Jan Reedy
On 4/26/2017 1:45 PM, Brett Cannon wrote:
On Tue, 25 Apr 2017 at 17:00 Terry Reedy <tjreedy@udel.edu> wrote:
While I use code coverage to improve automated unittesting, I am opposed to turning a usable but limited and sometimes faulty tool into a blind robotic master that blocks improvements. The prospect of this being done has discouraged me from learning the new system. (More on 'faulty tool' later.)
It should be stated that code coverage is not a blocking status check for merging from our perspective (the **only** required check is that Travis pass with its test run).
I have the impression that at one time you hoped to make it blocking. If that was wrong, I apologize for misunderstanding. If you have changed your mind, then I am happy.
I am otherwise in favor of both the measurement and report of coverage being improved.
The temptation to write artificial tests to satisfy an artificial goal is real. Doing so can eat valuable time better used for something else. For instance:
def meth(self, arg): mod.inst.meth(arg, True, ob=self, kw='cut')
Mocking mod.class.meth, calling meth, and checking that the mock is called will satisfy the robot, but does not contribute much to the goal of providing a language that people can use to solve problems.
My assumption is that there will be a test that meth() does the right thing, mock or not. If we provide an API there should be some test for it; using a mock or something else to do the test is an implementation detail.
My impression is that default mocks have a generic signature, so that merely checking that the mock is called will not catch an invalid call. I presume that one can do better with mocks, and I have with custom mocks I have written, but my point above was that coverage does not care.
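A minimal sketch of the difference (the Inst class and its meth signature are invented for illustration): a bare Mock records any call, while an autospecced mock created with unittest.mock.create_autospec copies the real signature and rejects an invalid call.

    import unittest
    from unittest import mock


    class Inst:
        """Hypothetical collaborator standing in for mod.inst."""
        def meth(self, arg, flag, ob=None, kw=''):
            return (arg, flag, ob, kw)


    class MockSignatureDemo(unittest.TestCase):
        def test_plain_mock_accepts_anything(self):
            # A bare Mock happily records a call with the wrong arguments,
            # so a coverage-satisfying test can pass against broken code.
            fake = mock.Mock()
            fake.meth('x')                    # 'flag' is missing, yet no error
            fake.meth.assert_called_once_with('x')

        def test_autospec_checks_the_signature(self):
            # create_autospec copies the real signature, so an invalid call fails.
            fake = mock.create_autospec(Inst(), spec_set=True)
            with self.assertRaises(TypeError):
                fake.meth('x')                # missing 'flag' is caught here
            fake.meth('x', True, ob=None, kw='cut')  # a valid call is accepted


    if __name__ == '__main__':
        unittest.main()

So coverage still does not care, but autospec at least turns "the mock was called" into "the mock was called with a plausible signature".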
If it's not important enough to require tests it's not important enough to be in Python. ;)
Modules in the test package are mostly not tested. ;)
:) But they are at least executed, which is what we're really measuring here and, I think, all that Ethan and I are advocating for.
I thought Ethan was advocating for more -- a specific unittest for each line.
E.g. I don't expect test_importlib to be directly responsible for exercising all code in importlib, just that Python's entire test suite exercise importlib as much as possible as a whole.
The advantage for importlib in this respect is that import statements cannot be mocked; only the objects imported, after importlib is finished.
There is lots of interaction between idlelib modules, but I would still like direct tests of each idlelib.xyz with a test_xyz.py. Three years ago, there was no test.test_idle. There now is, and it runs 35 idlelib.idle_test.test_* files. (There are 60 idlelib modules.)
The problem I have with just doing manual testing for things that can easily be covered by a unit test -- directly or indirectly -- is there are technically 85 people who can change CPython today. That means there's potentially 85 different people who can screw up the code ;) .
At the moment, I am the only one pushing idlelib patches, except when it gets included in one of Serhiy's multi-module refactoring patches (and he always nosies me). That skews my view a bit. However, with most of the critical issues fixed, I am anxious to automate what I can of what I now do by hand.
Making sure code is exercised somehow by tests at least minimizes how badly someone like me might mess something up thanks to not realizing I broke the code.
I had not thought about the issue that way. I should add a test_module for each remaining module, import the module, and at least create an instance of every tkinter widget defined therein, and see what other classes could be easily instantiated and what functions easily run.
Some practical issues with coverage and CodeCov:
2. Some statements are only intended to run on certain systems, making 100% coverage impossible unless one carefully puts all system-specific code in "if system == 'xyz'" statements and uses system-specific .coveragerc files to exclude code for 'other' systems.
True. We could have a discussion as to whether we want to use Coverage.py's pragmas ... I'm sure we could discuss things with Ned Batchelder if we needed some functionality in coverage.py for our needs).
Let's skip this for now.
3. Some tests require extended resources. Statements that are only covered by such tests will be seen as uncovered when coverage is run on a system lacking the resources. As far as I know, all non-Windows buildbots and CodeCov are run on systems lacking the 'gui' resource. So patches to gui code will be seen as uncovered.
I view 100% coverage as aspirational, not attainable. But if we want an attainable goal, what should we aim for? We're at 83.44% now
On what system? I suspect that Tkinter, ttk, turtle, and IDLE GUI-dependent tests make at least a 2% difference on GUI Windows versus no-GUI *nix.
we could say that 80% is something we never want to drop below and be done with it. We could up it to 85% or 90% in recognizing that there is more testing to do but that we will never cover all Python code (all of this is configurable in Codecov, hence why I'm asking).
Since I think we actually are at 85%, and certainly will be when I add minimal easy tests for the rest of idlelib, I think 90% would be a reasonable goal.
One way to increase coverage is to push a bit harder on fulfilling the 'test needed' stage. Theoretically, every substantive (behavior-changing) patch should start with a test that fails. Since PRs are separate from the main repository and can be patched separately, a PR could start with a test that should immediately fail but should pass before merging. It would be nice if the test runner knew to only run the new test and not the entire suite. It would be even nicer if it knew that initial failure is success. Is there at least a 'New Test' label on PRs?
4. As I explained in a post on the core-workflow list, IDLE needs the following added to the 'exclude_lines' item of .coveragerc:
    .*# htest #
    if not _utest:
These additions would remove, I think, at least 400 lines from the uncovered category. Both only occur in idlelib.
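As a purely illustrative sketch (a made-up dialog, not actual idlelib code), this is roughly the shape of code those two exclude patterns are aimed at: the `if not _utest:` guard skips blocking GUI calls when a unit test drives the widget, and lines ending in `# htest #` belong to the manual 'human test' driver.

    import tkinter as tk


    def font_size_dialog(parent, _utest=False):
        """Hypothetical GUI helper following the convention described above."""
        top = tk.Toplevel(parent)
        top.title('Font size')
        if not _utest:
            top.grab_set()       # blocking, interactive-only calls live under this guard
            top.wait_window()
        return top


    if __name__ == '__main__':
        root = tk.Tk()                # htest #
        font_size_dialog(root)        # htest #
        root.mainloop()               # htest #

A unit test would call font_size_dialog(root, _utest=True) and inspect the returned Toplevel without ever blocking on user input.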
-- Terry Jan Reedy
On 04/26/2017 10:35 PM, Terry Reedy wrote:
On 4/26/2017 1:45 PM, Brett Cannon wrote:
:) But they are at least executed, which is what we're really measuring here and, I think, all that Ethan and I are advocating for.
I thought Ethan was advocating for more -- a specific unittest for each line.
I'm hoping that's just poor wording, or each module would need hundreds of tests. ;)
What I am advocating is to have 100% test coverage -- but as Brett said, that's an aspirational goal. Said a different way: all code in Python is important enough to test; whether we can write those tests with our limited resources is a different matter.
I am well aware that writing good tests is hard. I am painfully aware of the frustration of having error-reporting code error out because it wasn't tested.
I agree that we shouldn't prevent core developers from merging because of coverage statistics.
-- ~Ethan~
On 4/27/2017 1:52 AM, Ethan Furman wrote:
On 04/26/2017 10:35 PM, Terry Reedy wrote:
On 4/26/2017 1:45 PM, Brett Cannon wrote:
:) But they are at least executed, which is what we're really measuring here and, I think, all that Ethan and I are advocating for.
I thought Ethan was advocating for more -- a specific unittest for each line.
I'm hoping that's just poor wording, or each module would need hundreds of tests. ;)
Poor wording, but I was wrong in any case. An aspirational goal is that each line of each function be exercised for correctness by at least one unittest. For top-level code not in a function or class, we mostly have to settle for compile and run without error.
On Wed, 26 Apr 2017 at 22:36 Terry Reedy <tjreedy@udel.edu> wrote:
On 4/26/2017 1:45 PM, Brett Cannon wrote:
On Tue, 25 Apr 2017 at 17:00 Terry Reedy <tjreedy@udel.edu> wrote:
While I use code coverage to improve automated unittesting, I am opposed to turning a usable but limited and sometimes faulty tool into a blind robotic master that blocks improvements. The prospect of this being done has discouraged me from learning the new system. (More on 'faulty tool' later.)
It should be stated that code coverage is not a blocking status check for merging from our perspective (the **only** required check is that Travis pass with its test run).
I have the impression that at one time you hoped to make it blocking. If that was wrong, I apologize for misunderstanding. If you have changed your mind, then I am happy.
"Hope", sure; "currently plan to", no. IOW I have said a lot of things over the 2.5 years I've been leading this workflow shift and I don't expect all of them to pan out.
I am otherwise in favor of both the measurement and report of coverage being improved.
The temptation to write artificial tests to satisfy an artificial goal is real. Doing so can eat valuable time better used for something else. For instance:
def meth(self, arg): mod.inst.meth(arg, True, ob=self, kw='cut')
Mocking mod.class.meth, calling meth, and checking that the mock is called will satisfy the robot, but does not contribute much to the goal of providing a language that people can use to solve problems.
My assumption is that there will be a test that meth() does the right thing, mock or not. If we provide an API there should be some test for it; using a mock or something else to do the test is an implementation detail.
My impression is that default mocks have a generic signature, so that merely checking that the mock is called will not catch an invalid call. I presume that one can do better with mocks, and I have with custom mocks I have written, but my point above was that coverage does not care.
If it's not important enough to require tests it's not important enough to be in Python. ;)
Modules in the test package are mostly not tested. ;)
:) But they are at least executed, which is what we're really measuring here and, I think, all that Ethan and I are advocating for.
I thought Ethan was advocating for more -- a specific unittest for each line.
E.g. I don't expect test_importlib to be directly responsible for exercising all code in importlib, just that Python's entire test suite exercise importlib as much as possible as a whole.
The advantage for importlib in this respect is that import statements cannot be mocked; only the objects imported, after importlib is finished.
Oh, you can mock import statements. :)
There is lots of interaction between idlelib modules, but I would still like direct tests of each idlelib.xyz with a test_xyz.py. Three years ago, there was no test.test_idle. There now is, and it runs 35 idlelib.idle_test.test_* files. (There are 60 idlelib modules.)
The problem I have with just doing manual testing for things that can easily be covered by a unit test -- directly or indirectly -- is there are technically 85 people who can change CPython today. That means there's potentially 85 different people who can screw up the code ;) .
At the moment, I am the only one pushing idlelib patches, except when it gets included in one of Serhiy's multi-module refactoring patches (and he always nosies me). That skews my view a bit. However, with most of the critical issues fixed, I am anxious to automate what I can of what I now do by hand.
Making sure code is exercised somehow by tests at least minimizes how badly someone like me might mess something up thanks to not realizing I broke the code.
I had not thought about the issue that way. I should add a test_module for each remaining module, import the module, and at least create an instance of every tkinter widget defined therein, and see what other classes could be easily instantiated and what functions easily run.
That seems like a good starting point. Kind of like test_sundry but with class instantiation on top of it.
Some practical issues with coverage and CodeCov:
2. Some statements are only intended to run on certain systems, making 100% coverage impossible unless one carefully puts all system-specific code in "if system == 'xyz'" statements and uses system-specific .coveragerc files to exclude code for 'other' systems.
True. We could have a discussion as to whether we want to use Coverage.py's pragmas ... I'm sure we could discuss things with Ned Batchelder if we needed some functionality in coverage.py for our needs).
Let's skip this for now.
3. Some tests require extended resources. Statements that are only covered by such tests will be seen as uncovered when coverage is run on a system lacking the resources. As far as I know, all non-Windows buildbots and CodeCov are run on systems lacking the 'gui' resource. So patches to gui code will be seen as uncovered.
I view 100% coverage as aspirational, not attainable. But if we want an attainable goal, what should we aim for? We're at 83.44% now
On what system?
Travis, where the Codecov run is driven from.
I suspect that Tkinter, ttk, turtle, and IDLE GUI-dependent tests make at least a 2% difference on GUI Windows versus no-GUI *nix.
we could say that 80% is something we never want to drop below and be done with it. We could up it to 85% or 90% in recognizing that there is more testing to do but that we will never cover all Python code (all of this is configurable in Codecov, hence why I'm asking).
Since I think we actually are at 85%, and certainly will be when I add minimal easy tests for the rest of idlelib, I think 90% would be a reasonable goal.
Seems reasonable to me. Opened https://github.com/python/core-workflow/issues/75 to remind me to tweak Codecov's config so that we are aiming for 90% overall.
-Brett
One way to increase coverage is to push a bit harder on fulfilling the 'test needed' stage. Theoretically, every substantive (behavior-changing) patch should start with a test that fails. Since PRs are separate from the main repository and can be patched separately, a PR could start with a test that should immediately fail but should pass before merging. It would be nice if the test runner knew to only run the new test and not the entire suite. It would be even nicer if it knew that initial failure is success. Is there at least a 'New Test' label on PRs?
4. As I explained in a post on the core-workflow list, IDLE needs the following added to the 'exclude_lines' item of .coveragerc:
    .*# htest #
    if not _utest:
These additions would remove, I think, at least 400 lines from the uncovered category. Both only occur in idlelib.
-- Terry Jan Reedy
On 4/27/2017 3:44 PM, Brett Cannon wrote:
On Wed, 26 Apr 2017 at 22:36 Terry Reedy <tjreedy@udel.edu> wrote:
On 4/26/2017 1:45 PM, Brett Cannon wrote:
E.g. I don't expect test_importlib to be directly responsible for exercising all code in importlib, just that Python's entire test suite exercise importlib as much as possible as a whole.
The advantage for importlib in this respect is that import statements cannot be mocked; only the objects imported, after importlib is finished.
Oh, you can mock import statements. :)
Other than by pre-loading a mock module into sys.modules? If so, please give a hint, as this could be useful to me.
At the moment, I am the only one pushing idlelib patches, except when it gets included in one of Serhiy's multi-module refactoring patches (and he always nosies me).
It turns out that Louie Lu's new tool revealed a couple of other patches, though just to tests that started failing.
I had not thought about the issue that way. I should add a test_module for each remaining module, import the module, and at least create an instance of every tkinter widget defined therein, and see what other classes could be easily instantiated and what functions easily run.
That seems like a good starting point. Kind of like test_sundry but with class instantiation on top of it.
I looked and saw that bdb is in 'untested'. I also discovered https://bugs.python.org/issue19417 to change that, with a 3+ year-old patch. I plan to review it.
I view 100% coverage as aspirational, not attainable. But if we want an attainable goal, what should we aim for? We're at 83.44% now
On what system?
Travis, where the Codecov run is driven from.
I meant OS, because
I suspect that Tkinter, ttk, turtle, and IDLE GUI-dependent tests make at least a 2% difference on GUI Windows versus no-GUI *nix.
-- Terry Jan Reedy
On 28/04/17 01:49, Terry Reedy wrote:
On 4/27/2017 3:44 PM, Brett Cannon wrote:
On Wed, 26 Apr 2017 at 22:36 Terry Reedy <tjreedy@udel.edu> wrote:
On 4/26/2017 1:45 PM, Brett Cannon wrote:
E.g. I don't expect test_importlib to be directly responsible for exercising all code in importlib, just that Python's entire test suite exercise importlib as much as possible as a whole.
The advantage for importlib in this respect is that import statements cannot be mocked; only the objects imported, after importlib is finished.
Oh, you can mock import statements. :)
Other than by pre-loading a mock module into sys.modules? If so, please give a hint, as this could be useful to me.
https://docs.python.org/3/library/unittest.mock-examples.html#mocking-import...
At the moment, I am the only one pushing idlelib patches, except when it gets included in one of Serhiy's multi-module refactoring patches (and he always nosies me).
It turns out that Louie Lu's new tool revealed a couple of other patches, though just to tests that started failing.
I had not thought about the issue that way. I should add a test_module for each remaining module, import the module, and at least create an instance of every tkinter widget defined therein, and see what other classes could be easily instantiated and what functions easily run.
That seems like a good starting point. Kind of like test_sundry but with class instantiation on top of it.
I looked and saw that bdb is in 'untested'. I also discovered https://bugs.python.org/issue19417 to change that, with a 3+ year-old patch. I plan to review it.
I view 100% coverage as aspirational, not attainable. But if we want an attainable goal, what should we aim for? We're at 83.44% now
On what system?
Travis, where the Codecov run is driven from.
I meant OS, because
I suspect that Tkinter, ttk, turtle, and IDLE GUI-dependent tests make at least a 2% difference on GUI Windows versus no-GUI *nix.
-- Terry Jan Reedy
On Fri, 28 Apr 2017 at 02:19 Michael Foord <michael@voidspace.org.uk> wrote:
On 28/04/17 01:49, Terry Reedy wrote:
On 4/27/2017 3:44 PM, Brett Cannon wrote:
On Wed, 26 Apr 2017 at 22:36 Terry Reedy <tjreedy@udel.edu> wrote:
On 4/26/2017 1:45 PM, Brett Cannon wrote:
E.g. I don't expect test_importlib to be directly responsible for exercising all code in importlib, just that Python's entire test suite exercise importlib as much as possible as a whole.
The advantage for importlib in this respect is that import statements cannot be mocked; only the objects imported, after importlib is finished.
Oh, you can mock import statements. :)
Other than by pre-loading a mock module into sys.modules? If so, please give a hint, as this could be useful to me.
https://docs.python.org/3/library/unittest.mock-examples.html#mocking-import...
The other option is to stub out __import__() itself.
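A hedged sketch of both approaches (the module name fancy_parser and the load_config() helper are invented): option 1 pre-loads a stub module into sys.modules, as in the mock documentation Michael linked; option 2 stubs out __import__() itself, as Brett suggests.

    import builtins
    import sys
    import types
    import unittest
    from unittest import mock


    def load_config():
        # Imagine this lives in production code and pulls in an optional dependency.
        import fancy_parser
        return fancy_parser.parse('key=value')


    class MockedImportTest(unittest.TestCase):
        def test_with_stub_module_in_sys_modules(self):
            # Option 1: pre-load a stub module so the import statement finds it.
            stub = types.ModuleType('fancy_parser')
            stub.parse = lambda text: {'key': 'value'}
            with mock.patch.dict(sys.modules, {'fancy_parser': stub}):
                self.assertEqual(load_config(), {'key': 'value'})

        def test_with_patched_dunder_import(self):
            # Option 2: replace builtins.__import__ for the duration of the test.
            real_import = builtins.__import__

            def fake_import(name, *args, **kwargs):
                if name == 'fancy_parser':
                    stub = types.ModuleType('fancy_parser')
                    stub.parse = lambda text: {'key': 'value'}
                    return stub
                return real_import(name, *args, **kwargs)

            with mock.patch.object(builtins, '__import__', side_effect=fake_import):
                self.assertEqual(load_config(), {'key': 'value'})


    if __name__ == '__main__':
        unittest.main()

Either way, the code under test keeps its plain import statement; only the test decides what that import resolves to.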
participants (7)
- Barry Warsaw
- Brett Cannon
- Ethan Furman
- INADA Naoki
- Michael Foord
- Terry Reedy
- Victor Stinner