On 4/25/2017 11:00 AM, Barry Warsaw wrote:
> On Apr 24, 2017, at 09:32 PM, Ethan Furman wrote:
>> On 04/21/2017 03:29 PM, Victor Stinner wrote:
(In the context of having a patch blocked by the blind Codecov robot ...)
>>> I dislike code coverage because there is a temptation to write artificial
>>> tests whereas the code is tested indirectly or the code is not important
>>> enough to *require* tests.
While I use code coverage to improve automated unittesting, I am opposed
to turning a usable but limited and sometimes faulty tool into a blind
robotic master that blocks improvements. The prospect of this being
done has discouraged me from learning the new system. (More on 'faulty
tool' later.)
It should be stated that code coverage is not a blocking status check for merging from our perspective (the **only** required check is that the Travis test run passes). The only reason Victor ran into it as a blocker is that he tried to do a merge from his phone, where GitHub apparently prevents merging if any failing check exists, required or not (I assume this is because reviewing code on a small screen raises the chances of overlooking a failing check that one actually cared about).
I'm on vacation until tomorrow, so I've been off of GitHub since last Thursday, but it's possible there's already a PR out there to turn off the status checks. If there is no such PR, I'm still fine with turning off the status checks for Codecov.
The temptation to write artificial tests to satisfy an artificial goal
is real. Doing so can eat valuable time better used for something else.
For instance:
def meth(self, arg):
    mod.inst.meth(arg, True, ob=self, kw='cut')
Mocking mod.inst.meth, calling meth, and checking that the mock is
called will satisfy the robot, but does not contribute much to the goal
of providing a language that people can use to solve problems.
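To make the concern concrete, a sketch of such a coverage-only test might look like the following (the module, class, and argument names are invented stand-ins for the snippet above):

import unittest
from unittest import mock


class _Inst:
    """Stand-in for the hypothetical mod.inst object."""
    def meth(self, arg, flag, ob=None, kw=None):
        pass

inst = _Inst()


class Widget:
    """Stand-in for the class defining meth() in the snippet."""
    def meth(self, arg):
        inst.meth(arg, True, ob=self, kw='cut')


class CoverageOnlyTest(unittest.TestCase):
    def test_meth_calls_inst_meth(self):
        w = Widget()
        with mock.patch.object(inst, 'meth') as mocked:
            w.meth('spam')
        # The assertion simply restates the implementation line.  The
        # line now counts as covered, but nothing about whether the
        # call is the *right* call for users has been checked.
        mocked.assert_called_once_with('spam', True, ob=w, kw='cut')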
My assumption is that there will be a test that meth() does the right thing, mock or not. If we provide an API there should be some test for it; using a mock or something else to do the test is an implementation detail.
Victor, can you explain 'tested indirectly' and perhaps give an example?
As used here, 'whereas' is incorrect English and a bit confusing. I
believe Victor meant something more like 'even when'.
For the last clause, I believe he meant "the code change is not
complicated enough to *require* automated unit test coverage for the
changed line(s)". If I change a comment or add missing spaces, I don't
think I should be forced to write a missing test to make the improvement.
Agreed.
A less trivial example: on IDLE's menu, Options => Configure IDLE opens
a dialog with a font size widget that when clicked opens a list of font
sizes. I recently added 4 larger sizes to the tuple in
idlelib.configdialog.ConfigDialog.LoadFontCfg, as requested, I think, by
at least 2 people. I tested manually by clicking until the list was
displayed. As I remember, I did not immediately worry about automated
testing, let alone line coverage, and I do not think I should have had
to in order to get the change into 3.6.0.
That line may or may not be covered by the current minimal test that
simply creates a ConfigDialog instance. But this gets back to what I
think is Victor's point. This minimal test 'covers' 46% of the file,
but it only tests that 46% of the lines run without raising. This is
useful, but does not test that the lines are really correct. (For GUI
display code, human eyeballing is required.) This would remain true
even if all the other code were moved to a new module, making the
coverage of configdialog the magical ***100%***.
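For comparison, a creation-only test of the sort referred to above is roughly the following (the exact ConfigDialog signature is from memory, so treat the details as illustrative):

import unittest
from tkinter import Tk
from idlelib.configdialog import ConfigDialog


class CreationOnlyTest(unittest.TestCase):
    def test_create_dialog(self):
        root = Tk()
        root.withdraw()
        # Running the constructor marks its lines as covered, but the
        # only thing verified is that nothing raises.
        dialog = ConfigDialog(root, 'Test', _utest=True)
        dialog.destroy()
        root.destroy()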
>> If it's not important enough to require tests, it's not important enough to be in Python. ;)
Modules in the test package are mostly not tested. ;)
:) But they are at least executed, which is what we're really measuring here and I think all that Ethan and I are advocating for. E.g. I don't expect test_importlib to be directly responsible for exercising all code in importlib, just that Python's entire test suite exercises importlib as much as possible as a whole.
If 'test' means 'line coverage test for new or changed lines', then as a
practical matter, I disagree, as explained above. So, in effect, did
the people who committed untested lines.
In the wider sense of 'test', there is no real argument. Each statement
written should be mentally tested both when written and when reviewed.
Code should be manually tested, preferably by someone in addition to the
author. Automated testing is more than nice, but not everything. Ditto
for unit testing.
The problem I have with just doing manual testing for things that can easily be covered by a unit test -- directly or indirectly -- is that there are technically 85 people who can change CPython today. That means there are potentially 85 different people who can screw up the code ;). Making sure code is exercised somehow by tests at least minimizes how badly someone like me might mess something up by not realizing I broke the code.
Some practical issues with coverage and Codecov:
1. A Python module is composed of statements, but the coverage module
counts physical lines. This is good for development, but not for
gating. The number of physical lines comprising a statement can change
without changing, or while only trivially changing, the compiled code.
So if coverage is not 100%, it can vary without a real change in
statement coverage.
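A small illustration (invented names): the two functions below compile to the same pair of statements, but the first body occupies one reported line and the second occupies two, so a purely stylistic reflow moves the percentages.

def log_if_verbose(verbose, message):
    # Two statements on one physical line.
    if verbose: print(message)


def log_if_verbose_reflowed(verbose, message):
    # The same two statements on two physical lines.
    if verbose:
        print(message)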
As I said earlier, no one here is gating PRs on code coverage.
2. Some statements are only intended to run on certain systems, making
100% coverage impossible unless one carefully puts all system-specific
code in "if system == 'xyz'" statements and uses system-specific
.coveragerc files to exclude code for 'other' systems.
True. We could have a discussion as to whether we want to use Coverage.py's pragmas -- e.g. `# pragma: no cover` -- to flag code as known to be untested. Depending on how far we wanted to take this, we could also set up pragmas that flag when a test is OS-specific (I found
https://bitbucket.org/ned/coveragepy/issues/563/platform-specific-configuration while looking for a solution, and I'm sure we could discuss things with Ned Batchelder if we needed extra functionality in coverage.py).
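For reference, the pragma already understood by Coverage.py looks like this in source (the function below is invented); additional regexes of the same kind can be listed under exclude_lines in .coveragerc:

import sys


def line_sep():
    # The pragma tells Coverage.py to leave this branch out of the
    # report, so it never shows up as an uncovered line.
    if sys.platform == 'win32':  # pragma: no cover
        return '\r\n'
    return '\n'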
3. Some tests require extended resources. Statements that are only
covered by such tests will be seen as uncovered when coverage is run on
a system lacking the resources. As far as I know, all non-Windows
buildbots and Codecov are run on systems lacking the 'gui' resource. So
patches to gui code will be seen as uncovered.
I view 100% coverage as aspirational, not attainable. But if we want an attainable goal, what should we aim for? We're at 83.44% now, so we could say that 80% is something we never want to drop below and be done with it. We could up it to 85% or 90% in recognition that there is more testing to do, but that we will never cover all Python code (all of this is configurable in Codecov, hence why I'm asking).
4. As I explained in a post on the core-workflow list, IDLE needs the
following added to the 'exclude_lines' item of .coveragerc.
.*# htest #
if not _utest:
The mechanism behind these would also be useful for testing any other
modules, scripts, or demos that use tkinter GUIs.
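As a sketch of why those two patterns are wanted (this is not actual idlelib code): the 'if not _utest:' block holds the blocking calls a unit test must skip, and the lines that exist only for the human-run test end in '# htest #', so both can be excluded from the report.

from tkinter import Tk, Toplevel, Label


class DemoDialog(Toplevel):
    def __init__(self, parent, _utest=False):
        super().__init__(parent)
        Label(self, text='demo dialog').pack()
        if not _utest:
            # Blocking calls that would hang an automated test.
            self.grab_set()
            self.wait_window()


if __name__ == '__main__':
    root = Tk()           # htest #
    DemoDialog(root)      # htest #
    root.mainloop()       # htest #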
There seem to be other issues too.
> "Untested code is broken code" :)
Most of CPython, including IDLE, has been pretty thoroughly tested, and
we have heaps of bug reports to show for it. What's more important is
that even code that is tested, by whatever means, may still have bugs.
Obscure bugs are still found, and even correct code can be corrupted
(regressed) by attempts to fix and improve it.
--
Terry Jan Reedy