[code-quality] The union of all tools

Tue Dec 9 17:52:08 CET 2014

On Mon, Dec 8, 2014 at 4:10 PM, Skip Montanaro <skip.montanaro at gmail.com>
wrote:

>
> This is not to say that Phil doesn't have excellent reasons for his
> approach, just that there is a non-trivial trade-off in that approach.
>

It's a feature, not a bug. You can run pyflakes against some ugly code that
an intern wrote 3 years ago, and every single thing it tells you is a bug
in the program (or a bug in pyflakes). It won't complain about how the
intern didn't understand that classes should be named in CamelCase, or
gripe that every iteration variable is named "i", even though the code
works. Without any up-front effort or configuration, pyflakes will point
you immediately at the areas of code that have problems instead of sending
you off on a 2-week search-and-replace mission to rename all the variables
and twiddle whitespace which is more likely to introduce bugs than to fix
them.

The premise of pyflakes is that you are smarter than your linting tool, you
write tests to check the correctness of your program, and you audit for
readability in code review. You can also run it against really good code
without any configuration effort, and still everything you get is a real
problem in the code, and pyflakes will find things that a unit test will
not.

In fact, when I wrote pyflakes, pylint and pychecker already existed. I was
working at Divmod, which was staffed by many of the core contributors of
Twisted. You can look at the Twisted code to get an idea of the climate:
unit test coverage is nearly 100%, there is a brutal review process, and
all the developers have years of experience and not a single intern in
sight. The code quality is very high.

However, we had a few problems where sometimes we'd add an import to a
file, and suddenly the module couldn't be imported because of a cycle in
the import dependencies. Frequently, the cycle was due to an import which
served no purpose, maybe left over from a function that had moved
elsewhere. So, the first and easiest step to reducing the cycles would be
to remove all these unnecessary imports.

I tried pychecker and pylint, and they each had problems:

1) they were slow.
2) they spewed *thousands* of "problems".

Remember, this is a code base that is super well tested. There weren't
thousands of problems. I suppose I could have taken the time to figure out
how to configure these tools to not complain about how Divmod's naming
conventions didn't match the tool's defaults, and disable the dozens of
other "style suggestions" and warnings about things that "might" be bad but
which, because of the tests and code review, I knew were not. But given
that the existing tools were already slow, I figured I could do better, and
pyflakes was born.

After I wrote pyflakes and we ran it against the code, we also found some
code paths which were useless. They didn't show up in unit test code
coverage reports because they were executed, but maybe they'd calculate a
variable that never got used. We were then able to remove these bits of
useless code. Between Divmod and Twisted I had a large corpus of
well-tested, rigorously reviewed, high-quality code, so I reasoned that if
pyflakes found a false-positive in this code, it was a bug in pyflakes.

So, I'd argue that the incremental benefit of using pylint is small if you
have tests.

I should also point out that a lot of people seems to agree with this
philosophy, but still want to enforce conformance to some kind of naming
and whitespace convention, like PEP8. There's a program that does exactly
that, called (unsurprisingly) pep8. There's also flake8, which combines
pyflakes and pep8. Based on PyPI download stats, It's a fair guess that the
most popular use of pyflakes is as a dependency of flake8.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://mail.python.org/pipermail/code-quality/attachments/20141209/524de77c/attachment.html>