[Python-ideas] PEP 8 update on line length
Steven D'Aprano
steve at pearwood.info
Thu Feb 21 23:33:20 EST 2019
On Thu, Feb 21, 2019 at 05:06:51PM -0800, Chris Barker via Python-ideas wrote:
> To all the folks quoting theory: let's be honest. Yes, really long lines
> are harder to read, but the 80 char limit comes from old terminals, NOT any
> analysis that somehow that is optimum for readability.
Chris, the convention to limit text to somewhere around 60-80 characters
predates old terminals by *literally centuries*. I don't think it's *us*
that needs to be told to "be honest".
I don't know who first came up with this story that the 79 column limit
is all about compatibility with old 80 char terminals, but it's just a
story. (And did they ever stop to wonder why those old terminals
standardized on 80 columns?)
Compatibility with old terminals is "nice to have" if you ever need to
ssh into a remote machine via an 80-column terminal and edit code (and I
know somebody who actually does that!), but that's not the reason why we
should keep the 80 column limit as the default.
(Many people have already spent a lot of words explaining some of the
advantages of an 80 char limit, and I don't intend to go over them
again. Go back and read the thread.)
I've just grabbed a handful of books at random from my bookcase, and
done a quick sample of the number of characters per line:
42 letters plus whitespace = 52 characters
28 letters plus whitespace = 34 x 2 columns = 68
63 plus ws = 75
56 plus ws = 67
73 plus ws = 84
59 plus ws = 70
56 plus ws = 67 (another one!)
I would be surprised if you found many books that reached 95-100
characters, and shocked if you found any at all that reached 120
characters.
Based on this sample, I would say the typical line length for optimal
reading of prose is about 60-70 chars. Call it 65. Add four leading
indents of four spaces each, and our optimum is about 81 columns.
The difference between that and PEP 8's 79 columns is not significant.
(I for one would not fail your code in a review merely for reaching 81
or even 82 columns.)
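For anyone who wants to check the arithmetic, here is a minimal sketch. The seven widths are the figures listed above (I am assuming the two-column book counts at its full 68); the variable names are mine, chosen only for illustration.

```python
from statistics import mean, median

# Characters per line (letters plus whitespace) for the seven books sampled.
# Assumption: the two-column book is counted at its full page width of 68.
book_line_widths = [52, 68, 75, 67, 84, 70, 67]

average_width = mean(book_line_widths)
middle_width = median(book_line_widths)
print(average_width)  # 69
print(middle_width)   # 68

# The post rounds the typical prose width down to 65, then adds
# four indent levels of four spaces each.
TYPICAL_PROSE_WIDTH = 65
INDENT_WIDTH = 4
INDENT_LEVELS = 4
code_optimum = TYPICAL_PROSE_WIDTH + INDENT_WIDTH * INDENT_LEVELS
print(code_optimum)   # 81
```

Seven books is a tiny sample, so treat the numbers as a rough guide rather than a measurement.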
Now, it does have to be admitted that prose does not have the same
characteristics as source code. Prose tends to have solid paragraphs of
the same width, and we typically read large blocks of text in full.
Whereas source code tends to have lots of short lines, and a few very
long lines. We typically skim most of the text, then focus in tightly to
study in detail a small section of the text at a time.
And any limit we choose is going to be a compromise between the need to
avoid giant one-liners and the nuisance value of splitting a conceptual
line of code over multiple physical lines. Being a compromise, there
will always be cases where it is sub-optimal.
Nevertheless, we can say this about typical Python source code:
1. 79 characters is *very generous* for most lines of code; I did a
quick sample of code and found an average of 51 columns including the
leading indents. This is, of course, an imperfect and biased sample
because long lines have been split to keep the 79 char limit, but even a
brief glance at the std lib shows that most lines of code tend to fit
within 50-60 characters.
2. When a single line goes beyond 80 columns, it often wants to go a
long way beyond. Perl-ish one-liners are merely an extreme case of this.
3. Such long lines are often complex, which makes them hard to read and
hard to debug.
Opinion: we really shouldn't be encouraging people to write long complex
lines of code. If a single line has more than a dozen method calls in
it, it might be a tad too complex for one physical line regardless of
how wide your monitor is :-)
Splitting such complex expressions over multiple lines, or even multiple
statements, can have advantages beyond merely keeping to the 79 column
limit. It can often result in better code that is easier to understand,
debug and maintain.
4. But one notable exception to this is the case where you have a long
format string, often passed to "raise Exception", or print. They're
rarely complicated or hard to read: at worst, substituting a few
variables into a format string.
These are often indented four or five levels deep, and they really are a
pain-point. They're sometimes hard to split over multiple lines. And
not only are they conceptually simple, but we rarely need to read them
in detail. It's the surrounding code we need to read closely.
(Raymond's post singles these kinds of lines out as especially
problematic, and his observations agree with my experience.)
Opinion: common sense should prevail here. If you have a line "raise
ValueError(...)" which would reach 80 or even 90 characters, don't let
PEP 8 alone tell you otherwise. It's just a rule, not a law of physics.
We have rules so that you *think before you break them*.
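To make the "raise ValueError(...)" case concrete, here is a small sketch of the two choices. The function names and the message are hypothetical, made up purely for illustration. The first version lets the line run past 80 columns; the second relies on the fact that adjacent string literals are concatenated by the compiler, so the message is unchanged.

```python
def check_port_long(port):
    if not 0 < port < 65536:
        # Conceptually simple, so letting it run past 80 columns does little harm.
        raise ValueError(f"invalid port {port!r}: expected an integer from 1 to 65535")
    return port


def check_port_split(port):
    if not 0 < port < 65536:
        # Adjacent string literals are joined at compile time, so this
        # produces exactly the same message as the one-line version.
        raise ValueError(
            f"invalid port {port!r}: "
            "expected an integer from 1 to 65535"
        )
    return port
```

Both raise the same exception with the same text. Which one reads better in context is a judgement call, which is rather the point.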
But if you have more substantial code that exceeds 80 columns, that's
a code smell, and you ought to think long and hard before breaking the limit.
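If you want to see where your own code sits relative to the limit, something along these lines will do for a quick sample like the one in point 1. This is only a sketch: the function name and the choice to ignore blank lines and trailing whitespace are my assumptions, not anything PEP 8 specifies.

```python
def line_length_report(source, limit=79):
    """Return (average width of non-blank lines, [(lineno, width), ...] over limit)."""
    widths = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        # Ignore trailing whitespace; expand tabs so widths reflect columns.
        visible = line.rstrip().expandtabs(8)
        if visible:
            widths[lineno] = len(visible)
    if not widths:
        return 0.0, []
    average = sum(widths.values()) / len(widths)
    too_long = [(n, w) for n, w in widths.items() if w > limit]
    return average, too_long


# A tiny made-up sample: one short line, one blank line, one 85-column line.
sample = "x = 1\n\n" + "y = " + "1 + " * 20 + "1\n"
average, too_long = line_length_report(sample)
print(average)   # 45.0
print(too_long)  # [(3, 85)]
```

For a real file, pass in its text, for example the result of pathlib.Path(...).read_text(). The report only tells you which lines are long; whether each one is a harmless format string or a code smell still needs a human to look at it.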
Proposal:
- keep PEP 8's current recommendation;
- but remind people that the rule can be relaxed for lines that are
conceptually simple, such as the "raise Exception(...)" pattern;
- and also remind people that long *complex* lines are an anti-pattern.
Such complex lines can be improved by splitting them over multiple
lines, and should be.
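As an illustration of that last point, here is a contrived example (the data and names are invented). The one-liner filters, splits, normalises, converts and sorts all in one expression; the split version does the same work in named steps that can each be inspected in a debugger.

```python
records = [" Alice:30 ", "bob:25", "", "  carol:41"]

# Everything on one physical line: it works, but it is hard to scan and hard to debug.
ages_one_liner = dict(sorted((name.strip().title(), int(age)) for name, age in (r.split(":") for r in records if r.strip())))

# The same logic split into statements, each short and easy to check.
non_empty = [r for r in records if r.strip()]
pairs = [r.split(":") for r in non_empty]
ages = {name.strip().title(): int(age) for name, age in pairs}
ages_split = dict(sorted(ages.items()))

print(ages_split)  # {'Alice': 30, 'Bob': 25, 'Carol': 41}
```

Note that the split version fits comfortably inside 79 columns without any effort. The line length was a symptom, not the problem.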
I know we try to think in hard limits. "If 79 is too short, then 90 or
100 or 150 or ..." I'm making a plea for the opposite: if you intend to
break 80 columns, consider the line itself before breaking it. Don't
just increase the limit.
Raising the limit effectively says "any amount of complexity is OK in a single line,
so long as it remains below X columns". I'd rather people look at the
line and decide "this is too complex, split it" or "it's just a format
string (or whatever), let it be".
--
Steven