[Python-ideas] PEP 8 update on line length

Thu Feb 21 23:33:20 EST 2019

On Thu, Feb 21, 2019 at 05:06:51PM -0800, Chris Barker via Python-ideas wrote:

> To all the folks quoting theory: let's be honest. Yes, really long lines
> are harder to read, but the 80 char limit comes from old terminals, NOT any
> analysis that somehow that is optimum for readability.

Chris, the convention to limit text to somewhere around 60-80 characters 
predates old terminals by *literally centuries*. I don't think it's *us* 
that needs to be told to "be honest".

I don't know who first came up with this story that the 79 column limit 
is all about compatibility with old 80 char terminals, but it's just a 
story. (And did they ever stop to wonder why those old terminals 
standardized on 80 columns?)

Compatibility with old terminals is "nice to have" if you ever need to 
ssh into a remote machine from an 80-column terminal and edit code (and I 
know somebody who actually does that!), but that's not the reason why we 
should keep the 80 column limit as the default.

(Many people have already spent a lot of words explaining some of the 
advantages of an 80 char limit, and I don't intend to go over them 
again. Go back and read the thread.)

I've just grabbed a handful of books at random from my bookcase, and 
done a quick sample of number of chars per line:

42 letters plus whitespace = 52 characters

28 letters plus whitespace = 34 x 2 columns = 68

63 plus ws = 75

56 plus ws = 67

73 plus ws = 84

59 plus ws = 70

56 plus ws = 67 (another one!)

I would be surprised if you found many books that reached 95-100 
characters, and shocked if you found any at all that reached 120 
characters.

Based on this sample, I would say the typical line length for optimal 
reading of prose is about 60-70 chars. Call it 65. Add four leading 
indents of four spaces each, and our optimum is about 81 columns.

The difference between that and PEP 8's 79 columns is not significant. 
(I for one would not fail your code in a review merely for reaching 81 
or even 82 columns.)
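
The arithmetic behind that estimate can be checked with a short sketch.
The seven widths are the book figures listed above; the variable names
are only illustrative, and the value 65 is the post's own rounding, not
something the code derives.

```python
from statistics import mean, median

# Characters per line (letters plus whitespace) for the seven books above.
book_widths = [52, 68, 75, 67, 84, 70, 67]

# Two simple summaries of the sample.
sample_mean = mean(book_widths)
sample_median = median(book_widths)
print(sample_mean, sample_median)

# The post rounds the typical prose width to 65, then adds four
# leading indents of four spaces each.
TYPICAL_PROSE_WIDTH = 65
INDENT_LEVELS = 4
SPACES_PER_INDENT = 4
code_width = TYPICAL_PROSE_WIDTH + INDENT_LEVELS * SPACES_PER_INDENT
print(code_width)
```

Both summaries fall inside the 60-70 band, and the indented width comes
out at 81 columns.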

Now, it does have to be admitted that prose does not have the same 
characteristics as source code. Prose tends to have solid paragraphs of 
the same width, and we typically read large blocks of text in full. 
Whereas source code tends to have lots of short lines, and a few very 
long lines. We typically skim most of the text, then focus in tightly to 
study in detail a small section of the text at a time.

And any limit we choose is going to be a compromise between the need to 
avoid giant one-liners and the nuisance value of splitting a conceptual 
line of code over multiple physical lines. Being a compromise, there 
will always be cases where it is sub-optimal.

Nevertheless, we can say this about typical Python source code:

1. 79 characters is *very generous* for most lines of code; I did a 
quick sample of code and found an average of 51 columns including the 
leading indents. This is, of course, an imperfect and biased sample 
because long lines have been split to keep the 79 char limit, but even a 
brief glance at the std lib shows that most lines of code tend to fit 
within 50-60 characters.

2. When a single line goes beyond 80 columns, it often wants to go a 
long way beyond. Perl-ish one-liners are merely an extreme case of this.

3. Such long lines are often complex, which makes them hard to read and 
hard to debug.

Opinion: we really shouldn't be encouraging people to write long complex 
lines of code. If a single line has more than a dozen method calls in 
it, it might be a tad too complex for one physical line regardless of 
how wide your monitor is :-)

Splitting such complex expressions over multiple lines, or even multiple 
statements, can have advantages beyond merely keeping to the 79 column 
limit. It can often result in better code that is easier to understand, 
debug and maintain.

4. But one notable exception to this is the case where you have a long 
format string, often passed to "raise Exception", or print. They're 
rarely complicated or hard to read: at worst, substituting a few 
variables into a format string.

These are often indented four or five levels deep, and they really are a 
pain-point. They're sometimes hard to split over multiple lines. And 
not only are they conceptually simple, but we rarely need to read them 
in detail. It's the surrounding code we need to read closely.

(Raymond's post singles these kinds of lines out as especially 
problematic, and his observations agree with my experience.)

Opinion: common sense should prevail here. If you have a line "raise 
ValueError(...)" which would reach 80 or even 90 characters, don't let 
PEP 8 alone tell you otherwise. It's just a rule, not a law of physics. 
We have rules so that you *think before you break them*.
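
As a rough illustration of the kind of line meant here, the sketch below
uses a made-up Inventory class (hypothetical, purely to get realistic
indentation). The first method leaves the raise on one physical line, a
few columns past 79; the second splits it. Both raise the same message.

```python
class Inventory:
    """Hypothetical class, used only to get realistic indentation."""

    def __init__(self, maximum):
        self.maximum = maximum

    def check_one_line(self, counts):
        for name, count in counts.items():
            if count > self.maximum:
                # Over the limit, but conceptually simple: one format string.
                raise ValueError(f"at most {self.maximum} of {name!r}, got {count}")

    def check_split(self, counts):
        for name, count in counts.items():
            if count > self.maximum:
                # Same statement, split to stay inside 79 columns.
                raise ValueError(
                    f"at most {self.maximum} of {name!r}, got {count}"
                )
```

Whether the split version is any easier to read is exactly the judgment
call being asked for.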

But if you have more substantial code that exceeds 80 columns, that's 
a code smell and you ought to think long and hard before breaking the 
limit.
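
For anyone who wants to repeat the quick sample from point 1 on their
own code, and see which lines actually go past the limit, here is a
minimal sketch. The function name and the tiny sample string are
assumptions made for illustration; it counts characters, skips blank
lines, and does nothing clever about tabs.

```python
def line_length_report(source, limit=79):
    """Return (average width, [(line number, width), ...] over the limit).

    Blank lines are ignored; trailing whitespace is not counted.
    """
    measured = [
        (number, len(line.rstrip()))
        for number, line in enumerate(source.splitlines(), start=1)
        if line.strip()
    ]
    if not measured:
        return 0.0, []
    average = sum(width for _, width in measured) / len(measured)
    too_long = [(number, width) for number, width in measured if width > limit]
    return average, too_long


# Tiny made-up sample: two short lines, a blank line, one 88-column line.
sample = "def f(x):\n    return x + 1\n\n" + "    y = " + "a" * 80 + "\n"
average, too_long = line_length_report(sample)
print(round(average, 2), too_long)
```

To try it on a real file, pass in the file's text, for example
pathlib.Path("some_module.py").read_text(), where the file name is a
placeholder.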

Proposal:

- keep PEP 8's current recommendation;

- but remind people that the rule can be relaxed for lines that are 
conceptually simple, such as the "raise Exception(...)" pattern;

- and also remind people that long *complex* lines are an anti-pattern. 
Such complex lines can be improved by splitting them over multiple 
lines, and should be.
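
A small sketch of what that splitting can look like. The records and
the names are invented for the example; the point is only that the
multi-statement form computes the same thing while giving each step a
name you can print or inspect in a debugger.

```python
# Hypothetical data: (name, score) records with untidy names.
records = [(" alice ", 72), ("BOB", 48), ("carol", 91), (" dave", 65)]

# One long line: filter, tidy up, sort and join all at once.
one_liner = ", ".join(sorted(name.strip().title() for name, score in records if score >= 60 and name.strip()))

# The same computation as separate statements.
passing = [(name, score) for name, score in records if score >= 60]
cleaned = [name.strip().title() for name, _ in passing if name.strip()]
by_steps = ", ".join(sorted(cleaned))

print(one_liner == by_steps)
```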

I know we try to think in hard limits. "If 79 is too short, then 90 or 
100 or 150 or ..." I'm making a plea for the opposite: if you intend to 
break 80 columns, consider the line itself before breaking it. Don't 
just increase the limit.

Raising the limit effectively says "any amount of complexity is OK in a 
single line, so long as it remains below X columns". I'd rather people 
look at the line and decide "this is too complex, split it" or "it's 
just a format string (or whatever), let it be".

-- 
Steven