PEP8 and 4 spaces

Steven D'Aprano steve at pearwood.info
Tue Jul 8 10:48:08 CEST 2014


On Tue, 08 Jul 2014 11:22:25 +1000, Ben Finney wrote:

> A group of (a particular amount of) U+0020 characters is visually
> indistinguishable from a U+0009 character, when the default semantics
> are applied to each.

Hmmm. I'm not sure there actually *is* such a thing as "default 
semantics" for tabs. If you look at a tab character in a font, it 
probably looks like a single space, but that depends on the font 
designer. But if you look at it in a text editor, it will probably look 
like eight spaces, unless it looks like four, or some other number, and 
if you look at it in a word processor, it will probably look like a "jump 
to the next tab stop" command. In a spreadsheet application, it will be a 
cell separator and consequently doesn't look like anything at all. I 
don't think any of those things count as "default semantics".

The point being, tabs are *control characters*, like newlines and 
carriage returns and form feeds, not regular characters like spaces and 
"A" or "λ". Since "indent" is an *instruction* rather than a character, 
it is best handled with a control character.

In any case, if we limit ourselves to text editors, only a specific 
number of spaces will be visually indistinguishable from a tab, where the 
number depends on which column you start with:

x	x	# Tab
x       x	# Seven spaces
x      x	# Six spaces
x        x	# Eight spaces


Even in a proportional font, the last two should be distinguishable from 
the first two. Admittedly, that does leave the case where N spaces (for 
some 1 <= N <= 8) look like a tab. That's a problem, but it's not the 
only one:

* End of line is a problem. I know of *at least* the following seven 
conventions for end-of-line:

    - ASCII line feed, \n (Unix etc.)
    - ASCII carriage return, \r (Acorn, ZX Spectrum, Apple, etc.)
    - ASCII \r\n (CP/M, DOS, Windows, Symbian, Palm, etc.)
    - ASCII \n\r (RISC OS)
    - ASCII Record Separator, \x1E (QNX)
    - EBCDIC New Line, \N{NEXT LINE} in Unicode (IBM mainframes)
    - ATASCII \x9B (Atari)

* Form feeds are a problem, since they are invisible, but still get used 
(by Vim or Emacs, I forget which) to mark sections of text.

* Issues to do with word-wrapping and hyphenation, or lack thereof, are a 
problem.

* Encoding issues are a problem.

* There are invisible characters other than spaces (non-breaking space, 
em-space, en-space, thin space).
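
To make the end-of-line point concrete, here is a rough sketch of how 
Python 3's str.splitlines() treats each convention listed above. The 
sample strings and the split_report name are made up for illustration, 
and "\x9b" in a Python str is only a stand-in for the ATASCII byte, 
which is an assumption on my part:

```python
# Each sample is "a", one end-of-line convention from the list, then "b".
samples = {
    "LF": "a\nb",
    "CR": "a\rb",
    "CR LF": "a\r\nb",
    "LF CR": "a\n\rb",
    "RS": "a\x1eb",
    "NEL": "a\x85b",
    "ATASCII EOL": "a\x9bb",  # stand-in for the Atari byte (assumption)
}


def split_report(samples):
    """Map each convention's name to what str.splitlines() makes of it."""
    return {name: text.splitlines() for name, text in samples.items()}


report = split_report(samples)
for name, parts in report.items():
    print(f"{name:12} {parts!r}")
```

LF, CR, CR LF, RS and NEL each split cleanly into two lines. LF CR is 
read as two separate line breaks, leaving an empty line in the middle, 
and the ATASCII stand-in is not treated as a line break at all.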


The solution is to use a smarter editor. For example, an editor might 
draw a horizontal rule to show a form feed on a line of its own, or 
highlight unexpected carriage return characters with ^M, or display tabs 
in a different colour from spaces, or overlay them with a \x09 glyph. Or an 
editor might be smart enough to automatically do what the current 
paragraph or block does: if the block is already indented with tabs, 
pressing tab inserts a tab, but if it is indented with spaces, pressing 
tab inserts spaces.
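
The "do what the current block does" behaviour could be sketched roughly 
like this. It is not taken from any real editor; the function name, the 
spaces_per_level parameter and the fallback to spaces are all 
assumptions made for the example:

```python
def indent_for_tab_key(block_lines, spaces_per_level=4):
    """Return the text a Tab keypress should insert, imitating the block.

    Looks backwards through the block for the nearest indented line and
    copies its style: a tab if it starts with a tab, spaces otherwise.
    """
    for line in reversed(block_lines):
        if line.startswith("\t"):
            return "\t"
        if line.startswith(" "):
            return " " * spaces_per_level
    # Nothing indented to imitate: fall back to spaces (arbitrary choice).
    return " " * spaces_per_level


tabbed_block = ["if x:", "\treturn 1"]
spaced_block = ["if x:", "    return 1"]
print(repr(indent_for_tab_key(tabbed_block)))
print(repr(indent_for_tab_key(spaced_block)))
```

A real editor would also have to cope with lines that mix tabs and 
spaces, which this sketch ignores.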

Isn't this why you recommend people use a programmer's editor rather than 
Notepad? A good editor should handle these things for you automatically, 
or at least with a minimum amount of manual effort.


>> The former is a "control" character, which has specific semantics
>> associated with it; the latter is a "printable" character, which is
>> usually printed and interpreted as itself (although in this particular
>> case, the printed representation is hard to see on most output
>> devices).
> 
> And those specific semantics make the display of those characters easily
> confused. That is why it's generally a bad idea to use U+0009 in text
> edited by humans.

I disagree. Using tabs is no more a bad idea than using a formfeed, or 
having support for multiple encodings.


>> This mailing list doesn't seem to mind that lines beginning with ASCII
>> SPC characters are semantically different from lines beginning with
>> ASCII LF characters, although many detractors of Python seem unduly
>> fixated on it.
> 
> The salient difference being that U+000A LINE FEED is easily visually
> distinguished from a short sequence of U+0020 SPACE characters. This
> avoids the confusion, and makes use of both together unproblematic.

True, but that's *only* because your editor chooses to follow the 
convention "display a LINE FEED by starting a new line" rather than by 
the convention "display the (invisible or zero-width) glyph of the LINE 
FEED". If editors were to standardise on the convention "display a 
HORIZONTAL TAB character as visibly distinct from a sequence of 
spaces" (e.g. by shading the background a different colour, or overlaying 
it with an arrow) then we would not be having this discussion.
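
As a minimal sketch of that display convention, assuming fixed tab 
stops every eight columns and an arrow as the marker (both are 
illustrative choices, as is the show_tabs name), a tab can be drawn 
visibly while still occupying the same width it would have had:

```python
def show_tabs(line, tabsize=8, marker="\u2192"):
    """Render each tab as a marker (default: a rightwards arrow) padded
    with spaces up to the next tab stop, so the line keeps its width."""
    out = []
    col = 0
    for ch in line:
        if ch == "\t":
            # How far a tab jumps depends on the column it starts in.
            width = tabsize - (col % tabsize)
            out.append(marker + " " * (width - 1))
            col += width
        else:
            out.append(ch)
            col += 1
    return "".join(out)


print(show_tabs("x\tx"))           # tab starting at column 1
print(show_tabs("x" + " " * 7 + "x"))  # seven spaces, shown unchanged
```

The two lines come out the same width, but only the first shows the 
arrow, so the tab and the run of spaces are no longer confusable.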

In other words, it is the choice of editors to be *insufficiently smart* 
about tabs that causes the problem. There is a vicious circle here:

* editors don't handle tabs correctly

* which leads to (some) people believing that "tabs are bad" and should 
be avoided

* which leads to editors failing to handle tabs correctly, because "tabs 
are bad" and should be avoided.


A pity really.



-- 
Steven


