OT: Tab characters considered harmful (Was: Emacs has eaten my python tabs!!!)

Robin Munn rmunn at pobox.com
Wed Jan 29 12:49:46 EST 2003


John Roth <johnroth at ameritech.net> wrote:
> 
> "Dennis Lee Bieber" <wlfraed at ix.netcom.com> wrote in message
> news:pm7eg-im3.ln1 at beastie.ix.netcom.com...
>> All tabs works fine regardless of editor tab stops. All spaces works
>> fine. Mixed spaces and tabs is deadly.
> 
> So is the use of tabs in certain contexts; in particular any attempt
> to e-mail a python source to someone using Outlook Express. OE
> has this nasty habit of ignoring leading tabs.

You know, the more I see the debate about spaces and tabs, the more
puzzled I become that anyone still wants to use tabs. They *only* work
nicely, IME, when a source file has only one author making changes, with
only one editor. Give the file to another author (who uses a different
editor) and boom! Nasty stuff happens, about 80-85% of the time. Here's
an example. Tabs are represented as '>' padded with '_' out to the
tab's displayed width ('>___' at tabstop=4), spaces as '.'.
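If you want to draw your own files in this notation, a few lines of Python will do it. This is only a sketch: the name show_whitespace is made up for illustration, and like the examples here it only marks the leading indentation, not whitespace later in the line.

```python
def show_whitespace(line: str, tabstop: int) -> str:
    """Render leading whitespace: each tab as '>' padded with '_' up to the
    next tab stop, each space as '.'. The rest of the line is unchanged."""
    body = line.lstrip(" \t")
    indent = line[: len(line) - len(body)]
    out = []
    col = 0
    for ch in indent:
        if ch == "\t":
            width = tabstop - col % tabstop  # columns left until the next tab stop
            out.append(">" + "_" * (width - 1))
            col += width
        else:
            out.append(".")
            col += 1
    return "".join(out) + body


print(show_whitespace("\t\tif (x>10): return x", 4))  # >___>___if (x>10): return x
print(show_whitespace("\t\tif (x>10): return x", 8))  # >_______>_______if (x>10): return x
```

Note that a tab following spaces is drawn shorter, because it only pads out to the next tab stop.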

First author (tabstop=4, uses tabs):

----- Begin code -----
class Foo:
>___def bar(x):
>___>___if (x>10): return x
>___>___else: return x+1
>___def quux(z):
>___>___if (z<0): return z
>___>___else: return z-1
----- End code -----

Okay, we've written a class with some methods. Now the code gets looked
at by another author, who decides that those if statements should really
be split across multiple lines with proper indentation to make the code
easier to read. He also wants to put in some print statements for
debugging purposes.

Second author (tabstop=8, uses spaces):

What the second author sees:

----- Begin code -----
class Foo:
>_______def bar(x):
>_______>_______if (x>10): return x
>_______>_______else: return x+1
>_______def quux(z):
>_______>_______if (z<0): return z
>_______>_______else: return z-1
----- End code -----

"Okay," thinks the second author, "This guy likes 8-space indents. I
think 4-space indents look better, but I'd better not reformat the entire
code, or tracking CVS changes will get really messy. I'll go with the
style of the code that's already here." So he makes his changes. But
remember: his editor is configured to use spaces for indents. Here's the
result:

----- Begin code -----
class Foo:
>_______def bar(x):
................print "Starting bar"
>_______>_______if (x>10):
........................return x
>_______>_______else:
........................return x+1
>_______def quux(z):
................print "Starting quux"
>_______>_______if (z<0):
........................return z
>_______>_______else:
........................return z-1
----- End code -----

So far so good, right? Wrong. Python will accept this code, and it will
work as intended, because Python advances each tab to the next multiple
of 8 columns, so a leading tab counts as 8 spaces.
But then the original first author takes a look at this code, and this
is what he sees:

First author (tabstop=4, uses tabs):

----- Begin code -----
class Foo:
>___def bar(x):
................print "Starting bar"
>___>___if (x>10):
........................return x
>___>___else:
........................return x+1
>___def quux(z):
................print "Starting quux"
>___>___if (z<0):
........................return z
>___>___else:
........................return z-1
----- End code -----
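The same bytes produce both pictures; only the tab width changes. Here is a sketch that computes where each line of the second author's bar() starts, using Python's str.expandtabs to stand in for an editor (or for Python of this era, which moves each tab to the next multiple of 8 columns). The list is my transcription of the second author's version with "\t" for each tab, and indent_columns is a made-up helper name.

```python
# The second author's bar(), transcribed with "\t" for each tab character.
SECOND_AUTHOR_BAR = [
    "\tdef bar(x):",
    " " * 16 + 'print "Starting bar"',
    "\t\tif (x>10):",
    " " * 24 + "return x",
    "\t\telse:",
    " " * 24 + "return x+1",
]


def indent_columns(lines, tabstop):
    """Return the column at which each line's code starts for a given tab width."""
    columns = []
    for line in lines:
        expanded = line.expandtabs(tabstop)  # tabs advance to the next multiple of tabstop
        columns.append(len(expanded) - len(expanded.lstrip(" ")))
    return columns


print(indent_columns(SECOND_AUTHOR_BAR, 8))  # [8, 16, 16, 24, 16, 24]
print(indent_columns(SECOND_AUTHOR_BAR, 4))  # [4, 16, 8, 24, 8, 24]
```

At 8 columns the print sits at the same depth as the if after it, which is what Python needs; at 4 columns it sits deeper than the if, which is what the first author sees.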

"What an unholy mess," thinks the first author. "What was that other guy
*thinking* when he edited this? Those print statements don't even line
up with the rest of the code! I'll fix this." And thus, this happens:

----- Begin code -----
class Foo:
>___def bar(x):
........print "Starting bar"
>___>___if (x>10):
............return x
>___>___else:
............return x+1
>___def quux(z):
........print "Starting quux"
>___>___if (z<0):
............return z
>___>___else:
............return z-1
----- End code -----

This is getting worse and worse! The previous code looked ugly, but at
least it still ran. But this code won't even run. Remember that Python
considers tabs to be 8 spaces, and so what this code actually looks like
to Python is:

----- Begin code -----
class Foo:
>_______def bar(x):
........print "Starting bar"
>_______>_______if (x>10):
............return x
>_______>_______else:
............return x+1
>_______def quux(z):
........print "Starting quux"
>_______>_______if (z<0):
............return z
>_______>_______else:
............return z-1
----- End code -----

It will fail with an IndentationError ("expected an indented block") at
the very first print statement, which Python sees in the same column as
the def line above it!
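You can check this against a real interpreter with the built-in compile() function. The sketch below assumes a current Python 3, so the print statements are rewritten as print() calls, and the helper name indentation_error is made up for illustration. One caveat: Python 3 is stricter than the interpreters of 2003. It raises TabError (a subclass of IndentationError) whenever the meaning of the indentation depends on how wide a tab is, so it rejects the second author's version as well, not only the first author's "fix".

```python
def source(*lines):
    """Join lines into module source text; tabs are written as "\\t"."""
    return "\n".join(lines) + "\n"


ALL_TABS = source(
    "class Foo:",
    "\tdef bar(x):",
    "\t\tif (x>10): return x",
    "\t\telse: return x+1",
)
SECOND_AUTHOR = source(
    "class Foo:",
    "\tdef bar(x):",
    " " * 16 + "print('Starting bar')",
    "\t\tif (x>10):",
    " " * 24 + "return x",
)
FIRST_AUTHOR_FIX = source(
    "class Foo:",
    "\tdef bar(x):",
    " " * 8 + "print('Starting bar')",
    "\t\tif (x>10):",
    " " * 12 + "return x",
)


def indentation_error(src):
    """Return the IndentationError raised by compiling src, or None if it compiles."""
    try:
        compile(src, "<foo>", "exec")
    except IndentationError as err:  # TabError is a subclass of IndentationError
        return err
    return None


for name, src in [("all tabs", ALL_TABS), ("second author", SECOND_AUTHOR),
                  ("first author's fix", FIRST_AUTHOR_FIX)]:
    err = indentation_error(src)
    print(name, "->", "compiles" if err is None else type(err).__name__)
```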


I wish this were an entirely made-up example. The code here is made up
(and utterly useless :-), but I've seen worse mixtures of tabs and
spaces in actual, non-fictitious code. By now, I cringe every time I see
a tab character in source code that I have to edit, because I know
indentation issues are going to be ugly. I have to guess what tabstop
settings the original author used so that I can mimic his view of the
code exactly; otherwise my own additions will look fine on my screen but
ugly on his. And nobody ever seems to document what tabstop settings
they use! A simple comment in the header of the file would make a lot of
difference...
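For what it's worth, Emacs and Vim already understand a standard form for that comment: an Emacs file-variables line at the top of the file and a Vim modeline near the top. Below is such a header, plus a hypothetical helper (declared_tabstop, not part of any library) that a script could use to honor it. The regular expression only recognizes the spellings shown and falls back to 8, the width Python itself assumes.

```python
import re

# Example header: an Emacs file-variables line plus a Vim modeline.
HEADER = """\
# -*- mode: python; tab-width: 4; indent-tabs-mode: t -*-
# vim: set tabstop=4 shiftwidth=4 noexpandtab:
"""

# Matches "tab-width: N" (Emacs) or "tabstop=N" / "ts=N" (Vim).
TABSTOP_RE = re.compile(r"(?:tab-width:\s*|\btabstop=|\bts=)(\d+)")


def declared_tabstop(source: str, default: int = 8) -> int:
    """Return the tab width declared in the first five lines, or default."""
    for line in source.splitlines()[:5]:
        match = TABSTOP_RE.search(line)
        if match:
            return int(match.group(1))
    return default


print(declared_tabstop(HEADER))
```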

I have heard two arguments for the use of tabs. One is that tabs are
only one character for an indent, while spaces are four or eight
characters -- thus you can save a lot of space using tabs instead of
spaces. The other argument is that if everyone used tabs for
indentation, there would be no arguments about four-space indents or
eight-space indents (or even something horrifyingly alien like
three-space indents, yecch). Instead, everyone would set tabstop to
their favorite indentation level and code would always look nice.

The first argument (tabs save disk space) may have had relevance twenty
years ago, but no longer. With disk space in the multi-dozen gigabytes
available cheap, saving a byte or seven at the expense of clarity is no
longer a good trade-off. This kind of reasoning is what led to the Y2K
bug. In the rare cases where a few kilobytes might make a difference
(say, transferring files across a very-low-bandwidth connection),
compression provides a much greater savings than tabs. Besides, a file
using spaces for indentation will compress to virtually the same size as
a file using tabs. The compression might even be better with spaces
instead of tabs, because the original file would contain one fewer
distinct character -- it would depend on the compression algorithm.
Either way, I would expect the difference to be minimal.
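This is easy to measure instead of argue. A rough sketch using the standard zlib module (DEFLATE, the algorithm behind gzip and zip) on a generated, deliberately repetitive file; the test file is made up, so treat the byte counts it prints as illustrative, not as a benchmark.

```python
import zlib


def make_source(indent_unit: str, methods: int = 100) -> str:
    """Build a toy class whose indentation uses indent_unit per level."""
    lines = ["class Foo:"]
    for i in range(methods):
        lines += [
            f"{indent_unit}def method_{i}(self, x):",
            f"{indent_unit * 2}if x > {i}:",
            f"{indent_unit * 3}return x",
            f"{indent_unit * 2}else:",
            f"{indent_unit * 3}return x + {i}",
        ]
    return "\n".join(lines) + "\n"


tabbed = make_source("\t").encode("ascii")
spaced = make_source("    ").encode("ascii")

raw_saving = len(spaced) - len(tabbed)
compressed_saving = len(zlib.compress(spaced, 9)) - len(zlib.compress(tabbed, 9))
print("raw bytes saved by tabs:       ", raw_saving)
print("compressed bytes saved by tabs:", compressed_saving)
```

The raw difference is 3300 bytes (33 per method); after compression the gap shrinks to a small fraction of that.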

The second argument (tabs mean everyone can use their favorite
indentation) sounds nice at first, but simply doesn't work in the real
world. It inevitably, and I do mean *inevitably*, leads to ugly
space-tab mixing. Even code written by only one author using one text
editor won't be able to avoid it! Look at this:

----- Begin code -----
>___while (continue_flag and
>___.......some_test_func() and
>___>___...some_other_func()):
>___>___do_the_loop()
----- End code -----

Look what happened here. We've got someone using tabstop=4 and using
tabs for indentation. Naturally he's using a text editor with
autoindent, and his editor fills in as much indentation as possible
using tabs. This is the default, AFAIK, for most editors if they are set
to use tabs. Now this person knows that you should only use tabs for
"real" indentation levels, and that anything else (such as giving
continuation lines "extra" indentation to make them line up) should be
done with spaces. (Using tabs to help line up continuation lines and the
like leads to even more problems, but this is a long enough post
already!) So he added spaces to fill in his second line. Then his editor,
being "helpful", gave him the same visual indentation on the third line,
but built it out of as many tabs as would fit plus a few spaces. What he
probably doesn't realize (unless he has "View Tab Characters" turned on)
is that the third line's indentation is made of different characters
than the second line's, so the two lines stay aligned only at tabstop=4!
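Here is the same effect in numbers. This sketch assumes the second line is indented with one tab plus seven spaces (so "some_test_func" starts in column 11 at tabstop=4, under "continue_flag") and that the autoindenting editor rebuilt column 11 on the third line from two tabs plus three spaces. The helper start_column is a made-up name.

```python
SECOND_LINE = "\t" + " " * 7 + "some_test_func() and"
THIRD_LINE = "\t\t" + " " * 3 + "some_other_func()):"


def start_column(line: str, tabstop: int) -> int:
    """Column where the line's text begins when tabs stop every `tabstop` columns."""
    expanded = line.expandtabs(tabstop)
    return len(expanded) - len(expanded.lstrip(" "))


for tabstop in (4, 8):
    print(f"tabstop={tabstop}:",
          start_column(SECOND_LINE, tabstop), start_column(THIRD_LINE, tabstop))
```

At tabstop=4 both lines start in column 11; at tabstop=8 they start in columns 15 and 19, so anyone with a different setting sees the continuation lines drift apart.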

I don't like tab characters. I wish they would go away and that everyone
would stop using them. They cause me far too many headaches, and I have
never found a good use for them. Every time I have to edit a source file
with tabs, I have to explicitly think about indentation levels and
whether that blank on the screen is an ASCII 0x20 or part of an ASCII
0x09. I thought the point of modern text editors was to let you *stop*
worrying about indentation! But tabs, as I have demonstrated above,
force you to keep track of what your editor is doing.
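Short of turning on "View Tab Characters", a few lines of Python can answer the 0x20-or-0x09 question for a whole file. This is a rough sketch with a made-up name, indentation_report; for real use, the standard library's tabnanny module (run as python -m tabnanny file.py) performs a more thorough check for ambiguous indentation.

```python
def indentation_report(source: str) -> dict:
    """Map each indented, non-blank line number to 'tabs', 'spaces', or 'mixed'."""
    report = {}
    for lineno, line in enumerate(source.splitlines(), start=1):
        if not line.strip():
            continue  # blank or whitespace-only lines don't affect indentation
        indent = line[: len(line) - len(line.lstrip(" \t"))]
        if not indent:
            continue  # line starts in column 0
        if "\t" in indent and " " in indent:
            report[lineno] = "mixed"
        elif "\t" in indent:
            report[lineno] = "tabs"
        else:
            report[lineno] = "spaces"
    return report


sample = "class Foo:\n\tdef bar(x):\n" + " " * 16 + "print('x')\n" + "\t   y = 1\n"
print(indentation_report(sample))
```

If the report contains more than one kind, or any "mixed" line, the relative alignment of those lines depends on the reader's tabstop.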

Lest anyone get me wrong, I am in no way opposed to using the Tab *key*
on your keyboard to indent lines. That's a great use for it. My Vim has
the "expandtab", "smarttab", and "autoindent" options turned on, which
practically turns my Tab key into a Do-What-I-Mean key. And it always
inserts spaces (ASCII 0x20), not tabs (ASCII 0x09). The Tab key is good.
But tab characters are bad.
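Vim's expandtab only affects tabs you type; for a file that already contains tab characters, Vim's :retab command converts them. The same idea in Python, as a sketch (retab is a made-up name here): expand only the leading whitespace, so tab characters later in a line, such as inside string literals, survive. Choosing tabstop=8 preserves what Python saw; choosing the original author's setting preserves what he saw. For a file indented purely with tabs, both give the same nesting.

```python
def retab(source: str, tabstop: int = 8) -> str:
    """Replace tabs in each line's leading whitespace with spaces.

    Tabs after the first non-blank character (for example inside string
    literals) are left alone.
    """
    out = []
    for line in source.splitlines(keepends=True):
        body = line.lstrip(" \t")
        indent = line[: len(line) - len(body)]
        out.append(indent.expandtabs(tabstop) + body)
    return "".join(out)


print(retab("class Foo:\n\tdef bar(x):\n\t\treturn x\n", tabstop=4))
```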

Tab characters have no place in modern source code. Their use is
considered harmful and should be strictly avoided.

-- 
Robin Munn <rmunn at pobox.com>
http://www.rmunn.com/
PGP key ID: 0x6AFB6838    50FF 2478 CFFB 081A 8338  54F7 845D ACFD 6AFB 6838



