Significant whitespace

Marcin 'Qrczak' Kowalczyk qrczak at knm.org.pl
Wed Oct 8 06:20:40 EDT 2003


On Tue, 07 Oct 2003 21:25:53 +0100, Alexander Schmolck wrote:

> Python removes this significant problem, at as far as I'm aware no real cost
> and plenty of additional gain (less visual clutter, no waste of delimiter
> characters ('{','}') or introduction of keywords that will be sorely missed as
> user-definable names ('begin', 'end')).

There are three choices for a language syntax:
1. Line breaks and indents are significant (Haskell, Python).
2. Line breaks only are significant (Ruby, Unix shell, Visual Basic).
3. Neither is significant (most languages).

I found the syntax of Haskell and Python aesthetic, and tried to introduce
significant whitespace into my own little language. It was surprisingly hard.

The first attempt used a quite minimalistic syntax and had significant
indents. As a result, indentation errors usually went undetected and the
program suddenly had a different meaning. Since you wouldn't want to
use indentation consistently for all local functions, they used either
braces or indentation - but not both! - so the code looked very different
depending on whether you wanted to make use of significant indents or not.
And since it was a functional language, it used quite a lot of nesting.
I quickly abandoned this version.
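
A minimal sketch of this failure mode, written in Python rather than in
the language described above (the function names are made up for
illustration). Both versions parse, and nothing at compile time points out
that the second one means something else:

```python
# Hypothetical example: both functions parse without complaint, and a
# dynamically typed language has no type checker to flag the slip.

def sum_positive(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
    return total          # runs once, after the loop has finished


def sum_positive_misindented(xs):
    total = 0
    for x in xs:
        if x > 0:
            total += x
        return total      # one level too deep: runs inside the first iteration


print(sum_positive([1, 2, 3]))              # 6
print(sum_positive_misindented([1, 2, 3]))  # 1
```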

Although misindenting Haskell code can produce a valid parse, the error
is usually caught either by scoping rules or by the typechecker; my
language was dynamically typed. Haskell doesn't have the "inconsistency"
problem because when you omit a line break which would introduce or close
indentation, you usually don't have to insert braces - the syntax rules say
that a virtual closing brace is inserted when not inserting it would cause
a parse error. Unfortunately this rule is almost impossible to implement
correctly (current compilers fail to apply it correctly in some subtle cases).
There are also cases where the language requires a different indentation
than I would like to use (mostly 'if-then-else' and 'let' inside 'do').

Python has a simpler syntax, where indentation is used on the level of
statements as the only delimiting mechanism, and not on the level of
expressions - which can't contain statements. It doesn't allow replacing
indentation with explicit delimiters. Since most expressions have pending
open parens or brackets when cut in the middle (because of mandatory
parens around function arguments), most line breaks inside expressions are
identifiable as insignificant without explicit marking. So it's easy to
design rules which use indentation, at the cost of the inability to
express various things as expressions. It's an imperative language, and
such a syntax won't fit a functional language where you would want
deeper nesting, and where almost everything can be used inside an
expression.
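
The point about pending open parens can be observed with Python's standard
tokenize module. The sketch below is only an illustration (the input string
and the helper name are made up): it counts the token kinds that mark a
logical end of statement (NEWLINE), a line break that does not end a
statement (NL), and an indentation increase (INDENT).

```python
import io
import tokenize

# Hypothetical input: the first statement is split inside parentheses,
# and its continuation line is indented.
SOURCE = "x = f(1,\n      2)\ny = 3\n"


def line_break_kinds(source):
    """Count NEWLINE, NL and INDENT tokens reported for the source."""
    counts = {"NEWLINE": 0, "NL": 0, "INDENT": 0}
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        name = tokenize.tok_name[tok.type]
        if name in counts:
            counts[name] += 1
    return counts


print(line_break_kinds(SOURCE))
```

The break inside the parentheses is reported as NL rather than NEWLINE,
and the indented continuation line produces no INDENT token, so neither
has to be marked by the programmer.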

Moral: Haskell and Python happen to succeed with significant indents
but their rules are hard to adapt to other languages. Significant
indentation constrains the syntax - if you like these constraints, fine,
but it would hurt if a language were incompatible with these constraints.

Having failed with significant indents, I tried to use significant line
breaks in the next incarnations of my language, which looked like a good
compromise. The language had a richer syntax this time and it worked
quite well, except that one too often wanted to break a line in a place
which had to be explicitly marked as an insignificant break. I had
trouble designing a good syntax for some constructs, mainly
if-then-else, being constrained to syntaxes which can be nicely split
into lines.
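
Python is not the language described here, but its statement level gives
a rough analogy for what "explicitly marked" means: outside brackets a
line break ends the statement unless it is marked with a backslash. A
small sketch, with made-up names, that checks this with the built-in
compile():

```python
def parses(source):
    """Return True if Python accepts the source as a module."""
    try:
        compile(source, "<sketch>", "exec")
        return True
    except SyntaxError:
        return False


# The same statement, broken after the operator.
UNMARKED = "total = 1 +\n    2\n"     # break is significant: statement ends early
MARKED = "total = 1 + \\\n    2\n"    # backslash marks the break as insignificant

print(parses(UNMARKED))  # False
print(parses(MARKED))    # True
```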

After experimenting with various syntaxes which used significant line
breaks to separate declarations and statements (delimiting was first done
by an opening word and 'end', later by braces), I tried out how it would
look with explicit semicolons. Surprisingly this opened new ways to build
some syntactic constructs. I finally got an 'if' which I was happy with,
I was no longer forced to choose to either not break a particular long
line or to mark the line break as insignificant, and I could abandon
designing a built-in syntax for catching exceptions because using a
suitable function no longer interfered with line breaking.

The moral is the same. Although designing and implementing a syntax with
significant line breaks and insignificant indentation is much easier than
with significant indentation, it still takes away some freedom of syntax
design which might be noticeable. Perhaps there are subtle ways to apply
significant line breaks to various languages, which you might find with
some luck or experience... I've given up: designing a syntax with
insignificant whitespace is much safer.

-- 
   __("<         Marcin Kowalczyk
   \__/       qrczak at knm.org.pl
    ^^     http://qrnik.knm.org.pl/~qrczak/




