Python syntax in Lisp and Scheme

Wed Oct 15 12:56:33 EDT 2003

prunesquallor at comcast.net writes:

> Consider this python code (lines numbered for exposition):

Thanks for going through the trouble of posting an elaborate example.

> 
>  1 def dump(st):
>  2     mode, ino, dev, nlink, uid, gid, size, atime, mtime, ctime = st
>  3     print "- size:", size, "bytes"
>  4     print "- owner:", uid, gid
>  5     print "- created:", time.ctime(ctime)
>  6     print "- last accessed:", time.ctime(atime)
>  7     print "- last modified:", time.ctime(mtime)
>  8     print "- mode:", oct(mode)
>  9     print "- inode/dev:", ino, dev
> 10
> 11 def index(directory):
> 12     # like os.listdir, but traverses directory trees
> 13     stack = [directory]
> 14     files = []
> 15     while stack:
> 16         directory = stack.pop()
> 17         for file in os.listdir(directory):
> 18             fullname = os.path.join(directory, file)
> 19             files.append(fullname)
> 20             if os.path.isdir(fullname) and not os.path.islink(fullname):
> 21                 stack.append(fullname)
> 22     return files

[...]

> But let us consider cutting lines 6 and 7 and putting them
> between lines 21 and 22.  We get this: 
> 
> 15     while stack:
> 16         directory = stack.pop()
> 17         for file in os.listdir(directory):
> 18             fullname = os.path.join(directory, file)
> 19             files.append(fullname)
> 20             if os.path.isdir(fullname) and not os.path.islink(fullname):
> 21                 stack.append(fullname)
>  6     print "- last accessed:", time.ctime(atime)
>  7     print "- last modified:", time.ctime(mtime)
> 22     return files
> 
> But it is unclear whether the intent was to be outside the while,
> or outside the for, or part of the if.

I don't think so. Before pasting you just move your cursor to #1 #2 or #3 to
achieve the respective results:

> 15     while stack:
> 16         directory = stack.pop()
> 17         for file in os.listdir(directory):
> 18             fullname = os.path.join(directory, file)
> 19             files.append(fullname)
> 20             if os.path.isdir(fullname) and not os.path.islink(fullname):
> 21                 stack.append(fullname)
         #1  #2      #3
> 22     return files

What's wrong with that? (If nothing, I'll finally hack it up in elisp,
unless someone already has done it).
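
To make the proposal concrete, here is a minimal sketch of "paste at the
cursor column", written in Python 3 rather than elisp. The name
paste_at_column and its arguments are made up for illustration; this is not an
existing editor command. The pasted block is shifted so that its shallowest
line starts at the chosen column, and relative indentation inside the block is
kept. The print lines are written as Python 3 calls.

```python
def paste_at_column(lines, at, block, column):
    """Insert block before lines[at], shifted so that its shallowest
    non-blank line starts at column; relative indentation is preserved."""
    def indent_of(line):
        return len(line) - len(line.lstrip(" "))

    base = min((indent_of(ln) for ln in block if ln.strip()), default=0)
    shifted = [" " * column + ln[base:] if ln.strip() else "" for ln in block]
    return lines[:at] + shifted + lines[at:]


index_source = [
    "def index(directory):",
    "    stack = [directory]",
    "    files = []",
    "    while stack:",
    "        directory = stack.pop()",
    "        for file in os.listdir(directory):",
    "            fullname = os.path.join(directory, file)",
    "            files.append(fullname)",
    "            if os.path.isdir(fullname) and not os.path.islink(fullname):",
    "                stack.append(fullname)",
    "    return files",
]
cut_lines = [
    '    print("- last accessed:", time.ctime(atime))',
    '    print("- last modified:", time.ctime(mtime))',
]

# Hypothetical cursor columns standing in for #1, #2 and #3.
for column in (4, 8, 16):
    edited = paste_at_column(index_source, 10, cut_lines, column)
    # Each choice is syntactically valid; they differ only in meaning.
    compile("\n".join(edited) + "\n", "<pasted>", "exec")
```

With the four-space function body used here, columns 4, 8 and 16 correspond to
#1, #2 and #3 (outside the while, outside the for, inside the if).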

[snipped]

> Now consider this `pseudo-equivalent' parenthesized code:
> 
>  1 (def dump (st)
>  2    (destructuring-bind (mode ino dev nlink uid gid size atime mtime ctime) st
>  3       (print "- size:" size "bytes")
>  4       (print "- owner:" uid gid)
>  5       (print "- created:" (time.ctime ctime))
>  6       (print "- last accessed:" (time.ctime atime))
>  7       (print "- last modified:" (time.ctime mtime))
>  8       (print "- mode:" (oct mode))
>  9       (print "- inode/dev:" ino dev)))
> 10
> 11 (def index (directory)
> 12     ;; like os.listdir, but traverses directory trees
> 13     (let ((stack directory)
> 14           (files '()))
> 15       (while stack
> 16         (setq directory (stack-pop))
> 17         (dolist (file (os-listdir directory))
> 18            (let ((fullname (os-path-join directory file)))
> 19              (push fullname files)
> 20              (if (and (os-path-isdir fullname) (not (os-path-islink fullname)))
> 21                  (push fullname stack)))))
> 22       files))
> 
> If we cut lines 6 and 7 with the intent of inserting them
> in the vicinity of line 21, we have several options (as in python),
> but rather than insert them incorrectly and then fix them, we have
> the option of inserting them into the correct place to begin with.
> In the line `(push fullname stack)))))', there are several close
> parens that indicate the closing of the WHILE, DOLIST, LET, and IF,
> assuming we wanted to include the lines in the DOLIST, but not
> in the LET or IF, we'd insert here:
>                                             V
> 21                  (push fullname stack)))   ))

AFAICT this is a more difficult editing operation to get right than what I
suggested above to get the desired effect for python and you still have to
reindent.

> 
> The resulting code is ugly:
[...]
> But it is correct.

Correct as in "has the desired behavior" is not the only thing that counts.

> 
> (Incidentally inserting at that point is easy:  you move the cursor over
> the parens until the matching one at the beginning of the DOLIST begins
> to blink.  At this point, you know that you are at the same syntactic level
> as the dolist.)

Sure, it's not very taxing, but I think still slightly more so than placing
the cursor at #1 #2 or #3 for the python example (you need to invoke extra
functionality, even if this functionality is quite convenient, and you need to
devote visual attention to an extra site).

> 
> >> >> The fact that the information is replicated, and that there is nothing
> >> >> but programmer discipline keeping it consistent is a source of errors.
> 
> Let me expand on this point.  The lines I cut are very similar to each
> other, and very different from the lines where I placed them.  But
> suppose they were not, and I had ended up with this:
> 
> 19             files.append(fullname)
> 20             if os.path.isdir(fullname) and not os.path.islink(fullname):
> 21                 stack.append(fullname)
>  6     print "- last accessed:", time.ctime(atime)
>  7     print "- last modified:", time.ctime(mtime)
> 22     print "- copacetic"
> 23     return files
> 
> Now you can see that lines 6 and 7 ought to be re-indented, but line 22
> should not. It would be rather easy to either accidentally group line seven
> with line 22, or conversely line 22 with line 7.

True, but as I stated above AFAICT there is no need to paste-and-fix, you can
just paste correctly (Digression for the emacs-interested: a really
emacs-savvy user will paste and fix in a pretty safe fashion anyway, because
the source of error above would be to inadvertently select a (wrong) region
for reindentation. If you just reindent (with C-c> or C-c<) right after
pasting (the pasted code will automatically form the region then) you don't
have this problem).

> 
> >> > Sure there is. Your editor and immediate visual feedback (no need to remember
> >> > to reindent after making the semantic changes).
> >> 
> >> `immediate visual feedback' = programmer discipline
> >> Laxness at this point is a source of errors.
> >
> > You got it backwards. 
> > Not forgetting to press 'M-C-\' = programmer discipline.
> > Laxness at this point is a source of errors. 
> 
> Forgetting to indent properly in a lisp program does not yield
> erroneous code.

I think this highlights an implicit but mistaken assumption underlying
anti WS arguments.

No, of course forgetting to indent does not *directly* yield erroneous code.
That's not good enough, however, because it is associated with 2 problems:

1) masking an error: if you perform an operation that can go wrong and
   frequently does (such as issuing an editing command) then enforcing manual
   verification of the desired outcome (read C-M-\) is a source of error.

2) misleading later readers (including the programmer himself, esp. while he's
   editing): the bad indentation might well suggest a meaning different
   from the one actually intended, which goes unnoticed till someone reindents
   the code -- again a source of errors.

> 
> > And indeed, people *do* have to be educated not to be lax when editing
> > lisp - newbies frequently get told in c.l.l or c.l.s that they should have
> > reindented their code because then they would have seen that they got
> > their parens mixed up.
> 
> This is correct.  But what is recommended here is to use a simple tool to
> enhance readability and do a trivial syntactic check.

I think "enhance readability" is more than an understatement. Lisp code of
reasonable complexity is simply unreadable if it is not indented, or is
indented arbitrarily (even more so than XML!).

> > OTOH, if you make an edit in python the result of this edit is immediately
> > obvious -- no mismatch between what you think it means and what your computer
> > thinks it means and thus no (extra) programmer discipline required.
> 
> Would that this were the case.  Lisp code that is poorly indented will still
> run.
> Python code that is poorly indented will not. I have seen people write lisp
> code like this:
> 
> (defun factorial (x)
> (if (> x 0)
> x
> (*
> (factorial (- x 1))
> x
> )))
> 
> I still tell them to re-indent it.  A beginner writing python in this manner
> would be unable to make the code run.

But the fact that you can't do this in python (and BTW I've never seen a
*python* newbie TRY -- have you?) is a FEATURE, for crying out loud.

Or what exactly do you think is the benefit of making it easier for beginners
to write error-prone and unreadable code? Why is it pedagogically
disadvantageous for python beginners to automatically write readable code that
does what they think it does (at least as far as block structure is concerned)
compared to lisp newbies who either often write code that doesn't do what they
(and other readers) think it does (because of indentation/paren mismatch) or
that is impenetrable (just think about how many hours these people have wasted
trying to decipher and debug their code before you gave them the fatherly
advice to try to indent it)?

> Ok.  For any sort of semantic error (one in which a statement is
> associated with an incorrect group) one could make in python, there is
> an analogous one in lisp, and vice versa.  This is simply because both
> have unambiguous parse trees.
> 
> However, there is a class of *syntactic* error that is possible in
> python, but is not possible in lisp (or C or any language with
> balanced delimiters).  Moreover, this class of error is common,
> frequently encountered during editing, and it cannot be detected
> mechanically.

This argument basically boils down to "lisp is more redundant and therefore
less error prone". This in itself is not a valid argument, as it depends on the
type of redundancy and the extent to which this redundancy itself causes
errors, e.g. by rendering the code more obscure through additional verbosity
(as in xml compared to sexps); on whether this redundancy helps or hinders
perception; and on how this redundancy interferes with the editing process.

As an example of such interference in the case of lisp consider
commenting/deleting/appending after a line with surplus trailing parens
(corresponding to opening parens in earlier lines). Conceptually this line is
the same as all the other lines in the same block, but you have to use quite
different (and more complicated and error-prone) commands to achieve the same
edit. Not so in python.

> 
> Consider this thought experiment:  pick a character (like parenthesis
> for example) go to a random line in a lisp file and insert four of them.

This thought experiment seems of limited informational value. While I can see
how you might accidentally indent 4 spaces by pressing tab (which you'd
normally notice), accidentally inserting 4 spaces or 4 parens (by pressing
space or ')') seems highly unlikely to me.

Additionally most of the time the code is actively being edited and thus *not*
in a consistent state -- this is how most editing errors occur and I don't
think lisp compares favourably to python here.

This is largely due to the fact that the relationship between suggested
meaning and actual meaning of the code is automatically maintained in python
and only semi-automatically by emacs/lisp. As an example consider introducing
additional local vars by adding a let-block. This is not quite equivalent to
the same example in python (which is trivial, editing-wise, barring potential
local name conflicts), but close enough, and (unless you've plenty of routine
at editing lisp code) it's pretty easy to screw things up here. Paren matching
makes it easy to match the visually exposed outer parens (viz. those
corresponding to e.g. "(LET"), but I find it's still pretty easy to end up
with wrong parens inside (viz. in the var forms/body). The more code you write
before reindenting everything, the higher the likelihood that something went
wrong, although the total number of parens will be correct.

Anyway, back to your thought-experiment:

> Is the result syntactically correct?  No.  Could a naive user find them?
> Trivially.  Could I program Emacs to find them?  Sure.
> 
> Now go to a random line in a python file and insert four spaces.  Is

Uhm where? A) Anywhere in the line? Since only *leading whitespace* is
significant this wouldn't alter the meaning (save in strings and identifiers
-- as in CL). So I assume you mean B) at the beginning of the line, in which
case:

> the result syntactically correct?  
> Likely.  

Depends on what you mean by "likely". A necessary condition for the result to
be syntactically valid and semantically different is that the preceding
(non-comment) line has the same indent as the line after the insertion (which
occurs in about 10% of the lines in a representative sample [1]).

> Could a naive user find  them?  
> Unlikely. 

Uhm, actually extremely likely because 90% of the cases in B) yield invalid
syntax, which both python and your editor can easily figure out for the naive
user without even running the code.
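
The claim that most such insertions are caught mechanically can be tried out
on a small scale. The following is a minimal sketch in Python 3; the function
name and the sample (a trimmed version of the index function quoted earlier)
are illustrative assumptions, and the counts for one small sample say nothing
about the 10%/90% figures above. Each non-blank line in turn gets four extra
leading spaces, and the result is handed to the built-in compile().

```python
def classify_indent_insertions(source):
    """Prepend four spaces to each non-blank line in turn and try to compile.
    Returns (rejected, accepted) lists of 0-based line indices."""
    lines = source.splitlines()
    rejected, accepted = [], []
    for i, line in enumerate(lines):
        if not line.strip():
            continue
        mutated = lines[:i] + ["    " + line] + lines[i + 1:]
        try:
            compile("\n".join(mutated) + "\n", "<mutated>", "exec")
        except SyntaxError:  # IndentationError is a subclass of SyntaxError
            rejected.append(i)
        else:
            accepted.append(i)
    return rejected, accepted


SAMPLE = """\
def index(directory):
    stack = [directory]
    files = []
    while stack:
        directory = stack.pop()
        for name in os.listdir(directory):
            fullname = os.path.join(directory, name)
            files.append(fullname)
            if os.path.isdir(fullname) and not os.path.islink(fullname):
                stack.append(fullname)
    return files
"""

rejected, accepted = classify_indent_insertions(SAMPLE)
print("rejected by the compiler:", len(rejected))
print("still valid syntax at lines:", [i + 1 for i in accepted])
```

On this sample only the insertions on the last two lines compile: the one
before stack.append(fullname) leaves the meaning unchanged, and the one before
return files moves the return into the while loop.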

> Could you write a program to find them?  No.

Sure I could, but writing one that performs better than say 95% would involve
an amount of work unwarranted by the unrealistic scenario.

> Delete four adjacent parens in a Lisp file.  Will it still compile?  No.
> Will it even be parsable?  No.
> 
> Delete four adjacent spaces in a Python file.  Will it still compile?
> Likely.

See above.

> > No, I didn't want just *any* example of something that can't be displayed;
> > I wanted an example of something that can't be displayed and is

> > *pertinent* to our discussion (based on the Quinean assumption that you
                                                ^^^^^^^
                      [thinko; should have been Gricean]

> > wouldn't have brought up "things that can't be displayed" if they were
> > completely besides the point).
> 
> I thought that whitespace was significant to Python.

Only human readable whitespace [2].

> 
> My computer does not display whitespace. 

Mine sure does.

    if bar:
    foo

and 

    if bar:
       foo

look quite different on my computer. Since this (leading ws) is the only type
of whitespace that semantically matters to python, I still fail to see your
point.
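
The difference is also visible to the compiler. A minimal sketch in Python 3
(the helper name compiles is made up for illustration): the unindented body is
rejected, the indented one is accepted, and extra whitespace inside a line
leaves the parse tree unchanged.

```python
import ast


def compiles(source):
    """Return True if source is syntactically valid Python."""
    try:
        compile(source, "<snippet>", "exec")
    except SyntaxError:  # IndentationError is a subclass of SyntaxError
        return False
    return True


flat = "if bar:\nfoo\n"            # body not indented
nested = "if bar:\n   foo\n"       # body indented by three spaces
spaced = "if   bar :\n   foo\n"    # extra whitespace inside the first line

print(compiles(flat))    # False
print(compiles(nested))  # True
# ast.dump omits line and column positions by default, so this compares
# only the structure of the two programs.
print(ast.dump(ast.parse(nested)) == ast.dump(ast.parse(spaced)))  # True
```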

> I understand that most computers do
> not. There are few fonts that have glyphs at the space character.
> 
> Since having the correct amount of whitespace is *vital* to the
> correct operation of a Python program, it seems that the task of
> maintaining it is made that much more difficult because it is only
> conspicuous by its absence.

No idea what you're driving at -- maybe you've got some wrong assumptions
about the details of python's syntax?

> 
> Sussman is careful to separate the equations of classical mechanics
> from the *implementation* of those equations in the computer, the

I really don't want to push this point too hard, but why not use sexps for
both if sexp-readability was near optimal (Iverson seems to have done the
equivalent for his mathematical books using APL/J)?

> former are written using a functional mathematical notation similar to
> that used by Spivak, the latter in Scheme.  The two appendixes give
> the details.  Sussman, however, notes ``For very complicated
> expressions the prefix notation of Scheme is often better''

That statement seems to suggest to me that in the general case MN is
preferable.

> > I don't personally think (properly formatted) lisp reads that badly at all
> > (compared to say C++ or java) and you sure got the word-separators right.
> > But to claim that using lisp-style parens are in better conformance with
> > the dictum above than python-style indentation frankly strikes me as a bit
> > silly (whatever other strengths and weaknesses these respective syntaxes
> > might have).
> 
> And where did I claim that?  You originally stated:

Well I didn't explicitly say you did, but you contested the claim requoted
below about ')))))))))' (I'm still not quite sure why) and stated that Abelson
and Sussman would disagree with my assessments of parens and whitespace, so
this interpretation seemed not entirely implausible to me.

> 
> > Still, I'm sure you're familiar with the following quote (with which I most
> > heartily agree):
> >
> >  "[P]rograms must be written for people to read, and only incidentally for
> >   machines to execute."
> >
> > People can't "read" '))))))))'.
> 
> Quoting Sussman and Abelson as a prelude to stating that parenthesis are
> unreadable is hardly going to be convincing to anyone.

I didn't say parens are generally unreadable; as the quote above shows, I said
that people can't read *large numbers of trailing parens*, which is quite
different (I would also have had to contradict myself otherwise in stating
that properly formatted lisp reads well).

I brought up this quote because you were trashing python's indentation based
syntax (which, remember, I didn't claim was suitable or "better" than sexps
for lisp or similar expression based languages); it was meant as an
illustration that python's syntax at least compares favourably to sexps in an
area that is regarded as very important by prominent members of the
lisp/scheme community.

Look, I *like* lisp syntax (and I'm even happy for you to claim it's better
than python's, as long as I can dissuade you from claiming that python's is
terribly error-prone, unless you've actually tried writing some reasonable
amount of python yourself).

But clearly a syntax where some vital information is not so much there for
people to read but for editors and compilers is not in full conformance with
the above goal (if you really disagree with this, again why alias '[]' in
pretty much all schemes if nested '()'s are just as readable?).

In python you don't need the editor to semi-automatically put comments in your
code (which is what pressing 'M-C-\' amounts to) to render it intelligible.

> 
> >> Obviously the indentation.  
> >> But I'd notice the mismatch.
> >
> > (Hmm, you or emacs?)
> 
> Does it matter?

Of course it does matter. We were talking about readability by *humans*.

I know that emacs has no trouble "reading" 7 trailing parentheses (i.e.
extracting the relevant semantic information from them, namely what they
delimit), but I doubt you (or other humans) can (without some prosthetic aid
in the form of brain implants or manually invoked emacs commands).
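
What "extracting what they delimit" means mechanically can be sketched in a few
lines of Python 3. This is an illustration only, not how emacs does it: the
function name closing_parens is made up, and the sketch ignores strings and
comments, so it assumes they contain no parentheses. For every ')' it reports
the first symbol of the form being closed.

```python
import re


def closing_parens(source):
    """Return (line_number, head) for every ')' in source, where head is
    the first symbol of the form that the paren closes."""
    open_forms = []   # one [head] cell per currently open form
    closed = []
    for match in re.finditer(r"[()]|[^\s()]+", source):
        token = match.group()
        line = source.count("\n", 0, match.start()) + 1
        if token == "(":
            if open_forms and open_forms[-1][0] is None:
                open_forms[-1][0] = "(...)"   # the head is itself a list
            open_forms.append([None])
        elif token == ")":
            if not open_forms:
                raise ValueError(f"unbalanced ')' on line {line}")
            head = open_forms.pop()[0]
            closed.append((line, head if head is not None else "()"))
        elif open_forms and open_forms[-1][0] is None:
            open_forms[-1][0] = token
    return closed


LOOP = """\
(while stack
  (setq directory (stack-pop))
  (dolist (file (os-listdir directory))
    (let ((fullname (os-path-join directory file)))
      (push fullname files)
      (if (and (os-path-isdir fullname) (not (os-path-islink fullname)))
          (push fullname stack)))))
"""

last_line = [head for line, head in closing_parens(LOOP) if line == 7]
print(last_line)  # ['push', 'if', 'let', 'dolist', 'while']
```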

> >> If I gave you a piece of python code jotted down on paper that (as these
> >> hypothetical examples usually are) for some reason was of vital importance
> >> but I accidentally misplaced the indentation -- how would you know?
> >
> > Excellent point. But -- wait! Were it Lisp, how would I know that you didn't
> > intend e.g.
> >
> >   (if (bar watz) foo)
> >
> > instead of 
> >
> >   (if (bar) watz foo)
> 
> You are presupposing *two* errors of two different kinds here:  the
> accidental inclusion of an extra parenthesis after bar *and* the
> accidental omission of a parenthesis after watz.
> 
> The kind of error I am talking about with Python code is a single
> error of either omission or inclusion.

In my book this is a *single* error of one kind: accidental misplacement of a
parenthesis (sure happens to me as an *atomic* operation). Does this never
happen to you (maybe you use more effective editing strategies, M-( etc.)?

> 
> > Moral: I really think your (stereotypical) argument that the possibility
> > of inconsistency between "user interpretation" and "machine
> > interpretation" of a certain syntax is a feature (because it introduces
> > redundancy that can be used for error detection) requires a bit more
> > work.
> 
> I could hardly care less.

Well, anyway I'll leave it at that before this one completely degrades, too. I
hope at least the discussion of editing errors and strategies will have had
some value.

'as

Footnotes

[1] [I really get to write more than my fair share of perl in this thread ;|]

  perl -ne '/^ */; $lines++; $canIndent++ if (length($last)
  == length($&)+4);$last = $&; END{print "LOC:$lines CANINDENT:$canIndent\n";
  print "ratio:" . $canIndent/$lines . "\n"}' /usr/local/lib/python2.3/*.py

  LOC:76264 CANINDENT:9307
  ratio:0.122036609671667

  This is an upper bound, BTW. Not all these lines would semantically
  change if reindented.
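
  For readers without perl at hand, the same count can be sketched in Python 3.
  The function name count_indentable is made up for illustration; like the
  one-liner, it looks only at leading spaces and counts every line, including
  blank lines and comments. The sample below is a toy input, not the library
  files measured above.

```python
def count_indentable(lines):
    """Return (total, can_indent): can_indent counts the lines whose
    predecessor is indented exactly four spaces deeper, i.e. the lines
    that would line up with the previous line if given four more spaces."""
    total = can_indent = 0
    previous = 0
    for line in lines:
        indent = len(line) - len(line.lstrip(" "))
        total += 1
        if previous == indent + 4:
            can_indent += 1
        previous = indent
    return total, can_indent


sample = """\
def f(x):
    if x:
        x += 1
    return x
print(f(1))
""".splitlines()

total, can_indent = count_indentable(sample)
print(f"LOC:{total} CANINDENT:{can_indent}")
print(f"ratio:{can_indent / total}")
```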

[2] modulo #\Tab which we can exclude from the current discussion because its
  use is discouraged and the wart that python still allows use of tabs is not
  really relevant to evaluating the merits of the use of indentation for
  block-structure indication as such.