OT: Programmers whose first language is not English

Fri Mar 14 16:31:29 EST 2003

On 14 Mar 2003 12:43:42 -0800, anamax at earthlink.net (Andy Freeman)
wrote:

>Stephen Horne <intentionally at blank.co.uk> wrote in message news:<k44u6vc4kg9g7ddiskl2lcstmra5kup87l at 4ax.com>...
>> As for the parsing being trivial - it's not exactly rocket science in
>> either case, though I admit that LISP-style expressions barely need
>> more than tokenising and parenthesis matching. But even trivial
>> scanners and parsers are, well, sufficiently non-trivial that writing
>> one when there's one already written and available to use seems a bit
>> masochistic.
>
>You're looking at the wrong problem.
>
>I've found that the time/energy/etc saved by folks who omit "unnecessary"
>grouping elements (parentheses, curly brackets, etc.) is dwarfed by the
>time spent fixing problems due to omitting said "unnecessary" elements.
>
>If there were only binary operators and there were only three of them
>and there was a very tight limit on the expression complexity, then,
>and only then, would precedence make sense for code written/read by humans.
>
>It took me a while to figure that out, but it sure would be nice if
>language designers learned that parsers' capabilities far exceed
>programmer abilities, and that the latter is a more important
>constraint.
>
>It would be okay to omit "unnecessary" parens in a hidden representation,
>but the display and authored representation should have them.

We're not discussing something that programmers are meant to read and
write. We're discussing a marked-up-text format that will be read by
compilers and programmers' editors, but not by people.

And NO-ONE MENTIONED OMITTING "UNNECESSARY" PARENS.

XML grouping syntax is even more strict than lisp grouping syntax, as
the parentheses-equivalent is explicit about exactly what it is
ending.

If you are referring to the following statement, however...

|Having elements for punctuation such as parentheses and commas seems
|extreme and unnecessary. It only becomes necessary if you have no
|clear distinction between text and markup.

Note that I'm saying punctuation need not be handled as elements. That
is, punctuation such as parentheses and commas from the visible form
of the source code will be simply and directly inserted into the XML
without markup. They are not being excluded - they are merely being
saved as plaintext without markup.
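
As a rough sketch of what that could look like (Python, standard
library only; the element names 'line', 'kw' and 'id' are invented for
illustration and are not part of any real format), the parentheses,
comma and colon below sit between the elements as ordinary text:

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <line>, <kw> and <id> are invented element names.
# The parentheses, comma and colon are plain text between the elements.
marked_up = '<line><kw>def</kw> <id>area</id>(<id>w</id>, <id>h</id>):</line>'

line = ET.fromstring(marked_up)

# Dropping the markup recovers the visible source, punctuation included.
visible = ''.join(line.itertext())
print(visible)
```

Nothing is lost by leaving the punctuation unmarked: stripping the
elements gives back the source text exactly as it was typed.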

>> Parenthesis matching (especially with all parenthesised expressions
>> using the same pair of symbols) is a much weaker check.
>
>Umm, who said that a lisp-aware editor can only do paren matching?
>Even in the 70s, they knew the structure of special forms.  (Paren
>matching IS sufficient for lisp expressions.)

If you'd read the thread, you'd know the context - this is in
comparison with XML and refers to the fact that an XML end element is
explicit about what type of start element it matches to.

In the following code, which expression is missing a closing bracket?

  (a (b)

For human-written code, who cares - you just add the missing bracket
and it's fixed. For automatically generated code, however, it is
extremely useful to know which part of the generation code - the 'a'
expression generator or the 'b' expression generator - didn't output
its matching end bracket.

Now consider the following...

  <a><b></a>

Suddenly, we know that it is actually the 'b' expression that didn't
get terminated.

Now imagine that with a large generated source file containing
thousands of expressions. Imagine there's more than one error in the
code generator - some expressions lose a close bracket and others get
an extra one. In a lisp-like language the two errors can cancel out
within a single expression, leading to the belief that those types of
expressions are being generated correctly.

If the generated language has a lisp-like structure, you're in for
hours of investigating to find out what the problem is. If the
generated language is XML based, however, you don't just get told
there's a parenthesis matching error - you get told exactly which
expression had a mismatch - you can go straight to the source of the
error in the code generator in a matter of seconds.
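
The difference can be sketched with two toy checkers (Python; both
functions are hypothetical illustrations, not real parsers, and the
tag checker only understands bare <name> and </name> tags). The first
can only count brackets; the second can name the element at fault:

```python
import re

def check_parens(text):
    """Return the net count of unclosed '(' (negative means extra ')')."""
    depth = 0
    for ch in text:
        if ch == '(':
            depth += 1
        elif ch == ')':
            depth -= 1
    return depth

def check_tags(text):
    """Return messages naming each element with a missing or stray end tag."""
    problems = []
    stack = []
    for closing, name in re.findall(r'<(/?)(\w+)>', text):
        if not closing:
            stack.append(name)
            continue
        if name in stack:
            # Anything opened after the matching start tag was never closed.
            while stack[-1] != name:
                problems.append('unclosed <%s>' % stack.pop())
            stack.pop()
        else:
            problems.append('unexpected </%s>' % name)
    # Whatever is still open at the end of the input was never closed.
    problems.extend('unclosed <%s>' % n for n in reversed(stack))
    return problems

# One missing bracket: the count says something is wrong, but not where.
print(check_parens('(a (b)'))
print(check_tags('<a><b></a>'))

# 'b' loses its close and 'c' gets an extra one: the bracket count cancels
# out to zero, while the named end tags still expose both faults.
print(check_parens('(a (b (c)))'))
print(check_tags('<a><b><c></c></c></a>'))
```

In the last pair the bracket counter reports nothing wrong at all,
whereas the tag checker points at both the 'b' and the 'c' generators.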

>This shouldn't be a surprise - other language-aware editors knew about
>keywords by the 80s.  (They'd auto-insert "then", "do", "end", etc as
>appropriate.)

Here is something I wrote earlier...

|In particular, I'm thinking of using XML - not as an AST
|representation, but merely as a way of marking up source code. This
|would require special editors, of course, but if WYSIWYG editors can
|be created for HTML I don't see why programmers are still stuck in the
|plaintext age.
|
|One possible use of XML might be that 'keywords' and 'symbols' could
|be stored as XML elements specifying non-language-specific tokens -
|the editor could have a local language table to recognise keywords as
|the programmer types (or could use hotkeys to insert whole keywords
|Speccy-style) and could present them on screen with colour
|highlighting. This would require little (if any) more work than
|existing syntax-highlighting editors.
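
A minimal sketch of the local language table idea (Python; the 'kw'
and 'id' elements, the 'name' attribute and the keyword tables are all
assumptions made up for this example) might store keywords as neutral
tokens and let each editor spell them from its own table:

```python
import xml.etree.ElementTree as ET

# Hypothetical per-language keyword tables held locally by the editor.
KEYWORD_TABLES = {
    'en': {'if': 'if', 'then': 'then'},
    'de': {'if': 'wenn', 'then': 'dann'},
}

# Hypothetical saved form: keywords are neutral tokens, identifiers are text.
saved = '<line><kw name="if"/> <id>ready</id> <kw name="then"/></line>'

def render(xml_line, lang):
    """Rebuild the visible text, spelling keywords from the chosen table."""
    table = KEYWORD_TABLES[lang]
    root = ET.fromstring(xml_line)
    parts = [root.text or '']
    for el in root:
        if el.tag == 'kw':
            parts.append(table[el.get('name')])
        else:
            parts.append(el.text or '')
        # Text following the element (spaces, punctuation) is kept as-is.
        parts.append(el.tail or '')
    return ''.join(parts)

print(render(saved, 'en'))
print(render(saved, 'de'))
```

The saved file is the same in both cases; only the table used for
display changes.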

Not clear enough - how about...

|In the beginning, this will be small stuff. The user types a word and
|the editor highlights it as an identifier rather than a keyword. But
|instead of simply displaying the word in a different colour, it also
|uses a different markup when saving. Thus, when you load it into the
|Thingy-3000 version which has 20,000 new keywords, it knows that the
|words which happen to be spelled the same as these new keywords in
|your original file actually happen to be identifiers.
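
To make that concrete, here is a small sketch (Python; the 'kw' and
'id' element names and the keyword set are hypothetical) in which a
word is classified by the markup it was saved with, not by how it is
spelled:

```python
import xml.etree.ElementTree as ET

# Hypothetical saved file: 'match' was typed as an identifier under the old
# keyword table, so the editor stored it in an <id> element.
saved = '<line><id>match</id> = <id>find</id>(<id>text</id>)</line>'

# Hypothetical newer language version in which 'match' is now a keyword.
NEW_KEYWORDS = {'match', 'case'}

def classify(xml_line):
    """Return (word, kind) pairs taken from the markup, not the spelling."""
    kinds = {'id': 'identifier', 'kw': 'keyword'}
    return [(el.text, kinds[el.tag]) for el in ET.fromstring(xml_line)]

tokens = classify(saved)
print(tokens)
```

Even though 'match' appears in the newer keyword set, the loader never
consults that set for words that were already saved as identifiers.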