Problems of Symbol Congestion in Computer Languages

Dan Stromberg drsalists at gmail.com
Thu Feb 17 21:59:44 EST 2011


I prefer special symbol "congestion" to special symbol proliferation.  A
lot.  A language with few special symbols looks less like line noise, is
easier to read and write, and is easier to google for answers about.

I guess nothing's perfect.

On Wed, Feb 16, 2011 at 2:07 PM, Xah Lee <xahlee at gmail.com> wrote:

> Might be interesting.
>
> 〈Problems of Symbol Congestion in Computer Languages (ASCII Jam;
> Unicode; Fortress)〉
> http://xahlee.org/comp/comp_lang_unicode.html
>
> --------------------------------------------------
> Problems of Symbol Congestion in Computer Languages (ASCII Jam;
> Unicode; Fortress)
>
> Xah Lee, 2011-02-05, 2011-02-15
>
> The vast majority of computer languages use ASCII as their character
> set. This means they jam a multitude of operators into about 20
> symbols. Often, a symbol has multiple meanings depending on context.
> Also, a sequence of chars is used as a single symbol as a workaround
> for the lack of symbols. Even languages that use Unicode as their char
> set (e.g. Java, XML) often still use the ~20 ASCII symbols for all
> their operators. The only exceptions I know of are Mathematica,
> Fortress, and APL. This page gives some examples of problems created
> by symbol congestion.
>
> -------------------------------
> Symbol Congestion Workarounds
>
> --------------------
> Multiple Meanings of a Symbol
>
> Here are some common examples of a symbol that has multiple meanings
> depending on context:
>
> In Java, [ ] is the delimiter for an array's size when creating it
> (new int[5]), also the delimiter for getting an element of an array
> (a[0]), and also part of the syntax for declaring an array type
> (int[] a).
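Python overloads [ ] in much the same way. A small illustrative sketch, assuming Python 3.9+ (needed for the list[int] annotation); the names scores and first_of are made up for the example:

```python
# One pair of square brackets, three different jobs.
scores = [90, 85, 77]          # 1. delimiter of a list literal
first = scores[0]              # 2. subscript: fetch one element...
recent = scores[1:]            #    ...or a slice of several


def first_of(xs: list[int]) -> int:  # 3. part of a type annotation
    return xs[0]


print(first, recent, first_of(scores))
```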
>
> In Java and many other langs, ( ) is used for expression grouping,
> also as the delimiter for the arguments of a function call, and also
> as the delimiter for the parameters in a function declaration.
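The same three roles of ( ) can be seen in Python, which adds a fourth: delimiting tuples. A minimal sketch; area, total and the other names are hypothetical:

```python
total = (2 + 3) * 4            # 1. grouping an expression


def area(width, height):       # 2. delimiting a function's parameters
    return width * height


size = area(3, 5)              # 3. delimiting a call's arguments
pair = (3, 5)                  # 4. delimiting a tuple (the comma makes the tuple)
empty = ()                     #    and () on its own is the empty tuple
print(total, size, pair, empty)
```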
>
> In Perl and many other langs, : is used as a separator in a ternary
> expression, e.g. (test ? "yes" : "no"), and also, doubled as ::, as a
> namespace separator (e.g. use Data::Dumper;).
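Python happens to avoid both of those particular uses (its conditional is written a if test else b, and its namespace separator is "."), yet : is still heavily overloaded there. A rough sketch with made-up names:

```python
ages = {"ann": 31, "bob": 27}        # key/value separator in a dict literal
middle = [1, 2, 3, 4][1:3]           # separator of slice bounds
double = lambda x: x * 2             # end of a lambda's parameter list


def greet(name: str) -> str:         # parameter annotation, end of the def header
    return f"hi {name:>5}"           # start of a format spec in an f-string


print(ages["bob"], middle, double(4), greet("bob"))
```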
>
> In URLs, / is used as the path separator, but also, doubled as //, as
> part of the protocol indicator, e.g. http://example.org/comp/unicode.html
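Python's standard urllib.parse.urlsplit illustrates how a parser must tell the two roles of / apart by position: the // right after "http:" introduces the host, while later slashes separate path segments. A short sketch using the URL above:

```python
from urllib.parse import urlsplit

parts = urlsplit("http://example.org/comp/unicode.html")
print(parts.scheme)   # the part before ":"
print(parts.netloc)   # the host, introduced by "//"
print(parts.path)     # here "/" separates path segments

segments = parts.path.split("/")
print(segments)
```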
>
> In Python and many others, < is used for the “less than” comparison
> operator, but also as an alignment flag in its “format” method, as a
> delimiter of a named group in regex, and as part of other operators
> made of two or more chars, e.g.: << <= <<= <>.
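Each of those uses of < can be shown in a few lines of Python 3. Note that <> was Python 2's alternative spelling of != and was removed in Python 3, so this sketch (which assumes Python 3) leaves it out:

```python
import re

less = 3 < 5                                    # "less than" comparison
left = format("ab", "<5") + "|"                 # left-align flag in a format spec
m = re.match(r"(?P<word>\w+)", "hello world")   # delimits a named regex group
word = m.group("word")
shifted = 1 << 3                                # part of the left-shift operator
x = 2
x <<= 1                                         # part of augmented left shift
at_most = 4 <= 4                                # part of "less than or equal"
print(less, left, word, shifted, x, at_most)
```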
>
> --------------------
> Examples of Multi-Char Operators
>
> Here are some common examples of operators that are made of multiple
> characters: || && == <= != ** += *= := ++ -- :: // /* (* …
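One way to see a multi-char sequence acting as a single symbol is to ask a real tokenizer. This sketch uses Python's standard tokenize module; operator_tokens is a hypothetical helper name:

```python
import io
import tokenize


def operator_tokens(source):
    """List the operator tokens Python's own tokenizer finds in `source`."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.OP]


ops = operator_tokens("x = a ** b // c <= d != e\n")
print(ops)
```

Each of **, //, <= and != comes back as one token, even though it is spelled with two ASCII characters.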
>
> -------------------------------
> Fortress & Unicode
>
> The language designer Guy Steele recently gave a very interesting
> talk. See: Guy Steele on Parallel Programming. In it, he showed code
> snippets of his language Fortress, which freely uses Unicode
> characters as operators.
>
> For example, list delimiters are not the typical curly brackets {1,2,3}
> or square brackets [1,2,3], but the Unicode angle brackets ⟨1,2,3⟩.
> (See: Matching Brackets in Unicode.) It also uses the circled plus ⊕
> as an operator. (See: Math Symbols in Unicode.)
>
> -------------------------------
> Problems of Symbol Congestion
>
> I really appreciate such use of Unicode. The tradition of sticking to
> the 95 printable chars of 1960s ASCII is extremely limiting. It
> creates complex problems, manifested in:
>
>    * String escape mechanisms (C's backslash \n, \/, …, widely
> adopted).
>    * Complex delimiters for strings (Python's triple quotes; Perl's
> variable delimiters q() q[] q{} m//; heredocs). See: Strings in Perl
> and Python ◇ Heredoc mechanism in PHP and Perl.
>    * Crazy leaning toothpick syndrome, especially bad in Emacs regexes.
>    * Complexities in character representation: Emacs key notations
> (See: Emacs's Key Notations Explained (\r, ^M, C-m, RET, <return>,
> M-, meta)) and HTML entity problems (See: HTML Entities, Ampersand,
> Unicode, Semantics).
>    * URL percent-encoding problems and complexities. See: Javascript
> Encode URL, Escape String.
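A short Python 3 sketch of two of these escape layers, using only the standard re and urllib.parse modules. The Windows-style path and the string "a/b 100%" are made-up inputs:

```python
import re
from urllib.parse import quote, unquote

# Leaning toothpick syndrome: a regex needs "\\" to match one literal
# backslash, and an ordinary string literal doubles each of those again.
pattern_plain = "C:\\\\Users\\\\me"   # four backslashes per separator
pattern_raw = r"C:\\Users\\me"        # raw string: only the regex escaping
assert pattern_plain == pattern_raw

path = "C:\\Users\\me"                # the text C:\Users\me
print(bool(re.fullmatch(pattern_raw, path)))

# Percent-encoding: "/" is reserved in URLs and "%" introduces escapes,
# so literal occurrences of either need yet another escape mechanism.
encoded = quote("a/b 100%", safe="")
print(encoded)
assert unquote(encoded) == "a/b 100%"
```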
>
> All these problems occur because we are jamming so many meanings into
> about 20 symbols in ASCII.
>
> See also:
>
>    * Computer Language Design: Strings Syntax
>    * HTML6: Your JSON and SXML Simplified
>
> Most of today's languages do not support Unicode in function or
> variable names, so you can forget about using Unicode in variable
> names (e.g. α=3) or function names (e.g. “lambda” as “λ” or “function”
> as “ƒ”), or defining your own operators (e.g. “⊕”).
>
> However, I know of a few languages that do support Unicode in function
> or variable names. Some of these also allow you to define your own
> operators, though they may not allow Unicode for the operator symbol.
> See: Unicode Support in Ruby, Perl, Python, javascript, Java, Emacs
> Lisp, Mathematica.
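Python 3 is one example of that split, as the sketch below suggests: non-ASCII letters are accepted in names (PEP 3131), but operators come from a fixed ASCII set. Mod7 is a hypothetical type made up for illustration:

```python
# Non-ASCII letters are legal in Python 3 identifiers (PEP 3131).
α = 3
ƒ = lambda x: x + α          # "ƒ" standing in for "function"
print(ƒ(1))

# str.isidentifier() reports whether a string is a valid name.
names = {name: name.isidentifier() for name in ["α", "λ", "ƒ", "⊕"]}
print(names)


# A class can give new meaning to an existing operator such as "+"
# via __add__, but it cannot introduce "⊕" as a new operator.
class Mod7:
    """Integers modulo 7 (hypothetical example type)."""

    def __init__(self, n):
        self.n = n % 7

    def __add__(self, other):
        return Mod7(self.n + other.n)

    def __repr__(self):
        return f"Mod7({self.n})"


print(Mod7(5) + Mod7(4))

# The parser rejects "⊕" outright.
try:
    compile("a ⊕ b", "<example>", "eval")
    rejected = False
except SyntaxError:
    rejected = True
print(rejected)
```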
>
>  Xah