Using non-ascii symbols

Tue Jan 24 11:05:09 EST 2006

Christoph Zwerschke wrote:
> On the page http://wiki.python.org/moin/Python3%2e0Suggestions
> I noticed an interesting suggestion:
> 
> "These operators ≤ ≥ ≠ should be added to the language having the 
> following meaning:
> 
>       <= >= !=
> 
> this should improve readability (and make language more accessible to
> beginners).
> 
> This should be an evolution similar to the digraph and trigraph
> (digramme et trigramme) from C and C++ languages."
> 
> How do people on this group feel about this suggestion?
> 
> The symbols above are not even latin-1, you need utf-8.
> 
> (There are not many useful symbols in latin-1. Maybe one could use ×
> for cartesian products...)
> 
> And while they are more readable, they are not easier to type (at
> least with most current editors).
> 
> Is this idea absurd or will one day our children think that restricting 
> to 7-bit ascii was absurd?
> 
> Are there similar attempts in other languages? I can only think of APL, 
> but that was a long time ago.
> 
> Once you open your mind to using non-ascii symbols, I'm sure one can
> find a bunch of useful applications. Variable names could be allowed to
> be non-ascii, as in XML. Think class names in Arabic... Or you could
> use Greek letters if you run out of one-letter variable names, just as 
> Mathematicians do. Would this be desirable or rather a horror scenario? 
> Opinions?
> 
> -- Christoph

This will eventually happen in some form.  The problem is that we are
still in the infancy of computing.  We are using stones and chisels to
express logic.  For now, text characters are all we have with which to
express intent.  There will come a time when we are able to represent
a program in another form that is readily portable to many platforms.

In the meantime (probably 50 years or so), it would be advantageous to 
use a universal character set for coding programs.  To that end, the 
input to the Python interpreter should be ISO-10646 or a subset such as 
Unicode.  If the # -*- coding: ? -*- line specifies something other than 
ucs-4, then a preprocessor should convert it to ucs-4.  When it is
desirable to avoid the overhead of the preprocessor, developers will
find a way to save source code in ucs-4 encoding.
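
As a rough illustration of the preprocessor idea, here is a minimal
sketch in present-day Python 3 (an assumption on my part; it is not how
the interpreter itself handles source files).  It reads the coding line
with the standard library's tokenize.detect_encoding, then re-encodes
the text as big-endian UTF-32, which stands in for ucs-4 here.  The
function name to_ucs4 and the sample source are hypothetical.

```python
import io
import tokenize


def to_ucs4(source_bytes):
    """Decode source bytes using their declared encoding, then
    re-encode them as fixed-width UTF-32 (big-endian, no BOM)."""
    # detect_encoding reads the first line or two and looks for a
    # "# -*- coding: ... -*-" declaration; it defaults to utf-8.
    encoding, _ = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    text = source_bytes.decode(encoding)
    return text.encode("utf-32-be")


# Hypothetical sample: a latin-1 encoded source file.
latin1_source = "# -*- coding: latin-1 -*-\nname = 'é'\n".encode("latin-1")
ucs4 = to_ucs4(latin1_source)

# Every character now occupies exactly four bytes.
print(len(latin1_source), len(ucs4))
```

The cost is visible in the sizes: the converted form takes four bytes
per character regardless of what the original encoding needed.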

The problem with using Unicode in utf-8 and utf-16 forms is that both
are variable-width: a single character may occupy more than one code
unit.  Code will forever need to be written, and will forever execute,
additional processing to handle the MBCS and MSCS (Multiple-Short
Character Set) situation.
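
To make the variable-width point concrete, here is a small sketch in
Python 3 (my choice of language and sample characters, not anything
from the original discussion).  It reports how many bytes one character
takes in each encoding; the helper name widths is hypothetical.

```python
def widths(ch):
    """Return the byte length of one character in utf-8, utf-16
    and utf-32 (big-endian variants, so no BOM is counted)."""
    return (
        len(ch.encode("utf-8")),
        len(ch.encode("utf-16-be")),
        len(ch.encode("utf-32-be")),
    )


# Sample characters: ASCII, latin-1, a math operator, and a
# character outside the Basic Multilingual Plane.
for ch in ["a", "é", "≤", "𝄞"]:
    print(repr(ch), widths(ch))
```

Only the utf-32 column stays constant at four bytes; the other two
change with the character, which is the extra bookkeeping referred to
above.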

Ok.  Maybe computing is past infancy.  But most development environments 
are not much past toddler stage.