Hi Vernon,

... not to be confused with Vern "the Watcher" Ceder.

On Mon, Jul 18, 2011 at 8:47 AM, Vernon Cole <vernondcole@gmail.com> wrote:
 
There is a very good reason for this:  standard library code must be readable for people all over the world.  That's why a Dutch software engineer wrote a language in which all the keywords and commentary are in English.  


Yes, the Standard Library is to be Anglicized for some time to come, 
maybe always, per Guido's talks.

Of course there's nothing to stop someone from writing a translator 
for the Standard Library, such that the source originals (as modified) 
might be rendered in myriad other character sets.  

Top-level names tend to be amenable to such treatment.  

This may be done down to the C family level, though I'm not suggesting 
that it should be (nor are all Python implementations C family, I hasten 
to add; Jython is "C family" only if the Java VM is).

The same is not true for 3rd party modules which, as you say, 
may be written in any style.

Learning the Latin (English) alphabet and building a vocabulary remain 
a good idea, obviously, along with ASCII in the context of Unicode.  

I expect those focused in computer science will continue giving 
themselves the benefit of this learning.

I received Romanized Indonesian source code for quite a while, until 
the student moved to Japan and apparently stopped doing Python.

I'm impressed with all the alphabets you know.

3rd party modules written in Cyrillic, with the peppering of 
Roman we know must be there thanks to the Standard Library
(untranslated) and the 33 keywords (so far), could be used 
in computer science to help English speakers learn a 
Cyrillic language.

http://en.wikipedia.org/wiki/Languages_written_in_a_Cyrillic-derived_alphabet
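
As a rough check on that keyword count, here is a minimal 
sketch using the standard keyword module.  The number it 
prints depends on which Python version runs it, so treat 
33 as a snapshot rather than a constant.

```python
import keyword

# keyword.kwlist holds the reserved words of the running interpreter.
count = len(keyword.kwlist)
print(count)

# Every keyword is plain ASCII, whatever alphabet the
# surrounding identifiers happen to use.
all_ascii = all(ord(ch) < 128 for kw in keyword.kwlist for ch in kw)
print(all_ascii)
```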
 
>
> The flip side argument, which I find more persuasive, is that
> one of the biggest barriers to diversity is over-reliance on Latin-1,
> and "just ASCII" in particular.
>
> The whole point of Unicode was to open up source code writing,
> as an occupation, to more than just Euro-English speakers.

I disagree.  The whole point of Unicode is to open up application writing, so that _users_ can see computer output in their own languages.  A person who wishes to pursue code writing as an occupation must understand and use English -- or be relegated to producing work only for his own culture.  In the modern "flat" world, English is the language of commerce and computer programming.  Not being able to write understandable English is a severe handicap.  My programs are written in Python, documented in English, and usable by persons of another language.  For example, see CaesarCalc.py from https://launchpad.net/romanclass , which assumes the user can understand pidgin Latin.  Even then, I give the result of (XVI - XVI) as "Nulla" because I expect that most users will not recognize "Nvlla" as meaning "nothing."


Certainly the GUI needs to be intelligible, yes.

Let's just say there's a school of thought that has 
no problem with a math, logic or grammar teacher 
using only Chinese characters for top level names 
in various exercises using Python or another 
Unicode-aware computer language.  And no 
problem with another teacher using only Hebrew
characters for top level names and so on.
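
A minimal sketch of what such an exercise might look 
like, assuming Python 3 (which accepts non-ASCII 
identifiers per PEP 3131).  The names below are my own 
hypothetical choices for illustration: 平方 for "square", 
数 for "number", 数列 for "sequence", 结果 for "result".  
The keywords and builtins stay in English.

```python
# Hypothetical classroom exercise: top-level names in Chinese,
# keywords (def, return, for, in) and builtins (print) in English.
def 平方(数):
    """Return the square of a number."""
    return 数 * 数

数列 = [1, 2, 3]
结果 = [平方(数) for 数 in 数列]
print(结果)
```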

This school of thought hangs out on the Python
Diversity list and self-organizes there.  If you go
back in the archives, you'll find me and a 
guy named Carl doing stuff in the Python wiki
to expand the language base, including at the 
source code level.  With Pycon / Tehran in the
planning, we want to be in a better position to 
address issues relating GeoDjango to Farsi, say.

These exercises (mentioned above) may have 
nothing to do with writing commercial applications.  
The students may not be programmers in training 
(though some may be in commercial media, 
where "programming" also has meaning, e.g. 
in radio / TV).  Instead of using a calculator 
or abacus to learn numeracy skills, people 
have laptops and internet access.

Having readable source code in languages 
that aren't in a Roman alphabet is already 
a spreading phenomenon, with many writers 
happily giving up that so-called "world readability" 
in favor of remaining intelligible to the girl or boy 
next door.  

The syntax of URIs and domain names has 
already taken this turn.  You will see http://<Arabic letters>/ 
quite frequently these days, thanks to 
internationalized domain names (which Python now needs 
to deal with, and does, as an http-aware language).
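
As a small sketch of that handling, Python's built-in 
"idna" codec converts a non-ASCII host name to the 
ASCII-compatible form used on the wire.  The host name 
below is a hypothetical one, and I use a German label 
only because its encoded form is well known; an 
Arabic-script label goes through the same call.

```python
# Hypothetical host name, used only for illustration.
host = "bücher.example"

# The "idna" codec rewrites each non-ASCII label in its
# ASCII-compatible "xn--" form.
ascii_host = host.encode("idna")
print(ascii_host)   # b'xn--bcher-kva.example'

# Decoding reverses the mapping.
print(ascii_host.decode("idna"))
```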

CSS for Arabic is the kind of style concern for 
which we may have insufficient literature to date.
We may have people joining Diversity who want to
develop that literature (recruiting is under way).

http://www.guardian.co.uk/technology/2010/may/06/arabic-web-addresses-internet

Here is sample output.  Notice that, when it blows up, the traceback is in Python with English explanations:
<console dump>
procer numerus hic:III - II
I
procer numerus hic:3 - 2
I
procer numerus hic:3 - 3
Nulla
procer numerus hic:2 - 3
Traceback (most recent call last):
  File "CaesarCalc.py", line 40, in <module>
    print (cvt(subtrahends[0]) - cvt(subtrahends[1]))
  File "/home/vernon/romanclass-1.0.1/romanclass.py", line 99, in __sub__
    return Roman(self.__int__() - other)
  File "/home/vernon/romanclass-1.0.1/romanclass.py", line 85, in __new__
    raise OutOfRangeError, 'Cannot store "%s" as Roman' % repr(N)
romanclass.OutOfRangeError: Cannot store "-1" as Roman
</console dump>

IMHO, on the whole, PEP 8 is a pretty good document.
--
Vernon

I'm not denigrating PEP8 in any way, even though 
I used some light sarcasm in my post.  That was 
not directed against PEP8, so much as against 
the idea that the "rule book" is somehow complete, 
just because we have it down that functions should 
generally not start with a capital letter, and 
l (lowercase L) is a terrible name for all purposes 
because it's so indistinguishable from uppercase 
I and the number 1 in many fonts.

I think as people start getting a lot more experience 
writing Python with different namespaces, with 
non-Roman top-level names etc., that the rule 
book is inevitably going to expand and that a 
Book of Styles could conceivably become enormous. 

But then think of English:  we acknowledge many 
styles as being appropriate and don't have just 
the one "book" where style is concerned (we have 
so many) -- not like the dictionary, with a goal of 
including every word in a finite and deliberately 
exclusive set of standard words.

I have some examples of Python source in my 
blogs, using kanji as top-level names (might be 
a Japanese program, as one of the kanji is for 
Mt. Fuji as I recall).  

Then there's some tracking down Stallman on 
a visit to Sri Lanka (a while back) and chatter 
about Python in Tamil and Sinhalese.  And yes, 
I am aware English is spoken in these parts as well,
as evidenced by Arthur C. Clarke's having lived
there for so long.  One of our CSN chiefs has a
track record there too, another English speaker.

http://www.sarvodaya.org/2005/05/17/suzanne-bader%E2%80%99s-sri-lanka-visit-report
http://controlroom.blogspot.com/2009/01/at-work.html
http://risenfall.wordpress.com/2008/01/14/richard-stallman-rms-is-in-sri-lanka/

Kirby