
Paul Moore writes:
On 29 October 2016 at 18:19, Stephen J. Turnbull <turnbull.stephen.fw@u.tsukuba.ac.jp> wrote:
For better or worse, it may be emoji that drive that change ;-)
I suspect that the 100 million or so Chinese, Japanese, Korean, and Indian programmers who have had systems that have no trouble whatsoever handling non-ASCII for as long as they've used computers will drive that change.
My apologies. You are of course absolutely right.
tl;dr: A quick apology for the snark, and an attempt at FUD reduction. Using non-ASCII characters will involve some cost, but there are real benefits, and the fear and loathing often evoked by the prospect is unnecessary. I'm not ready to advocate introduction *right* now, but "never" isn't acceptable either. :-)

On with the show:

"Absolutely" is more than I deserve, as I was being a bit snarky.

That said, Ed Yourdon wrote a book in 1990 or so with the self-promoting title of "Decline and Fall of the American Programmer"[1] in which he argued that for many kinds of software, outsourcing to China, India, or Ireland got you faster, better, cheaper, and internationalized, with no tradeoffs. (The "and internationalized" is my hobby horse; it wasn't part of Yourdon's thesis.) He later recanted the extremist doomsaying, but a quick review of the fraction of H1B visas granted to Asian-origin programmers should convince you that USA/EUR/ANZ doesn't have a monopoly on good-to-great programming (probably never did, but that's a topic for a different thread).

Also note that in Japan, looking only at the programming language used most frequently and not controlling for other factors, Python programmers are the highest paid among developers in all languages with more than 1% of the sample (and yes, that includes COBOL!)

To the extent that internationalization matters to a particular kind of programming, these programmers are better placed for those jobs, I think. And while in many cases "on site" has a big advantage (so you can't telecommute from Bangalore, you need that H1B, which is available only in rather restricted numbers), more and more outsourcing does cross oceans, so potential competition is immense.

There is a benefit to increasing our internationalization in backward-incompatible ways. And that benefit is increasing both in magnitude and in the number of Python developers who will receive it.
I'm curious to know how easy it is for Chinese, Japanese, Korean and Indian programmers to use *ASCII* characters. I have no idea in practice whether the current basically entirely-ASCII nature of programming languages is as much a problem for them
Characters are zero problem for them. The East Asian national standards all include the ASCII repertoire, and some device (usually based on ISO 2022 coding extensions rather than UTF-8) for allowing ASCII to be one-byte, even if the "local" characters require two or more bytes. I forget if India's original national standard also included an ASCII subset, but they switched over to Unicode quite early[2], so UTF-8 does the trick for them.

English (the language) is a much bigger issue. Most Indians, of course, have little trouble with the derived-from-English nature of much programming syntax and library identifiers, and the East Asians all get enough training in both (very) basic English and rote memorization that handling English-derived syntax and library nomenclature is not a problem.

However, reading and especially creating documentation can be expensive and inaccurate. At least in Japanese, "straightforward" translations are often poor, as nuances are lost. E.g., a literal Japanese translation from English requires many words to indicate the differences a simple "a" vs. "the" vs. "some" indicates in English. Mostly such nuances can be expressed economically by restructuring a whole paragraph, but translators rarely bother and often seem unaware of the issues. Many Japanese programmers' use of articles is literally chaotic: it's deterministic but appears random to all but the most careful analysis.[3]
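To make the point about ASCII being one byte in the national encodings concrete, here is a minimal sketch in Python. The codec names are ones shipped in Python's standard library; the helper names and sample strings are made up for illustration, and the list of codecs is just a handful I'd expect to matter, not a survey.

```python
# Hypothetical helpers for illustration; only the codec names are
# standard-library names.
CODECS = ("euc_jp", "shift_jis", "iso2022_jp", "euc_kr", "gb2312", "big5", "utf_8")

ASCII_SAMPLE = "def spam(x): return x + 1"  # a pure-ASCII line of source
LOCAL_SAMPLE = "日"                          # a single Japanese character


def ascii_is_single_byte(codec: str) -> bool:
    """True if the codec encodes the ASCII sample byte-for-byte as ASCII does."""
    return ASCII_SAMPLE.encode(codec) == ASCII_SAMPLE.encode("ascii")


def encoded_length(text: str, codec: str) -> int:
    """Number of bytes the codec needs to represent text."""
    return len(text.encode(codec))


if __name__ == "__main__":
    # ASCII text: same bytes in every codec listed.
    for codec in CODECS:
        print(codec, ascii_is_single_byte(codec))
    # A "local" character: more than one byte, and the count varies by codec.
    for codec in ("euc_jp", "shift_jis", "utf_8"):
        print(codec, encoded_length(LOCAL_SAMPLE, codec))
```

The ASCII sample deliberately avoids backslash and tilde, since some of the Japanese encodings have historically assigned those code positions differently.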
as I imagine Unicode characters would be for me. I really hope it isn't...
I think your imagination is running away with you. While I understand how costly it is for those over the age of 12 to develop new habits (I'm 58, and painfully aware of how frequently I balk at learning anything new no matter how productivity-enhancing it is likely to be, and how much more slowly it becomes part of my repertoire), the number of new things you would need to learn would be few, and frequently enough used, at least in Python. It's hard enough to get Guido (and the other Masters of Pythonic Language Design) to sign on to new ASCII syntax; even if in principle non-ASCII were to be admitted, I suspect the barrier there would be even higher.

Most of Unicode is irrelevant to everybody. Mathematicians use only a small fraction of the math notation available to them -- it's just that it's a different small fraction for each field. The East Asians need a big chunk (I would guess that educated Chinese and Japanese encounter about 10,000 characters in "daily life" over a lifetime, while those encountered at least once a week number about 3000), but those that need to be memorized are a small minority (less than 5%) of the already defined Unicode repertoire.

For Western programmers, the mechanics are almost certainly there. Every personal computer should have at least one font containing all characters defined in the Basic Multilingual Plane, and most will have chunks of the astral planes (emoji, rare math symbols, country flags, ...). Even the Happy Hacker keyboard has enough mode keys (shift, control, ...) to allow defining "3-finger salutes" for commonly-used characters not on the keycaps -- in daily life if you don't need an input method now, you won't need one if Python decides to use WHITE SQUARE to represent an operation you frequently use -- just an extra "control key combo" like the editing control keys (e.g., for copy, cut, paste, undo) that aren't marked on any keyboard I have.

I'm *not* advocating *imposing* the necessary effort on anyone right now.
I just want to reduce the FUD associated with the prospect that it *might* be imposed on *you*, so that you can evaluate the benefits in light of the real costs. They're not zero, but they're unlikely to ruin your whole day, every day, for months.[4]

"Although sometimes never is better than *right* now" doesn't apply here. :-)

Footnotes:

[1] "The American Programmer" was the name of Yourdon's consultancy's newsletter to managers of software projects and software development organizations.

[2] India is a multiscript country, so faces the same pressure for a single, internationally accepted character set as the whole world does, albeit at a lower level.

[3] Of course the opposite is true when I write Japanese. In particular, there's a syntactic component called "particle" (the closest English equivalent is "preposition", but particles have much more general roles), and I'm sure my usage of it is equally chaotic from the point of view of a native speaker of Japanese -- even after working in the language for 25 years! N.B. I'm good enough at the language to have written grant proposals in it that were accepted -- and still my usage of particles is unreliable.

[4] Well, if your role involves teaching other programmers, their pushback could be a long-lasting irritant. :-(