Multibyte Character Support for Python

Thu May 9 08:16:48 EDT 2002

Stephen J. Turnbull wrote:
        ...
>     Alex> It matters not a whit *WHICH* language it is, it does matter
>     Alex> that it be ONE language, not hundreds and thousands.  In
>     Alex> practice that one language isn't Latin any more (in most
>     Alex> fields of endeavour) but English.  Fine, whatever.  As long
>     Alex> as it's ONE.
> 
> Agreed.  Except that a decade from now Chinese might be the ONE.  Then
> we'll be glad we have hanzi identifiers, as Python sweeps the CJK
> world.<0.9 wink>

Fine, WHEN that happens.  And IF, of course.  Meanwhile, Ruby will
probably get there first, born in Japan and all, right?  Hey, it IS
so close to Python it almost hurts.  And AFAIK, it doesn't support
what you so intensely want, which hasn't stopped it from achieving huge
success in Japan, though it may in the future.

>     Alex> and I earnestly hope Python does nothing at all to _help_ it.
> 
> ... is it really worth sacrificing the ability to introduce more
> non-programmers to programming to avoid "helping fragmentation" by
> 25% over what those who want localized identifiers already can do?

Yes.  And it's NOT worth (IMHO, of course) helping the Japanese grow
ever more insular and separated from the rest of the world, a serious
aspect of their current predicaments (not just _my_ opinion, which
would be worth little given I'm no expert in this -- have a look at
the Economist's survey of Japan, it came out a month or so ago and
it's surely still on their site, www.economist.com).

Like "code to be run just once and then thrown away", "code that will
never see the outside of this room" WILL over and over again
survive and spread to the four corners of the Earth, surprising all
involved, starting with the code's creator.  Sure, it's bad enough if
said code has an identifier "principi" -- you don't know what it means
and must infer from context.  (The comments and docstrings, if any, are
likely just as obscure.)  But it's STILL worse to let the code have
TWO identifiers "príncipi" and "princípi", where you have to notice
the subtle issue of which "i" has which kind of accent to be able to
tell the two identifiers apart.  English minimizes this issue by
having just 26 glyphs (52, sigh, when you distinguish upper and lower
case, one issue where I wish Python were different), and no accents or
other hard-to-tell-apart diacritics -- confusion is still possible but
much less likely than in a language WITH diacritics or thousands of
different glyphs.  Non-programmers must learn ONE fundamental thing
about computer languages: that they're utterly different from natural
language, their nature, purpose and operation completely separate.

Whenever you try to blur the distinction, you do them all a signal
disservice.  They may THINK they want to "speak their own language to
the computer", but they _don't_, really: if they think so, it's because
they still haven't grasped the key differences.  Help them learn, rather
than "helping" them hide their ignorance from themselves.
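
To make the "príncipi" vs. "princípi" problem above concrete, here is a
minimal sketch of how one might flag names that differ only in their
diacritics.  It assumes a Python whose strings are Unicode and uses the
standard unicodedata module; the helper names strip_accents and
confusable_groups are hypothetical, invented for this illustration.

```python
import unicodedata


def strip_accents(name):
    """Return name with combining marks removed (hypothetical helper)."""
    # NFD splits an accented letter into base letter + combining mark.
    decomposed = unicodedata.normalize("NFD", name)
    # combining() returns 0 for ordinary (non-combining) characters.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def confusable_groups(names):
    """Group names that become identical once accents are stripped."""
    groups = {}
    for name in names:
        groups.setdefault(strip_accents(name), []).append(name)
    # Keep only the bases shared by more than one distinct spelling.
    return {base: found for base, found in groups.items() if len(found) > 1}


names = ["príncipi", "princípi", "principi", "total"]
print(confusable_groups(names))
```

The three spellings are distinct strings, yet all of them collapse to the
same unaccented base, which is exactly the kind of near-collision a human
reader is likely to miss.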

Alex