[Python-ideas] Non-ASCII in Python syntax? [was: Null coalescing operator]

Stephen J. Turnbull turnbull.stephen.fw at u.tsukuba.ac.jp
Sun Oct 30 10:51:18 EDT 2016


Paul Moore writes:

 > My point wasn't so much about dealing with the character set of
 > Unicode, as it was about physical entry of non-native text. For
 > example, on my (UK) keyboard, all of the printed keycaps are basically
 > used.

How do you type the pound sign and the Euro sign?  Are they on the UK
keyboard?  Or are you not in the UK and don't need them?

 > And yet, I can't even enter accented letters from latin-1 with a
 > standard keypress, much less extended Unicode.

I'm pretty sure you can, but since I've been Windows-free for 20 years
(except for a short period when I was treasurer for an NPO, and only
used it to access the accounting system), I can't tell you what it is.
On the Mac, you press alt/option plus a graphic key.  Most result in
what somebody decided are common non-ASCII characters (German sharp S,
Greek lowercase mu, Greek upper- and lowercase sigma), but several are
dead keys, producing accented characters when combined with a base
character: tilde, accents acute and grave, and so on.  Surely Windows
has a similar system (I don't mean Alt+digits).  (But maybe not, I
didn't notice one in my brief Googling.)

 > My interest in East Asian experience is at least in part because
 > the "normal" character sets, as I understand it, are big enough
 > that it's impractical for a keyboard to include a plausible basic
 > range of characters, so I'm curious as to what the physical process
 > is for typing from a vocabulary of thousands of characters on a
 > sanely-sized keyboard.

You're right about the size.  Korean is special, because the 11,000-
odd Hangul are phonetic and generated algorithmically from a set of
about 70 phonetic partial glyphs, divided into three groups.  The same
keys do multiple duty when typed in phonetic order.  Other systems use
the shift key.
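
For the curious, the algorithmic generation can be sketched in a few lines of Python.  This is only an illustration, assuming the arithmetic the Unicode Standard defines for precomposed Hangul syllables (19 initial, 21 medial, and 27 final jamo, with "no final" counted as a 28th choice).  The function names are mine, not part of any library.

```python
# Sketch of Unicode's Hangul syllable arithmetic (function names are
# hypothetical; the constants come from the Unicode Standard).
S_BASE = 0xAC00                          # code point of the first syllable
L_COUNT, V_COUNT, T_COUNT = 19, 21, 28   # T_COUNT includes "no final"
S_COUNT = L_COUNT * V_COUNT * T_COUNT    # 11172 syllables in total


def compose_hangul(initial, medial, final=0):
    """Build one precomposed syllable from three jamo indices."""
    if not (0 <= initial < L_COUNT and 0 <= medial < V_COUNT
            and 0 <= final < T_COUNT):
        raise ValueError("jamo index out of range")
    return chr(S_BASE + (initial * V_COUNT + medial) * T_COUNT + final)


def decompose_hangul(syllable):
    """Recover the (initial, medial, final) indices of a syllable."""
    offset = ord(syllable) - S_BASE
    if not 0 <= offset < S_COUNT:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, V_COUNT * T_COUNT)
    medial, final = divmod(rest, T_COUNT)
    return initial, medial, final


# Indices 18, 0, 4 are hieut, a, nieun: the syllable "han".
print(compose_hangul(18, 0, 4))
print(decompose_hangul(compose_hangul(18, 0, 4)))
```

An input method does essentially the composition step as you type the jamo in phonetic order.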

For the 100,000 Han ideographs[1], there are a wide variety of methods
for entry by key sequence, ranging from code point entry to
context-dependent phonetic entry of entire sentences as they would be
spoken.  Then, of course, there's voice recognition, and handwriting
recognition (both static from the image, and dynamic, taking account
of the order of pen strokes).

The more advanced input methods not only take account of grammar, but
also learn the users' habits, remember recent conversions, and predict
coming keystrokes based on current context, offering several
conversions based on plausible continuations.

 > In mentioning emoji, my main point was that "average computer
 > users" are more and more likely to want to use emoji in general
 > applications (emails, web applications, even documents) - and if a
 > sufficiently general solution for that problem is found, it may
 > provide a solution for the general character-entry case.

Not for the Asian languages.  For them, "character entry" in the sense
of character-by-character has long since been obsoleted by predictive
sentence-level phonetic methods.

But emoji are a perfect example for the present purpose, since they
don't have standard pronunciations (although probably many will get
them based on the Unicode standard names).  On systems with high-
enough resolution displays, a palette showing the glyphs is the
obvious solution.  But that's not pleasant if you type quickly and
need those characters frequently.  I don't think there's an
alternative for emoji though, except for personalized shortcut maps.
Math symbols are similar, I think.
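
A personalized shortcut map might look something like the following Python sketch.  The ":name:" syntax and the entries in SHORTCUTS are made up for illustration; the only real API used is unicodedata.lookup(), which maps a Unicode standard name to its character.

```python
import re
import unicodedata

# Hypothetical personal shortcuts, each mapped to a Unicode standard name.
SHORTCUTS = {
    "sq": "WHITE SQUARE",
    "inf": "INFINITY",
    "so": "THEREFORE",
    "grin": "GRINNING FACE",
}


def expand(text):
    """Replace :shortcut: or :unicode_name: tokens with the character."""
    def repl(match):
        key = match.group(1)
        # Fall back to reading the token itself as a Unicode name.
        name = SHORTCUTS.get(key, key.replace("_", " ").upper())
        try:
            return unicodedata.lookup(name)
        except KeyError:
            return match.group(0)   # unknown name: leave the text alone
    return re.sub(r":([A-Za-z_]+):", repl, text)


print(expand("x tends to :inf:, :so: we need a :white_square:"))
```

That covers the shortcuts you remember and falls back to the standard names for the ones you don't, but it is still no substitute for a palette when you don't know what a character is called.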

 > Coming back to a more mundane example, if I need to type a character
 > like é in an email, I currently need to reach for Character Map and
 > cut and paste it. The same is true if I have to type it into the
 > console.

You probably have Control, Windows, Menu, Alt, and maybe a "function"
key.  If you're lucky, one labelled AltGr for "Alternate Graphic" is
the obvious suspect.  Some combination of the above probably allows
entry of accented Latin-1 characters, miscellaneous Latin-1 (e.g., sharp
S), and a few oddballs (Greek letters, ligatures like oe, the
lemniscate usually read as infinity).

 > That's a sufficiently annoying stumbling block

It very well could be, although my Windows Google-fu isn't great.
But this is what I found.

For WHITE SQUARE, the Mac doesn't have a keyboard equivalent, but
there's a standard way to set up a set of shortcut keys[2]:
http://stackoverflow.com/questions/3685146/how-do-you-do-the-therefore-%E2%88%B4-symbol-on-a-mac-or-in-textmate
And I think you can also use the "Input Preferences" screen in System
Preferences to set up a few of them.

For Windows, it seems that Alt+decimal character codes, or hex Unicode
followed by Alt+x are the built-in ways to enter characters not on
your keyboard.  It's also possible to set up "Math Autocorrect" to
automatically convert key sequences according to
https://blogs.msdn.microsoft.com/murrays/2011/08/29/sans-serif-mathematical-symbols/
but that's hardly obvious (although maybe it is if you're Dutch?)
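
To make the two built-in methods concrete, here is a rough Python sketch of what each keystroke sequence resolves to.  The helper names are hypothetical, and I'm assuming that Alt+0nnn is interpreted in the system's Windows code page (cp1252 on Western European setups); other locales would differ.

```python
def from_hex_alt_x(hex_digits):
    """Character for 'hex Unicode followed by Alt+x', e.g. 25A1."""
    return chr(int(hex_digits, 16))


def from_alt_decimal(code, encoding="cp1252"):
    """Character for Alt+0nnn, assuming the given Windows code page."""
    return bytes([code]).decode(encoding)


print(from_hex_alt_x("25A1"))    # WHITE SQUARE
print(from_alt_decimal(233))     # e with acute accent under cp1252
```

Either way you have to know the number, which is the real usability problem.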

I have to wonder why so many people stick with a system that seems to
hate its users. :-(

Footnotes: 
[1]  I'm counting several thousand Taiwanese standard glyphs whose
pronunciation and meaning are no longer known (they're culled from old
manuscripts), as well as each of the 2 or 3 variants of several
thousand characters given simplified glyphs by the Japanese and PRC
standards bodies, because all have separate Unicode code points assigned.

[2]  Note: I had to Google this because I use Japanese input methods:
when I want a square I type the Japanese word for "square" and then
press "next conversion" until the square I want shows up.  This also
works for most Greek letters and math symbols.  This doesn't bother
me, because it's normal for typing Japanese (and I do mix Japanese and
English enough that I know that it doesn't bug me when I need such a
character in an otherwise all-English text).  I suspect it would be
inadequate for someone who doesn't also type a language requiring a
complex input method.




