On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano firstname.lastname@example.org wrote:
> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote:
>> and in long perspective when the world's alphabetical garbage will dissapear, two digits would be ok.
>
> Talking about "alphabetical garbage" like that makes you seem to be an ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even 7-bit ASCII has more than 100 characters (128).
Solution: Abolish most of the control characters. Let's define a brand new character encoding with no "alphabetical garbage". These characters will be sufficient for everyone:
* Formatting characters: space, newline. Everything else can go.
* Digits: 01234567
* Lower case Latin letters a-z
* Vital social media characters: # (now officially called "HASHTAG"), @
* Can't-type-URLs-without-them: colon, slash (now called both "SLASH" and "BACKSLASH")
That's 40 characters that should cover all the important things anyone does - namely, Twitter, Facebook, and email. We don't need punctuation or capitalization, as they're dying arts and just make you look pretentious. I might have missed a few critical characters, but it should be possible to fit it all within 64, which you can then represent using two digits from our newly-restricted set; octal is better than decimal, as it needs fewer symbols. (Oh, sorry, so that's actually "50" characters, of which "32" are the letters. And we can use up to "100" and still fit within two digits.)
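For anyone who wants to check the arithmetic, here is a minimal Python sketch. The names `ALPHABET` and `encode`, and the order of characters within the set, are made up for illustration; nothing here is a real encoding.

```python
import string

# Hypothetical character set, in an arbitrary (assumed) order:
# formatting, octal digits, lower case letters, social media, URL characters.
ALPHABET = " \n" + "01234567" + string.ascii_lowercase + "#@" + ":/"


def to_octal(n):
    """Return n written in base 8, without Python's '0o' prefix."""
    return format(n, "o")


def encode(text):
    """Encode text as two octal digits per character (illustrative only)."""
    return "".join(format(ALPHABET.index(ch), "02o") for ch in text)


print(len(ALPHABET))                           # 40 characters in decimal
print(to_octal(len(ALPHABET)))                 # 50
print(to_octal(len(string.ascii_lowercase)))   # 32
print(to_octal(64))                            # 100
print(encode("hi"))                            # 2122
```

Two octal digits cover the 64 values 00 through 77, so the 40 characters fit with room to spare.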
Is this the wrong approach, Mikhail? Perhaps we should go the other way, then, and be *inclusive* of people who speak other languages. Thanks to Unicode's rich collection of characters, we can represent multiple languages in a single document; see, for instance, how this uses four languages and three entirely distinct scripts:

http://youtu.be/iydlR_ptLmk

Turkish and French both use the Latin script, but have different characters. Alphabetical garbage, or accurate representations of sounds and words in those languages?
Python 3 gives the world's languages equal footing. This is a feature, not a bug. It has consequences, including that arbitrary character entities could involve up to seven decimal digits or six hex (although for most practical work, six decimal or five hex will suffice). Those consequences are a trivial price to pay for uniting the whole internet, as opposed to having pockets of different languages, like we had up until the 90s.
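The digit counts can be checked with a short sketch, assuming Python 3.3 or later, where `sys.maxunicode` is 0x10FFFF. The helper `digit_widths` is a hypothetical name used only for this illustration.

```python
import sys


def digit_widths(code_point):
    """Return (decimal digits, hex digits) needed to write a code point."""
    return len(str(code_point)), len(format(code_point, "x"))


# The highest code point Unicode allows: U+10FFFF, or 1114111 in decimal.
print(digit_widths(sys.maxunicode))        # (7, 6)

# An astral-plane character in everyday use: U+1F600 GRINNING FACE.
print(digit_widths(ord("\U0001F600")))     # (6, 5)

# The last code point of the Basic Multilingual Plane: U+FFFF.
print(digit_widths(0xFFFF))                # (5, 4)
```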