[Python-ideas] Proposal for default character representation

Fri Oct 14 03:53:07 EDT 2016

On 13 October 2016 at 16:50, Chris Angelico <rosuav at gmail.com> wrote:
> On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano <steve at pearwood.info> wrote:
>> On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote:
>>> and in long perspective when the world's alphabetical garbage will
>>> disappear, two digits would be ok.
>> Talking about "alphabetical garbage" like that makes you seem to be an
>> ASCII bigot: rude, ignorant, arrogant and rather foolish as well. Even
>> 7-bit ASCII has more than 100 characters (128).

This is sort of rude. Are you from the Unicode Consortium?

> Solution: Abolish most of the control characters. Let's define a brand
> new character encoding with no "alphabetical garbage". These
> characters will be sufficient for everyone:
>
> * [2] Formatting characters: space, newline. Everything else can go.
> * [8] Digits: 01234567
> * [26] Lower case Latin letters a-z
> * [2] Vital social media characters: # (now officially called "HASHTAG"), @
> * [2] Can't-type-URLs-without-them: colon, slash (now called both
> "SLASH" and "BACKSLASH")
>
> That's 40 characters that should cover all the important things anyone
> does - namely, Twitter, Facebook, and email. We don't need punctuation
> or capitalization, as they're dying arts and just make you look
> pretentious. I might have missed a few critical characters, but it
> should be possible to fit it all within 64, which you can then
> represent using two digits from our newly-restricted set; octal is
> better than decimal, as it needs less symbols. (Oh, sorry, so that's
> actually "50" characters, of which "32" are the letters. And we can
> use up to "100" and still fit within two digits.)
>
> Is this the wrong approach, Mikhail?
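
The octal figures in the quoted paragraph can be checked with a short
sketch. The dictionary keys are only illustrative labels for the five
groups in the list above; nothing here goes beyond the counts given in
the quote.

```python
# Character counts taken from the quoted list (the labels are illustrative).
counts = {
    "formatting": 2,   # space, newline
    "digits": 8,       # 0-7
    "letters": 26,     # a-z
    "social": 2,       # '#', '@'
    "url": 2,          # colon, slash
}

total = sum(counts.values())

# Render the quoted numbers in octal, i.e. using only the digits 0-7.
total_octal = format(total, "o")               # 40 decimal
letters_octal = format(counts["letters"], "o")  # 26 decimal
limit_octal = format(64, "o")                  # 64 decimal

# Two octal digits run from 00 to 77, which is 64 distinct values.
two_digit_values = int("77", 8) + 1

print(total, total_octal, letters_octal, limit_octal, two_digit_values)
```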

This is sort of the correct approach. We do need punctuation, however.
And of course one does not need to make it too tight.
So 8-bit units for text are excellent, with enough space left for experiments.

> Perhaps we should go the other
> way, then, and be *inclusive* of people who speak other languages.

What keeps people from using the same characters?
I will tell you what: it is local law. If you go to school you *have* to
write in what is prescribed by big daddy. If you're in Europe or America, you are
luckier. And if you're in China you'll be punished if you
want some freedom. So like it or not, learn hieroglyphs
and become visually impaired by the age of 18.

> Thanks to Unicode's rich collection of characters, we can represent
> multiple languages in a single document;

Can do it without Unicode in 8-bit boundaries with tagged text;
you just need fonts for your language, provided of course that your
local charset has fewer than 256 letters.

This is how it was before Unicode, I suppose.
BTW I still don't get what such revolutionary
advantages Unicode has compared to tagged text.
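
To make the comparison concrete, here is a minimal sketch of tagged
8-bit text next to the Unicode equivalent. The tagged format (a list of
(codec name, raw bytes) runs) and the function name decode_tagged are
assumptions made up for illustration, not an existing standard; the
codec names are the ones that ship with Python.

```python
# Hypothetical tagged-text format: a list of (codec_name, raw_bytes) runs.
# Each run is at most 8 bits per character; the tag says which code page applies.

def decode_tagged(runs):
    """Decode every 8-bit run with the code page named by its tag."""
    return "".join(raw.decode(codec) for codec, raw in runs)

tagged = [
    ("latin-1", "Grüße, ".encode("latin-1")),  # German run
    ("koi8-r", "привет".encode("koi8-r")),     # Russian run
]
text = decode_tagged(tagged)

# Without its tag a byte is ambiguous: the same value maps to a
# different character in each code page.
same_byte = b"\xe4"
variants = {
    codec: same_byte.decode(codec)
    for codec in ("latin-1", "koi8-r", "cp1251")
}

# With Unicode, each character has its own code point, so the whole
# mixed-language string goes through one encoding and needs no tags.
utf8_bytes = text.encode("utf-8")

print(text)
print(variants)
print(utf8_bytes.decode("utf-8") == text)
```

Both routes recover the same string. The difference the sketch shows is
where the language information lives: in the tags that must travel with
the bytes, or in the code points themselves.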

> script, but have different characters. Alphabetical garbage, or
> accurate representations of sounds and words in those languages?

An accurate representation with some 50 characters is more than enough.

Mikhail