On Fri, Oct 14, 2016 at 6:53 PM, Mikhail V <mikhailwas@gmail.com> wrote:
On 13 October 2016 at 16:50, Chris Angelico <rosuav@gmail.com> wrote:
On Fri, Oct 14, 2016 at 1:25 AM, Steven D'Aprano <steve@pearwood.info> wrote:
On Thu, Oct 13, 2016 at 03:56:59AM +0200, Mikhail V wrote:
and in long perspective when the world's alphabetical garbage will disappear, two digits would be ok.
Talking about "alphabetical garbage" like that makes you seem to be an ASCII bigot: rude, ignorant, arrogant and rather foolish as well.
Even 7-bit ASCII has more than 100 characters (128).
This is sort of rude. Are you from unicode consortium?
No, he's not. He just knows a thing or two.
Solution: Abolish most of the control characters. Let's define a brand new character encoding with no "alphabetical garbage". These characters will be sufficient for everyone:
* [2] Formatting characters: space, newline. Everything else can go.
* [8] Digits: 01234567
* [26] Lower case Latin letters a-z
* [2] Vital social media characters: # (now officially called "HASHTAG"), @
* [2] Can't-type-URLs-without-them: colon, slash (now called both "SLASH" and "BACKSLASH")
That's 40 characters that should cover all the important things anyone does - namely, Twitter, Facebook, and email. We don't need punctuation or capitalization, as they're dying arts and just make you look pretentious. I might have missed a few critical characters, but it should be possible to fit it all within 64, which you can then represent using two digits from our newly-restricted set; octal is better than decimal, as it needs fewer symbols. (Oh, sorry, so that's actually "50" characters, of which "32" are the letters. And we can use up to "100" and still fit within two digits.)
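The octal figures in that parenthetical can be checked directly. A quick sketch in Python (the variable names are only illustrative, not anything from the thread):

```python
# Character counts from the list above, in decimal.
formatting, digits, letters, social, url = 2, 8, 26, 2, 2
total = formatting + digits + letters + social + url

# Render the same quantities in octal (base 8), the base the post proposes.
total_octal = format(total, "o")      # 40 decimal
letters_octal = format(letters, "o")  # 26 decimal
limit_octal = format(8 * 8, "o")      # number of values two octal digits can hold

print(total, total_octal, letters_octal, limit_octal)
```

Two octal digits cover the values 00 through 77, which is 64 distinct characters, so the restricted set fits with room to spare.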
Is this the wrong approach, Mikhail?
This is sort of correct approach. We do need punctuation however. And one does not need of course to make it too tight. So 8-bit units for text is excellent and enough space left for experiments.
... okay. I'm done arguing. Go do some translation work some time. Here, have a read of some stuff I've written before.
http://rosuav.blogspot.com/2016/09/case-sensitivity-matters.html
http://rosuav.blogspot.com/2015/03/file-systems-case-insensitivity-is.html
http://rosuav.blogspot.com/2014/12/unicode-makes-life-easy.html
Perhaps we should go the other way, then, and be *inclusive* of people who speak other languages.
What keeps people from using same characters? I will tell you what - it is local law. If you go to school you *have* to write in what is prescribed by big daddy. If you're in Europe or America, you are more lucky. And if you're in China you'll be punished if you want some freedom. So like it or not, learn hieroglyphs and become visually impaired in age of 18.
Never mind about China and its political problems. All you need to do is move around Europe for a bit and see how there are more sounds than can be usefully represented. Turkish has a simple system wherein the written and spoken forms have direct correspondence, which means they need to distinguish eight fundamental vowels. How are you going to spell those? Scandinavian languages make use of letters like "å" (called "A with ring" in English, but identified by its sound in Norwegian, same as our letters are - pronounced "Aww" or "Or" or "Au" or thereabouts). To adequately represent both Turkish and Norwegian in the same document, you *need* more letters than our 26.
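To put a rough number on that, here is a small Python sketch. It assumes the usual eight Turkish vowels (a, e, ı, i, o, ö, u, ü) and adds æ and ø alongside å for Norwegian; that letter selection is mine for illustration, not a full alphabet for either language.

```python
import string

# The eight Turkish vowels, plus the Norwegian letters beyond a-z.
# This selection is illustrative, not a complete alphabet.
turkish_vowels = "aeıioöuü"
norwegian_extras = "æøå"

ascii_letters = set(string.ascii_lowercase)
needed = set(turkish_vowels) | set(norwegian_extras)

# Letters that the 26-letter Latin alphabet cannot supply.
missing = sorted(needed - ascii_letters)
print(len(missing), missing)
```

Even this tiny sample needs six letters outside a-z, before any Turkish consonants such as ç, ğ, or ş are counted.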
Thanks to Unicode's rich collection of characters, we can represent multiple languages in a single document;
Can do it without unicode in 8-bit boundaries with tagged text, just need fonts for your language, of course if your local charset is less than 256 letters.
No, you can't. Also, you shouldn't. It makes virtually every text operation impossible: you can't split and rejoin text without tracking the encodings. Go try to write a text editor under your scheme and see how hard it is.
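A minimal sketch of the split-and-rejoin problem, in Python. The (encoding, bytes) list is a hypothetical stand-in for "tagged text", not any real format; the two code pages, ISO-8859-1 and ISO-8859-9, are chosen only because they disagree on a handful of byte values.

```python
# One byte, two meanings: 0xFD is "ý" in ISO-8859-1 but "ı" in ISO-8859-9.
raw = b"\xfd"
as_latin1 = raw.decode("iso-8859-1")
as_turkish = raw.decode("iso-8859-9")

# Hypothetical tagged-text model: a list of (encoding, bytes) runs.
tagged = [
    ("iso-8859-9", "ışık ".encode("iso-8859-9")),
    ("iso-8859-1", "blå".encode("iso-8859-1")),
]

# Joining the raw bytes discards the tags, so a later decode has to guess.
joined = b"".join(chunk for _, chunk in tagged)
guessed = joined.decode("iso-8859-1")

# Decoding each run with its own tag yields ordinary Unicode text,
# which can be sliced and joined with no per-run bookkeeping.
text = "".join(chunk.decode(enc) for enc, chunk in tagged)
print(as_latin1, as_turkish, guessed != text, text[:4] + text[5:])
```

With tags, every slice, concatenation, and search has to carry the encoding along with the bytes; with a Unicode string, those operations work on the characters themselves.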
This is how it was before unicode I suppose. BTW I don't get it still what such revolutionary advantages has unicode compared to tagged text.
It's not tagged. That's the huge advantage.
script, but have different characters. Alphabetical garbage, or accurate representations of sounds and words in those languages?
Accurate with some 50 characters is more than enough.
Go build a chat room or something. Invite people to enter their names. Now make sure you're courteous enough to display those names to people. Try doing that without Unicode.

I'm done. None of this belongs on python-ideas - it's getting pretty off-topic even for python-list, and you're talking about modifying Python 2.7 which is a total non-starter anyway.

ChrisA