[Python-ideas] Proposal for default character representation
Mikhail V
mikhailwas at gmail.com
Thu Oct 13 01:46:38 EDT 2016
On 13 October 2016 at 04:18, Brendan Barnwell <brenbarn at brenbarn.net> wrote:
> On 2016-10-12 18:56, Mikhail V wrote:
>>
>> Please don't mix up readability and personal habit, which previous
>> repliers seem to do as well. Those two things have nothing
>> to do with each other.
>
>
> You keep saying this, but it's quite incorrect. The usage of
> decimal notation is itself just a convention, and the only reason it's easy
> for you (and for many other people) is because you're used to it. If you
> had grown up using only hexadecimal or binary, you would find decimal
> awkward.
Exactly, but this is not called "readability" but rather
"acquired ability to read" or simply habit, which does not reflect
the "readability" of the character set itself.
> There is nothing objectively better about base 10 than any other
> place-value numbering system.
Sorry to say, but here you are totally wrong.
I don't fault you personally for this fallacy, which is quite common
among those who are not familiar with the topic, but you
should consider some important points:
---
1. Each given character set has a certain grade of readability,
which depends solely on the form of its units (aka glyphs).
2. Linear string representation is superior to anything else (spiral, arc, etc.)
3. There exist glyphs which provide maximal readability;
those are particular glyphs with a particular constant form, and
this form is absolutely independent of the subject being encoded.
4. According to my personal studies (which does not mean
it must be accepted or blindly believed, but I have solid experience
in this area and have been quite successful in it),
the number of these glyphs is less than 10; namely, I am at 8 glyphs now.
5. The main measured parameter which reflects
readability (somewhat indirectly, however) is the pairwise
optical collision of each character pair of the set.
This relates to legibility, or the ability to differentiate
glyphs.
---
Less technically, you can understand it better if you think
of your own words
"There is nothing objectively better
about base 10 than any
other place-value numbering system."
If this could ever be true, then you could read text set in characters
that are very similar to each other, or in something messy, as well as
text set in characters which are easily identifiable, collision-resistant
and optically consistent. But that is absurd, sorry.
For numbers you obviously don't need as many characters as for
speech encoding, so this means that only those glyphs, or even a subset
of them, should be used. This means anything more than 8 characters
is quite worthless for reading numbers.
Note that I can't provide the works here at the moment,
so don't ask me for that. Some of them will probably be
available in the near future.
Your analogy with speech and signs is not correct because
speech differs between languages, but numbers are numbers.
But even for different languages the same character set should be used,
namely the one with superior optical qualities, i.e. readability.
> Saying we should dump hex notation because everyone understands decimal is
> like saying that all signs in Prague should only be printed in English
We should dump hex notation because, first, decimal is currently
simply superior to hex, just like a Mercedes is
superior to a Lada, and secondly, because it is more common
for ALL people, so it is 2:0 against using such notation.
With that said, I am not against base-16 itself in the first place,
but rather against the character set which is simply visually
inconsistent and not readable.
Someone just took the Arabic digits and added
the first Latin letters to them. It could be forgiven as a schoolboy's
drawing exercise, but I fail to understand how it can be
accepted as a working notation for a medium that is supposed
to be human readable.
Practically, all this notation does is reduce the time
before you, as a programmer,
develop visual and brain impairments.
> Just look at the Wikipedia page for Unicode, which says: "Normally a
> Unicode code point is referred to by writing "U+" followed by its
> hexadecimal number." That's it.
Yeah, that's it. And it sucks, and since it has migrated into a coding
standard, it sucks twice.
If a new syntax/standard is decided on, there will
be only upsides to using decimal instead of hex.
So nobody will be hurt; this is only a question of
reworking the current implementation, and it is proposed
only as a long-term theoretical improvement.
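To make the two notations concrete, here is a minimal sketch of how the
same code point looks as a "U+" label and as a decimal number. The helper
name describe is made up for illustration; only the built-ins ord, int
and the standard format specification are used.

```python
def describe(ch):
    """Return one code point as a 'U+' hex label and as a decimal number."""
    code = ord(ch)                    # decimal code point of the character
    return f"U+{code:04X}", code      # hex label padded to at least 4 digits


label, number = describe("A")         # label is 'U+0041', number is 65

# Going back from the label to the number needs an explicit base-16 parse.
back = int(label[2:], 16)
```

Both values name the same character; the only difference is the base in
which the number is written.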
> it's just
> a label that identifies the character.
Ok, but if I write string filtering in Python, for example, then
obviously I use decimal everywhere to compare index ranges, etc.,
so what use is that label to me? Just redundant
conversions back and forth. Makes me sick actually.
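A rough sketch of the kind of filtering meant here, assuming the filter
keeps characters whose code point falls in a given range. The function
name keep_range, the sample string and the chosen range are assumptions
for illustration, not code from a real project.

```python
def keep_range(text, lo, hi):
    """Keep only characters whose code point lies in lo..hi (inclusive)."""
    return "".join(ch for ch in text if lo <= ord(ch) <= hi)


# Range written in decimal; the same bounds in hex are 0x0410 and 0x044F.
CYR_LO, CYR_HI = 1040, 1103

sample = "abc Привет 123"
filtered = keep_range(sample, CYR_LO, CYR_HI)
```

ord already returns a plain integer, so the comparison works the same way
whichever base the bounds were typed in; the hex form only matters when
the bounds are copied from a "U+" table.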