[Python-ideas] Proposal for default character representation

Mikhail V mikhailwas at gmail.com
Fri Oct 14 01:21:48 EDT 2016


Greg Ewing wrote:

> #define O_RDONLY        0x0000          /* open for reading only */
> #define O_WRONLY        0x0001          /* open for writing only */
> #define O_RDWR          0x0002          /* open for reading and writing */
> #define O_ACCMODE       0x0003          /* mask for above modes */

Good example. But of course it is not average high-level
code. Again, the example only works if for some reason
we follow a binary segmentation that is bound to
low-level functionality, in this case 8-bit grouping.

> If you have occasion to write a literal representing a
> character code, there's nothing to stop you writing it
> in hex to match the way it's shown in a repr(), or in
> published Unicode tables, etc.

>>> c = "\u1235"
>>> if "\u1230" <= c <= "\u123f":

> I don't see a need for any conversions back and forth.

I'll explain what I mean with an example, one
which I would use to support my proposal. Compare:

if "\u1230" <= c <= "\u123f":

and:

o = ord(c)
if 100 <= o <= 150:

So your version is valid code, but to me it looks freaky,
and I would certainly stick to the second variant.
You said that I can better see which Unicode block
I am in by looking at the hex ordinal, but I hardly
need that; I just need to know one integer, namely
where a range begins, and that's it.
Furthermore, this is code that an average programmer
would find easier to read and maintain.
So it is a question of maintainability (+1).
Secondly, for me it is a question of being able to
type in and correct these values: if I make a mistake
or a typo, or want to extend the range by some amount,
I need to do additions and subtractions in my head
to make progress with my code. With hex literals I
would have to convert back and forth between notations,
and obviously nobody handles two notations in their
head well at the same time. With decimal values I
complete my code without that extra effort.
Is it clear now what I mean by
conversions back and forth?
This example alone actually explains
my whole point very well,
yet I feel somehow misunderstood.
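A minimal sketch in Python of the two styles being compared. Note that
the bounds 100 and 150 in the example above are only illustrative: the
decimal equivalents of "\u1230" and "\u123f" are 4656 and 4671. The
helper names in_range_hex and in_range_dec are hypothetical.

```python
# Hypothetical helpers contrasting the two styles of range check.

def in_range_hex(c: str) -> bool:
    # Compare characters directly, bounds written as hex escapes.
    return "\u1230" <= c <= "\u123f"


def in_range_dec(c: str) -> bool:
    # Compare the code point as an integer, bounds written in decimal.
    o = ord(c)
    return 4656 <= o <= 4671


# The decimal bounds are exactly the hex bounds converted.
assert 0x1230 == 4656 and 0x123F == 4671

# Both checks agree for characters inside and outside the range.
for ch in ("\u1235", "\u122f", "\u1240", "A"):
    assert in_range_hex(ch) == in_range_dec(ch)

print(in_range_dec("\u1235"))  # True
```

Single-character strings compare by code point, which is why both
functions give the same answer; the difference is purely notational.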

>> I am not against base-16 itself in the first place,
>> but rather against the character set which is simply visually
>> inconsistent and not readable.
>Now you're talking about inventing new characters, or
>at least new glyphs for existing ones, and persuading
>everyone to use them. That's well beyond the scope of
>what Python can achieve!

Yes, ideally one would use other glyphs for base-16,
but that does not mean one must use newly invented
glyphs. Standard ASCII has enough glyphs that would
work much better together, but it is too late anyway;
this should have been decided when the standard
was defined.
Got to love it.

> The meaning of 0xC001 is much clearer to me than
> 1100000000000001, because I'd have to count the bits very
> carefully in the latter to distinguish it from, e.g.
> 6001 or 18001.
> The bits could be spaced out:
> 1100 0000 0000 0001
> but that just takes up even more room to no good effect.
> I don't find it any faster to read -- if anything, it's

Greg, I feel that you are an open-minded person,
and I value this. You also understand quite well
how you yourself read.
What you refer to here is the length of the word.
Indeed, readability degrades somewhat if the word
is too long or the font size is too big, so you
break it up, which is one step towards better.
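For reference, Python already offers both kinds of grouping for binary
values: underscores in literals and the "_" option in format specs
(both introduced by PEP 515, Python 3.6+). A short sketch using Greg's
0xC001 value:

```python
value = 0xC001

# A binary literal may use underscores as visual separators.
assert 0b1100_0000_0000_0001 == value

# The "_" option in a format spec groups binary digits in fours.
grouped = format(value, "_b")
print(grouped)  # 1100_0000_0000_0001

# Any separator can be substituted for display, e.g. spaces.
spaced = grouped.replace("_", " ")
print(spaced)  # 1100 0000 0000 0001
```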

And now I'll explain some further magic
regarding the binary representation.
If you find some free time, you can experiment a bit.
So what is actually so peculiar about a bitstring?
Unlike higher bases, a bitstring can be treated
as the absence or presence of a signal, which is
not possible for higher bases; literally, a binary
string can be made almost "analphabetic",
if one could put it that way.
Consider the following notation:

instead of 1100 0000 0000 0001 you write

ұұ-- ---- ---- ---ұ

(NOTE: of course, if you read this in a non-monospaced
font you will not see it correctly; I should make
screenshots, which I will do in a while.)

Note that I did not choose this letter
by accident: it is similar to one of the glyphs
with peak readability.
An unset bit would simply be a stroke,
so only one letter is needed.

ұұ-ұ ---- ---- ---ұ
---ұ ---ұ --ұ- -ұ--
--ұ- ---- ---- ---ұ
---- ---- --ұ- ---ұ
-ұұ- ұұ-- ---- ----
---- ---- ---- ----
--ұұ ---- ұ--- ---ұ
-ұ-- --ұұ ---- ---ұ
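A minimal sketch of how the proposed notation could be produced and
read back in Python. The function names to_marks and from_marks, the
16-bit default width and the grouping by four are assumptions for
illustration; the glyphs are the ones used in the post ("ұ" for a set
bit, "-" for an unset bit).

```python
# Hypothetical helpers for the notation above: "ұ" marks a set bit,
# "-" an unset bit, written most significant bit first in groups of four.
SET, UNSET = "ұ", "-"


def to_marks(value: int, width: int = 16) -> str:
    # Assumes 0 <= value < 2**width and width is a multiple of 4.
    bits = format(value, f"0{width}b")  # zero-padded binary string
    marks = bits.replace("1", SET).replace("0", UNSET)
    return " ".join(marks[i:i + 4] for i in range(0, width, 4))


def from_marks(text: str) -> int:
    # Drop the group separators and map the glyphs back to digits.
    bits = text.replace(" ", "").replace(SET, "1").replace(UNSET, "0")
    return int(bits, 2)


print(to_marks(0xC001))                        # ұұ-- ---- ---- ---ұ
print(hex(from_marks("---ұ ---ұ --ұ- -ұ--")))  # 0x1124
```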

So the digits need not be equally weighted as in
higher bases. What does that bring? Simply this:
you can downscale the strings, so a 16-bit
value would be ~60 pixels wide (on a 96 dpi display)
without loss of legibility, which compensates for
the "too wide to scan" issue.
And don't forget to use enough line spacing.

Other obvious benefits of the binary string:
- nice editing features like bitshifting
- very interesting cognitive features
(which become more noticeable, however,
if you train yourself to work with it)
...
So there is a whole bunch of good effects.
Understand me correctly: I have no reason not to
believe you when you say you don't see any effect,
but you should always remember that this can
simply be caused by habit. So if you are more
than 40 years old (sorry for the familiarity),
this can be a really strong issue and unfortunately
hard to change. It is not bad; it is a natural thing,
and it is so with everyone.
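The bitshifting benefit mentioned above can be made concrete: in this
notation a shift just slides the marks sideways. A small sketch,
assuming 16-bit values and the glyph mapping from the post; the render
helper is hypothetical.

```python
# Map binary digits to the glyphs from the post: "0" -> "-", "1" -> "ұ".
GLYPHS = str.maketrans("01", "-ұ")


def render(value: int) -> str:
    # Assumes a 16-bit value; masking keeps shifted results in range.
    bits = format(value & 0xFFFF, "016b").translate(GLYPHS)
    return " ".join(bits[i:i + 4] for i in range(0, 16, 4))


value = 0b0000_0110_1100_0000
print(render(value))       # ---- -ұұ- ұұ-- ----
print(render(value << 2))  # ---ұ ұ-ұұ ---- ----
print(render(value >> 3))  # ---- ---- ұұ-ұ ұ---
```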

> When I say "instantly", I really do mean *instantly*.
> I fail to see how a different glyph set could reduce
> the recognition time to less than zero.

It is not about speed, it is about brain load.
Chinese readers can read their characters fast, but
the cognitive load on the brain is 100 times higher
than with the current Latin set.
I know people who can read bash scripts
fast, but would you claim that bash syntax is
any good compared to Python syntax?

> Another point -- a string of hex digits is much easier
> for me to *remember*

That could be. I personally can remember numbers
in the above-mentioned notation photographically,
as opposed to decimal, where I also tend to
say them aloud to remember them better; that is
interesting, but more a matter of psychology.
Everyone is unique in this sense, however.

As already noted, another good alternative for
8-bit-aligned data would be quaternary notation:
it is 2x more compact than binary and can be very
legible thanks to its few glyphs; it is also possible
to emulate it with existing characters.
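A minimal sketch of quaternary (base-4) notation, assuming plain digits
0 to 3 (the post leaves the choice of glyphs open). Each base-4 digit
carries exactly two bits, so an 8-bit byte needs four digits instead of
eight, and Python's int() can parse the result with base 4. The
to_base4 helper is hypothetical.

```python
def to_base4(value: int, width: int = 4) -> str:
    """Return a non-negative int in base 4, zero-padded to `width` digits."""
    if value < 0:
        raise ValueError("non-negative values only")
    digits = ""
    while value:
        digits = str(value & 0b11) + digits  # lowest two bits -> one digit
        value >>= 2
    return digits.rjust(width, "0")


print(to_base4(0xA5))       # 2211      (binary 10 10 01 01)
print(to_base4(0xC001, 8))  # 30000001  (16 bits -> 8 digits)
assert int(to_base4(0xA5), 4) == 0xA5
```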

Mikhail

