[Python-ideas] Input characters in strings by decimals (Was: Proposal for default character representation)

Emanuel Barry vgr255 at live.ca
Thu Dec 8 11:32:02 EST 2016


> From: Mikhail V
> Sent: Thursday, December 08, 2016 11:07 AM
> Subject: Re: [Python-ideas] Input characters in strings by decimals (Was:
> Proposal for default character representation)
> No I don't need to specify "unicode table *decimal*".
> 
> Results for "unicode table" in google:
> 
> Top Result # 2:
> www.utf8-chartable.de/
> 
> Top Result # 4:
> http://www.tamasoft.co.jp/en/general-info/index.html

Except that both of these websites show you hexadecimal notation.

> And I hope it is clear why most people stick to hex (I never argued that
> BTW),
> but it is mostly historical, nothing to do with "logical".

That's not true. Characters are sorted by ranges. For example, I know that
everything below 0x20 is a control code, uppercase ASCII letters start at
0x41 (0x40 is '@') and lowercase ASCII letters start at 0x61 (0x60 is '`')
- trivial to remember. I also know that ASCII covers only the lower half of
a byte's range, going up to 0x7f (0x80 is half of 0x100). For instance, the
first letter of my name is 0xc9, and anyone can tell at a glance, without
knowing my name or what the letter is, that it's not ASCII.
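
To make that concrete, here is a rough sketch of the range checks I have in
mind. The function name classify and the labels it returns are made up for
illustration; only the hex boundaries come from the paragraph above.

```python
def classify(code_point):
    """Roughly classify a code point using the hex boundaries above.

    The labels are illustrative, not an official Unicode classification.
    """
    if code_point < 0x20:
        return "control"
    if code_point > 0x7f:
        return "non-ASCII"
    if 0x41 <= code_point <= 0x5a:
        return "uppercase ASCII letter"
    if 0x61 <= code_point <= 0x7a:
        return "lowercase ASCII letter"
    # Digits, punctuation, space, and DEL (0x7f) all land here.
    return "other ASCII"


for cp in (0x0a, 0x40, 0x41, 0x60, 0x61, 0xc9):
    print(hex(cp), classify(cp))
```

Every boundary in there is a short, round-looking hex number, which is the
whole point.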

Also, as far as I know, lowercase letters (ASCII or not) begin some multiple
of 0x10 after the beginning of the uppercase letters (0x20 for ASCII or
latin-1). As such, since I know that 'É' is 0xc9, I can know, without even
looking, that 0xe9 is 'é'. That would be a lot trickier to remember and get
right in decimal. As an aside, and I don't know these by heart, various sets
of characters begin at fixed points, and knowing those points (when you need
to work with specific sets of characters) can be very useful.
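
A quick sketch of the case trick, assuming we stay within ASCII or latin-1
letters where the offset is 0x20. The names CASE_OFFSET and lowercase_of are
hypothetical, and the helper does no validation: it will happily shift code
points that are not uppercase letters at all.

```python
CASE_OFFSET = 0x20  # distance from an uppercase letter to its lowercase form


def lowercase_of(code_point):
    """Return the code point 0x20 above the given one.

    Only meaningful for uppercase ASCII or latin-1 letters; this is an
    illustration of the arithmetic, not a replacement for str.lower().
    """
    return code_point + CASE_OFFSET


# 'A' (0x41) -> 'a' (0x61), and 0xc9 -> 0xe9
print(hex(lowercase_of(0x41)), hex(lowercase_of(0xc9)))

# Cross-check against what Python itself considers the lowercase form.
assert chr(lowercase_of(0xc9)) == chr(0xc9).lower()

# The same pair in decimal: 201 and 233.
print(0xc9, lowercase_of(0xc9))
```

In hex only one digit changes (c becomes e); in decimal, 201 and 233 share
nothing obvious.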

If you look at a website (https://unicode-table.com/ seems good), you can
even select ranges of characters, which conveniently start at multiples of
0x10 (or 16 in decimal). If your point is "it's easier to work with numbers
ending with 0", then you'll be pleased to know that character sets are
actually designed so that, using hexadecimal notation, you're dealing with
numbers ending with 0! Doing this using decimal notation is clunky at best.
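
As a small illustration, here are the starting points of a handful of
Unicode blocks, printed in both notations. The selection of blocks and the
names BLOCK_STARTS and ends_with_zero are my own picks for this sketch, not
anything taken from the websites mentioned above.

```python
# Starting code points of a few Unicode blocks (a hand-picked sample).
BLOCK_STARTS = {
    "Basic Latin": 0x0000,
    "Latin-1 Supplement": 0x0080,
    "Greek and Coptic": 0x0370,
    "Cyrillic": 0x0400,
    "Hebrew": 0x0590,
    "Arabic": 0x0600,
}


def ends_with_zero(number, base):
    """True if the last digit of number, written in the given base, is 0."""
    return number % base == 0


for name, start in BLOCK_STARTS.items():
    print(f"{name:20} hex {start:04x}  decimal {start:4d}")

# Every start in this sample ends with 0 in hexadecimal...
assert all(ends_with_zero(start, 16) for start in BLOCK_STARTS.values())
# ...while in decimal that only happens by coincidence (0x80 is 128).
assert not ends_with_zero(0x0080, 10)
```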

Yours,
\xc9manuel

