[Python-ideas] Proposal for default character representation

MRAB python at mrabarnett.plus.com
Wed Oct 12 22:44:25 EDT 2016


On 2016-10-13 00:50, Chris Angelico wrote:
> On Thu, Oct 13, 2016 at 10:09 AM, Mikhail V <mikhailwas at gmail.com> wrote:
>> On 12 October 2016 at 23:58, Danilo J. S. Bellini
>> <danilo.bellini at gmail.com> wrote:
>>
>>> Decimal notation is hardly
>>> readable when we're dealing with stuff designed in base 2 (e.g. due to the
>>> visual separation of distinct bytes).
>>
>> Hmm, what keeps you from separating the logical units and representing each
>> by a decimal number, like 001 023 255 ...?
>> Do you really think this is less readable than its hex equivalent?
>> If so, you are probably working with hex numbers only, but I doubt that.
>
> Way WAY less readable, and I'm comfortable working in both hex and decimal.
>
>>> I agree that mixing representations for the same abstraction (using decimal
>>> in some places, hexadecimal in other ones) can be a bad idea.
>> "Can be"? It is indeed a horrible idea. Also not only for same abstraction
>> but at all.
>>
>>> makes me believe "decimal unicode codepoint" shouldn't ever appear in string
>>> representations.
>> I use this site to look the chars up:
>> http://www.tamasoft.co.jp/en/general-info/unicode-decimal.html
>
> You're the one who's non-standard here. Most of the world uses hex for
> Unicode codepoints.
>
> http://unicode.org/charts/
>
> HTML entities permit either decimal or hex, but other than that, I
> can't think of any common system that uses decimal for Unicode
> codepoints in strings.
>
>> PS:
>> that is rather peculiar: three negative replies already, but with no strong
>> arguments for why it would be bad to stick to decimal only, just some
>> "others do it so" and "tradition" arguments.
>
> "Others do it so" is actually a very strong argument. If all the rest
> of the world uses + to mean addition, and Python used + to mean
> subtraction, it doesn't matter how logical that is, it is *wrong*.
> Most of the world uses U+201C or "\u201C" to represent a curly double
> quote; if you use 0x93, you are annoyingly wrong, and if you use 8220,
> everyone has to do the conversion from that to 201C. Yes, these are
> all differently-valid standards, but that doesn't make it any less
> annoying.
>
>> Please note, I am talking only about the readability _of the character
>> set_ itself.
>> It does not include your habit issues, but rather is an objective
>> criterion for using this or that character set.
>> And decimal is objectively far more readable than the standard hex character set,
>> regardless of how strong your habits are.
>
> How many decimal digits would you use to denote a single character? Do
> you have to pad everything to seven digits (\u0000034 for an ASCII
> quote)? And if not, how do you mark the end? This is not "objectively
> more readable" if the only gain is "no A-F" and the loss is
> "unpredictable length".
>
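Chris's fixed-width point is easy to check in plain Python; a rough
sketch, nothing more than ordinary Python 3:

    # Hex escapes have a fixed width, so no terminator is needed to
    # show where the number ends.
    assert "\u0022" == '"'             # \u always takes exactly four hex digits
    assert "\U0000201C" == "\u201C"    # \U always takes exactly eight
    assert ord("\u201C") == 8220       # decimal 8220 and hex 201C are the same codepoint
    print(f"{8220:04X}")               # -> 201C
    print(f"{0x201C:d}")               # -> 8220
    # (HTML is the odd one out: &#8220; and &#x201C; both name this character.)
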
Well, Perl doesn't have \u or \U; instead it has extended \x, so you can 
write, say, \x{201C}.

Still in hex, though, as nature intended! :-)
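
For comparison, a quick sketch of the equivalent spellings Python does
offer (hex throughout, except that chr() will also take a plain int):

    # Equivalent ways to spell U+201C in Python:
    s1 = "\u201C"                          # fixed four-digit \u escape
    s2 = "\N{LEFT DOUBLE QUOTATION MARK}"  # escape by Unicode name
    s3 = chr(0x201C)                       # chr() takes any int, hex or decimal
    assert s1 == s2 == s3 == chr(8220)
    # Unlike Perl's braced form, Python's \x takes exactly two hex digits,
    # so "\x93" is U+0093 (a C1 control), not the CP1252 curly quote:
    assert "\x93" == "\u0093"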


