On 12.10.2016 23:33, Mikhail V wrote:
I want to share my thoughts about syntax improvements regarding character representation in Python. I am new to the list so if such a discussion or a PEP exists already, please let me know.
So in short:
Currently Python uses hexadecimal notation for characters for input and output. For example let's take a unicode string "абв.txt" (a file named with first three Cyrillic letters).
Now printing it we get:

u'\u0430\u0431\u0432.txt'
Hmm, in Python3, I get:
>>> s = "абв.txt"
>>> s
'абв.txt'
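For reference, a minimal sketch of the difference in Python 3, assuming a terminal that can display Cyrillic: repr() keeps printable non-ASCII characters as they are, while ascii() falls back to hex escapes. The variable names are only illustrative.

```python
s = "абв.txt"

# repr() in Python 3 leaves printable non-ASCII characters untouched.
shown = repr(s)

# ascii() replaces every non-ASCII character with a backslash escape,
# which for these characters is the \uXXXX hex form.
escaped = ascii(s)

print(shown)
print(escaped)
```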
So one sees that we have hex numbers here. The same goes for typing in strings, which obviously also uses hex, and for some parts of the Python documentation, especially those about Unicode strings.
1. Remove all hex notation from printing functions, typing, and documentation. For printing functions, leave hex as an "option", for example for those who feel the need for hex representation, which is strange IMO.
2. Replace it with decimal notation, in this case e.g.:
u'\u0430\u0431\u0432.txt' becomes u'\u1072\u1073\u1074.txt'
and similarly for other cases where raw bytes must be printed or input.
So to summarize: make decimal notation the standard for all cases. I am not going to go deeper, such as how many digits (leading zeros) to use, since that is quite a secondary decision.
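To make the two notations concrete, here is a small sketch that lists the code points of the three Cyrillic letters in decimal and in hex. It only uses the built-ins ord() and format strings; the list names are made up for illustration.

```python
s = "абв"

# Code points as plain decimal integers (the notation proposed above).
decimal_points = [ord(ch) for ch in s]

# The same code points as four-digit hex strings (what \uXXXX uses today).
hex_points = [f"{ord(ch):04x}" for ch in s]

for ch, dec, hx in zip(s, decimal_points, hex_points):
    print(ch, dec, hx)
```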
1. Hex notation is hardly readable. It was not designed with readability in mind, so it is not an appropriate system for reading, at least with the current character set, which is a mix of digits and letters (curious who was that wise person who invented such a set?).
2. Mixing two notations (hex and decimal) is a _very_ bad idea; I hope there is no need to explain why.
So that's it, in short. Feel free to discuss and comment.
The hex notation for \uXXXX is a standard also used in many other programming languages. It is also easier to parse, so I don't think we should change this default.
>>> s = "\u123456"
>>> s
'ሴ56'
With decimal notation, it's not clear where to end parsing the digit notation.
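A rough sketch of the point, under the assumption that a decimal escape would have no fixed width: today \u always consumes exactly four hex digits, so the literal splits unambiguously, whereas a run of decimal digits has several valid prefixes. The "candidates" list is a hypothetical illustration, not an existing Python feature.

```python
s = "\u123456"

# \u takes exactly four hex digits: U+1234 followed by the characters "56".
first, rest = s[0], s[1:]
print(len(s), hex(ord(first)), rest)

# Hypothetical variable-width decimal escape: every prefix of the digit
# run that is a valid code point (at most 0x10FFFF) is a possible reading.
digits = "123456"
candidates = [
    int(digits[:n])
    for n in range(1, len(digits) + 1)
    if int(digits[:n]) <= 0x10FFFF
]
print(candidates)
```

A fixed digit count for the decimal form would remove the ambiguity, but it would need seven digits to cover the full code point range.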