[Python-ideas] Proposal for default character representation
M.-A. Lemburg
mal at egenix.com
Wed Oct 12 17:48:15 EDT 2016
On 12.10.2016 23:33, Mikhail V wrote:
> Hello all,
>
> I want to share my thoughts about syntax improvements regarding
> character representation in Python.
> I am new to the list so if such a discussion or a PEP exists already,
> please let me know.
>
> So in short:
>
> Currently Python uses hexadecimal notation
> for characters for input and output.
> For example let's take a unicode string "абв.txt"
> (a file named with first three Cyrillic letters).
>
> Now printing it we get:
>
> u'\u0430\u0431\u0432.txt'
Hmm, in Python 3, I get:
>>> s = "абв.txt"
>>> s
'абв.txt'
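For comparison, here is a minimal sketch of how the two forms relate in Python 3, assuming a terminal that can display Cyrillic. repr() leaves printable non-ASCII characters alone, while the built-in ascii() produces the escaped hex form quoted above. The decimal code points mentioned in the proposal are what ord() returns. The variable name decimal_points is only illustrative.

```python
s = "абв.txt"

# repr() keeps printable non-ASCII characters as they are.
print(repr(s))

# ascii() escapes every non-ASCII character using \uXXXX hex notation.
print(ascii(s))

# ord() gives the same code points as plain integers (shown in decimal).
decimal_points = [ord(ch) for ch in s[:3]]
print(decimal_points)

# The hex digits in the escapes and the decimal values name the same numbers.
assert [hex(n) for n in decimal_points] == ["0x430", "0x431", "0x432"]
```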
> So one sees that we have hex numbers here.
> Same is for typing in the strings which obviously also uses hex.
> Same is for some parts of the Python documentation,
> especially those about unicode strings.
>
> PROPOSAL:
> 1. Remove all hex notation from printing functions, typing,
> documentation.
> So for printing functions leave the hex as an "option",
> for example for those who feel the need for hex representation,
> which is strange IMO.
> 2. Replace it with decimal notation, in this case e.g:
>
> u'\u0430\u0431\u0432.txt' becomes
> u'\u1072\u1073\u1074.txt'
>
> and similarly for other cases where raw bytes must be printed/input.
> So to summarize: make the decimal notation standard for all cases.
> I am not going to go deeper, such as what digit amount (leading zeros)
> to use, since it's quite secondary decision.
>
> MOTIVATION:
> 1. Hex notation is hardly readable. It was not designed with readability
> in mind, so for reading it is not an appropriate system, at least with the
> current character set, which is a mix of digits and letters (curious who
> was that wise person who invented such a set?).
> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea,
> I hope no need to explain why.
>
> So that's it, in short.
> Feel free to discuss and comment.
The hex notation for \uXXXX is a standard that many other programming
languages use as well. It is also easier to parse, so I don't
think we should change this default.
Take, for example:
>>> s = "\u123456"
>>> s
'ሴ56'
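\u always takes exactly four hex digits, so the parser stops after
"1234" and treats "56" as ordinary text. With decimal notation, it's
not clear where the digits of the escape end.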
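To make the parsing problem concrete, here is a rough sketch. The first part checks how Python 3 reads the literal above. The second part imagines a variable-length decimal escape, which is purely hypothetical and not something Python supports, and lists every way the digit run "107256" could be split into a code point plus trailing text. The names digits and candidates are made up for the illustration.

```python
# Real behaviour: \u consumes exactly four hex digits.
s = "\u123456"
assert len(s) == 3
assert s[0] == chr(0x1234)
assert s[1:] == "56"

# Hypothetical: a decimal escape with no fixed width (NOT valid Python syntax).
# Suppose the source text were a backslash-u followed by these digits:
digits = "107256"

# Every prefix that is a valid code point is a possible reading.
candidates = {
    n: chr(int(digits[:n])) + digits[n:]
    for n in range(1, len(digits) + 1)
    if int(digits[:n]) <= 0x10FFFF
}

for n, text in candidates.items():
    print(n, ascii(text))
```

All six prefixes are valid code points here, so the parser would need either a fixed digit count or an explicit terminator to pick one.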
--
Marc-Andre Lemburg
eGenix.com