[Python-ideas] Proposal for default character representation

M.-A. Lemburg mal at egenix.com
Wed Oct 12 17:48:15 EDT 2016


On 12.10.2016 23:33, Mikhail V wrote:
> Hello all,
> 
> I want to share my thoughts about syntax improvements regarding
> character representation in Python.
> I am new to the list so if such a discussion or a PEP exists already,
> please let me know.
> 
> So in short:
> 
> Currently Python uses hexadecimal notation
> for characters for input and output.
> For example let's take a unicode string "абв.txt"
> (a file named with first three Cyrillic letters).
> 
> Now printing it we get:
> 
> u'\u0430\u0431\u0432.txt'

Hmm, in Python 3, I get:

>>> s = "абв.txt"
>>> s
'абв.txt'
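
To make the difference concrete, here is a small sketch for Python 3. It assumes the escaped output quoted above came from Python 2, where the repr of a unicode string escapes non-ASCII characters. In Python 3 the escaped form is still available on request through the built-in ascii():

```python
s = "абв.txt"

# Python 3: repr() keeps printable non-ASCII characters as they are.
shown = repr(s)

# ascii() gives the escaped form, using \uXXXX hex escapes.
escaped = ascii(s)

print(escaped)
print(shown != escaped)
```

So the hex escapes only show up when an ASCII-only representation is asked for explicitly.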

> So one sees that we have hex numbers here.
> Same is for typing in the strings which obviously also uses hex.
> Same is for some parts of the Python documentation,
> especially those about unicode strings.
> 
> PROPOSAL:
> 1. Remove all hex notation from printing functions, typing,
> documentation.
> So for printing functions leave the hex as an "option",
> for example for those who feel the need for hex representation,
> which is strange IMO.
> 2. Replace it with decimal notation, in this case e.g:
> 
> u'\u0430\u0431\u0432.txt' becomes
> u'\u1072\u1073\u1074.txt'
> 
> and similarly for other cases where raw bytes must be printed or input.
> So to summarize: make the decimal notation standard for all cases.
> I am not going to go deeper, such as what digit amount (leading zeros)
> to use, since it's quite secondary decision.
> 
> MOTIVATION:
> 1. Hex notation is hardly readable. It was not designed with readability
> in mind, so it is not an appropriate system for reading, at least with the
> current character set, which is a mix of digits and letters (curious who
> was the wise person who invented such a set?).
> 2. Mixing of two notations (hex and decimal) is a _very_ bad idea,
> I hope no need to explain why.
> 
> So that's it, in short.
> Feel free to discuss and comment.

The hex notation for \uXXXX is a standard that many other
programming languages use as well. It's also easier to parse,
so I don't think we should change this default.

Take e.g.

>>> s = "\u123456"
>>> s
'ሴ56'

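\uXXXX always takes exactly four hex digits, so the escape above
ends after "1234" and the "56" is ordinary text. With a decimal
notation of unspecified length, it's not clear where the digits
of the escape end.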
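
A minimal sketch of that ambiguity. The first part only checks how Python 3 reads the fixed-width hex escape. The helper decimal_readings() is hypothetical: it is not part of Python or of the proposal, and it merely lists every way a variable-length decimal escape could be split, assuming any value up to U+10FFFF is allowed.

```python
# Fixed-width hex escape: \u consumes exactly four hex digits.
s = "\u123456"
first_codepoint = ord(s[0])   # 0x1234
rest = s[1:]                  # the two literal characters "56"


def decimal_readings(digits, max_codepoint=0x10FFFF):
    """Hypothetical: every (code point, remaining text) split of a
    variable-length decimal escape such as \\u123456."""
    readings = []
    for end in range(1, len(digits) + 1):
        codepoint = int(digits[:end])
        if codepoint > max_codepoint:
            break
        readings.append((codepoint, digits[end:]))
    return readings


readings = decimal_readings("123456")
for codepoint, remainder in readings:
    print(codepoint, repr(remainder))
```

The hex form has one reading, while the decimal digits can be split in six ways. A fixed digit count would remove the ambiguity, but the proposal leaves that choice open.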

-- 
Marc-Andre Lemburg
eGenix.com



